Enhancers and cis-regulation

Enhancers control gene expression and play crucial roles in development and homeostasis. However, the targeted de novo design of enhancers with tissue-specific activities has remained challenging. This study combined deep learning and transfer learning to design tissue-specific enhancers for five tissues in the Drosophila melanogaster embryo: the central nervous system (CNS), epidermis, gut, muscle, and brain. First, convolutional neural networks (CNNs) were trained using genome-wide scATAC-seq datasets, and then the CNNs were fine-tuned with smaller-scale data from in vivo enhancer activity assays, yielding models with 25% to 75% positive predictive value according to cross-validation. Forty synthetic enhancers (eight per tissue) were designed and experimentally assessed in vivo, of which 31 (78%) were active and 27 (68%) functioned in the target tissue (100% for CNS and muscle). The strategy to combine genome-wide and small-scale functional datasets by transfer learning is generally applicable and should enable the design of tissue-, cell type-, and cell state-specific enhancers in any system (de Almeida, 2023).

Enhancers are non-coding DNA elements that activate transcription from target promoters in a highly cell type-specific fashion. Although the existence of enhancer activities within DNA sequences has been recognized since the early 1980s, and hundreds of enhancers have been functionally characterized in model organisms such as flies and mice, the precise encoding of regulatory activities within the DNA sequence has remained elusive. Specifically, while enhancer sequences contain binding sites for transcription factors (TFs), the specific arrangement of these sites and the potential importance of additional sequence properties have remained unknown, hampering the prediction and the de novo design of enhancers with tissue-specific activities (de Almeida, 2023).

Recently, it was demonstrated that by utilizing genome-wide enhancer activity datasets in a model cell line in culture (Pagani, 2022), it is possible to train deep-learning convolutional neural networks (CNNs) to predict enhancer activity and strength directly from the DNA sequence and to design synthetic enhancers de novo. However, extending this achievement to in vivo systems has been challenging, presumably due to the limited number of functionally characterized enhancers, which has remained relatively low, typically falling below a few hundred per tissue in flies and mice. Such quantities have been considered insufficient for effectively training deep learning models. A widely applicable approach to enhance prediction performance with limited data is through the utilization of transfer learning, which has been successfully employed in various fields, including cell biology, network biology, and genomics. Transfer learning involves pre-training models using large-scale datasets that share similarities with the target task, followed by target task-specific adjustment or fine-tuning on smaller datasets. Provided pre-training with datasets sufficiently similar to the target task, transfer learning yields improved prediction performance (de Almeida, 2023).

To predict enhancer activity from the DNA sequence, leveraging genome-wide datasets of enhancer-associated chromatin features as a steppingstone seems particularly promising. Single-cell ATAC-seq (scATAC-seq) datasets provide measurements of DNA accessibility at the single-cell level and thus allow the determination of cell type-specific accessibility profiles even within complex tissues comprising diverse cell populations. Given the association of enhancers with accessible chromatin, it was decided to use a combination of scATAC-seq datasets and results from in vivo enhancer activity assays to develop a deep-learning model predictive of enhancer activity using transfer learning (de Almeida, 2023).

Specifically, four prominent and distinct tissues within the 10-to-12-hour-old Drosophila melanogaster embryo were chosen, namely the central nervous system (CNS), epidermis, muscle, and gut. In addition, enhancers that were specifically active in the brain but not in the rest of the CNS were selected, an enhancer-activity pattern that is considered particularly challenging given the cell types shared with the CNS and the relatively small number of functionally characterized brain-specific enhancers available for training (de Almeida, 2023).

Single-task CNNs were first trained to map 1-kb-long DNA sequences tiled across the genome to the corresponding pseudo-bulk ATAC-seq signals based on a recently published scATAC-seq atlas of the Drosophila embryo. A tenfold chromosome hold-out cross-validation scheme was used to train and evaluate the predictive performance of the model. As expected based on previous work, these models performed well with Pearson correlation coefficients (PCCs) between the predicted and experimentally measured ATAC-seq signals of approximately 0.73 for all tissues in all held-out test set chromosomes (range of PCCs: 0.72-0.75). Moreover, model-interpretation tools revealed known TF motifs such as GGGGT (Kr, Ttk) for CNS, Grh for epidermis, GATA for gut, Mef2, Fork head (Bin) and Twist for muscle, and Zelda and Klu for brain. Finally, the models also captured cell type-specific accessibility differences, that is, sites that were preferentially accessible in specific tissues were also predicted to be accessible in these tissues (de Almeida, 2023).
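
The data preparation described above (tiling the genome into 1-kb windows, turning each window into a numeric input, and holding out whole chromosomes for evaluation) can be illustrated with a minimal Python sketch. The function names, the non-overlapping stride, the round-robin fold assignment, and the list of chromosome arms are assumptions made for illustration; the text does not specify how the folds or tiles were actually constructed.

```python
def one_hot(seq):
    """Encode a DNA string as rows of (A, C, G, T); other symbols become all zeros."""
    table = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
             "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}
    return [table.get(base, (0, 0, 0, 0)) for base in seq.upper()]


def tile_windows(chrom_length, window=1000, stride=1000):
    """Return (start, end) coordinates of fixed-size windows along one chromosome."""
    return [(start, start + window)
            for start in range(0, chrom_length - window + 1, stride)]


def chromosome_holdout_splits(chromosomes, n_folds):
    """Group chromosomes into folds round-robin and hold each fold out once.

    Whole chromosomes are held out so that no test window overlaps or
    neighbors a training window.
    """
    folds = [chromosomes[i::n_folds] for i in range(n_folds)]
    splits = []
    for held_out in folds:
        if not held_out:          # more folds than chromosomes
            continue
        train = [c for c in chromosomes if c not in held_out]
        splits.append((train, held_out))
    return splits


# Illustrative chromosome arms (an assumption, not taken from the text).
arms = ["2L", "2R", "3L", "3R", "4", "X"]
for train, test in chromosome_holdout_splits(arms, n_folds=3):
    print("train:", train, "test:", test)
```

Holding out entire chromosomes, rather than random windows, is the design choice that matters here: it keeps overlapping or adjacent sequence out of the test set.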

Functionally characterized enhancers from previous work were used for transfer learning to build sequence-to-activity models. The enhancer-activity prediction task was framed as a binary classification (active/inactive) because the in vivo enhancer activity data are derived from annotated non-quantitative in situ hybridization assays. CNNs that predict tissue-specific enhancer activities directly from the DNA sequence were initialized with the sequence-to-accessibility models trained on ATAC-seq data for the respective tissues (CNS, epidermis, gut, muscle, and brain), and were then trained on the enhancer prediction task until convergence. The models were evaluated using cross-validation with left-out datasets containing active and inactive enhancers, with and without ATAC-seq signals. This revealed that the sequence-to-activity models obtained by transfer learning substantially improved the predictions for all five tissues, as assessed by several performance measures, compared to (1) models directly trained on the in vivo enhancer activity data starting from random initialization, (2) models pre-trained on ATAC-seq data from a different tissue (salivary gland), and (3) the sequence-to-accessibility models without transfer learning. The transfer-learned models also outperformed the other models in correctly discriminating accessible regions with and without enhancer activity, and the improvement was particularly strong for muscle and brain, which had the fewest known enhancers for training (177 and 119, respectively). The models also reliably discriminated additional positive and negative control enhancers, including the known enhancers in tissue-specific marker gene loci (de Almeida, 2023).
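
The two-stage logic of transfer learning (pre-train on a large accessibility dataset, then reuse the learned weights as the starting point for a small binary activity task) can be sketched with a deliberately tiny model. This is not the CNN used in the study: a linear model on nucleotide frequencies stands in for the network, and all function names, sequences, signals, and labels below are hypothetical.

```python
import math

ALPHABET = "ACGT"


def features(seq):
    """Nucleotide frequencies plus a bias term (toy stand-in for learned CNN features)."""
    return [seq.count(base) / len(seq) for base in ALPHABET] + [1.0]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def pretrain_accessibility(seqs, signals, steps=200, lr=0.5):
    """Stage 1: full-batch gradient descent on squared error (sequence -> accessibility)."""
    X = [features(s) for s in seqs]
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(X, signals):
            err = dot(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def log_loss(w, seqs, labels):
    """Mean logistic loss for labels in {0, 1}."""
    total = 0.0
    for s, y in zip(seqs, labels):
        sign = 1.0 if y == 1 else -1.0
        total += math.log1p(math.exp(-sign * dot(w, features(s))))
    return total / len(seqs)


def fine_tune_activity(init_w, seqs, labels, steps=200, lr=0.5):
    """Stage 2: start from the pre-trained weights and fit active/inactive labels."""
    X = [features(s) for s in seqs]
    w = list(init_w)              # transfer: reuse the weights instead of random init
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            err = 1.0 / (1.0 + math.exp(-dot(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: many "accessibility" examples, few labeled "enhancers".
acc_seqs = ["GGGGCCGC", "ATATATTA", "GCGCATGC", "TTTTAAAT"]
acc_signal = [2.0, 0.1, 1.0, 0.0]
enh_seqs = ["GGCGGCGA", "ATTTAATA"]
enh_labels = [1, 0]

w_pre = pretrain_accessibility(acc_seqs, acc_signal)
w_fine = fine_tune_activity(w_pre, enh_seqs, enh_labels)
print(log_loss(w_pre, enh_seqs, enh_labels), log_loss(w_fine, enh_seqs, enh_labels))
```

The only step that makes this "transfer" learning is `w = list(init_w)`: the classifier starts from weights that already encode sequence features associated with accessibility, which is what helps when labeled enhancers are scarce.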

Moreover, and particularly relevant for enhancer design, where only a very limited number of predictions can be tested in vivo, these models reached positive predictive values (PPVs) between 36% (brain) and 88% (CNS) at prediction thresholds that recovered at least ten known enhancers during cross-validation (or PPVs between 13% and 76% at >=50 known enhancers), suggesting that it was not unreasonable to attempt the de novo design of synthetic enhancers for these tissues. This study therefore proceeded to design synthetic enhancers with defined tissue-specific activities de novo. Specifically, random sequences were generated with a 0-order Markov model, and eight enhancers were selected for each of the five tissues (40 enhancers total) that had high predicted accessibility and activity scores specifically in the CNS, epidermis, gut, muscle or brain, respectively, focusing on distinct motif signatures when possible to remove potential redundancies (de Almeida, 2023).
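
The PPV quoted here is the fraction of sequences called active that are truly active, TP / (TP + FP), evaluated at the strictest score threshold that still recovers a minimum number of known enhancers. A minimal sketch of that calculation follows; the function name and the toy scores are hypothetical, and scores are assumed to be distinct so that a threshold corresponds to a unique rank.

```python
def ppv_at_min_recovered(scores, labels, min_true_positives=10):
    """Return (threshold, ppv) at the strictest threshold recovering enough known actives.

    scores: model scores, higher means more likely active (assumed distinct).
    labels: 1 for known active enhancers, 0 for inactive sequences.
    Returns None if fewer than min_true_positives actives exist.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for n_called, (score, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives >= min_true_positives:
            # Everything ranked at or above `score` is called active.
            return score, true_positives / n_called
    return None


# Toy example: five scored sequences, three of them known enhancers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 1, 0]
print(ppv_at_min_recovered(scores, labels, min_true_positives=2))  # (0.7, 0.666...)
```

Requiring more recovered enhancers forces a more permissive threshold, which is why the PPV range quoted at >=50 known enhancers is lower than the range at ten.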

The designed enhancer sequences were ordered as gBlocks and cloned into a previously used reporter system that features a minimal hsp70 promoter and lacZ reporter gene, and the constructs were integrated into a consistent landing site in the Drosophila genome. Embryos were then collected and fixed and the candidates' enhancer activities were scored by two-color fluorescent in situ hybridization, comparing lacZ reporter expression to the expression of the tissue-specific marker genes elav (CNS), wg (epidermis), GATAe (gut), Mef2 (muscle) and tll (brain). In addition to a qualitative visual assessment, the expression patterns were quantitatively compared by pixel-wise PCCs across the entire volumes of the acquired microscopy image Z-stacks (de Almeida, 2023).
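
A pixel-wise PCC between two image volumes treats every voxel as one paired observation. The sketch below assumes that the reporter and marker channels are stored as nested lists indexed as stack[z][y][x], have identical shapes, and are already registered to each other; the function names are hypothetical and any preprocessing used in the study (masking, normalization) is not represented.

```python
import math


def flatten(stack):
    """Flatten a Z-stack given as stack[z][y][x] into a single list of intensities."""
    return [pixel for plane in stack for row in plane for pixel in row]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have the same length (>= 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / math.sqrt(var_a * var_b)


def pattern_similarity(reporter_stack, marker_stack):
    """Pixel-wise PCC across the whole volume of two same-shaped Z-stacks."""
    return pearson(flatten(reporter_stack), flatten(marker_stack))


# Toy 2 x 2 x 2 volumes: the marker is a linear rescaling of the reporter.
reporter = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
marker = [[[2 * v + 1 for v in row] for row in plane] for plane in reporter]
print(pattern_similarity(reporter, marker))
```

Because the PCC is invariant to linear rescaling of intensities, it compares the spatial pattern of two channels rather than their absolute brightness.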

This revealed that eight out of eight CNS enhancers were active in the CNS, some with additional, mainly weak and sporadic, activity in the peripheral nervous system (PNS). Similarly, seven out of eight epidermis enhancers and eight out of eight muscle enhancers functioned specifically in the epidermis and muscle, respectively. For both the gut and the brain, two out of eight enhancers (25%) were active in the target tissue, and some had additional activities in other tissues such as the CNS, salivary gland or amnioserosa, in line with the expectations from cross-validation. These results from the qualitative visual assessment were confirmed by the quantitative assessment of pattern similarities. In fact, all patterns deemed correct by the visual assessment and even three of the four gut enhancer patterns that were deemed incorrect were significantly different from random and negative control patterns (t-test p-value < 0.05; N=4 embryos) (de Almeida, 2023).

Interestingly, given this study's aim to target broad tissue types that comprise distinct subtypes, not all the enhancers active in the correct target tissue had identical activity patterns. For example, the epidermis enhancers were active in segmental and/or pharyngeal parts of the epidermis, and a similar sub-pattern variability within the correct overall tissue type was seen for CNS and muscle. Similarly interesting are the different success rates for muscle (100%) and gut (25%) and the observation that several gut enhancers were active outside the gut in epidermis, sensory complexes, and amnioserosa. This likely stems from a more complex gut 'enhancer grammar' involving low-information GATA motifs: the fly's five GATA TFs are employed rather broadly in endoderm and gut (Serpent & dGATAe) but also in amnioserosa, dorsal epidermis, the heart (Pannier), and other tissues, i.e. the very tissues for which ectopic gut enhancer activity was observed. In this context, it is interesting that the pattern similarity (PCC) with the gut marker gene dGATAe is significantly above random for all but one of the gut enhancers deemed incorrect by visual assessment (and all the correct ones), potentially indicating pattern overlap and/or relatedness of the tissues. After this proof-of-concept at the level of broad tissue types, it will be exciting to see the development of more fine-grained models that discriminate between closely related tissue subtypes and individual cell types, especially those that share prominent TFs (e.g. GATA motifs in gut and other tissues) (de Almeida, 2023).

Overall, this work demonstrates the feasibility of targeted design of synthetic enhancers for selected tissues by deep and transfer learning. The framework proposed in this study should be applicable to any species and tissue provided a genome-wide dataset of enhancer-associated features (e.g. DNA accessibility, characteristic histone modifications, TF or cofactor binding, eRNAs, etc.) and a reasonable number of functionally validated enhancers (> 100 in this study). More traditional machine-learning approaches have been used successfully for the prediction of chromatin features, TF binding and enhancer sequences and for predicting genomic elements with highly constrained cis-regulatory codes and limited architectures (e.g. core-promoter elements or highly defined enhancer-motif contexts). However, flexible enhancer design has only become possible with deep learning (de Almeida, 2023).

For the near future, great progress is foreseen in deep- and transfer learning approaches for the prediction and design of enhancers and other genomic regulatory elements. These will likely include the application of large multi-task models trained simultaneously on many datasets comprising different tissues and cell types. As predictive sequence features such as TF motifs are often shared between tissues, shared learning of large models might further improve model performance compared to the dedicated single-task models used in this study. On the other hand, improved performance might come from the combination of many small, dedicated models such as the ones developed in this study, each specialized for one specific type of function or genomic element, into a larger overarching framework. Another likely improvement for the specific task of enhancer design will be the move from the computational screening of random sequences, which can only sample a very small part of the possible sequence space, to a more direct and efficient way to generate synthetic enhancer sequences, such as the use of generative adversarial networks (GAN), variational autoencoders (VAE) and diffusion models that can 'hallucinate' possible solutions (de Almeida, 2023).
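
The generate-and-screen strategy referred to here (sample random sequences from a 0-order Markov model, score them with a trained model, keep the top candidates) can be sketched as follows. A 0-order Markov model draws each base independently from fixed base frequencies. The base frequencies, the sequence length, the candidate count, and especially `toy_score` are illustrative assumptions; `toy_score` merely counts one motif and is a hypothetical stand-in for a trained sequence-to-activity model.

```python
import random


def sample_zero_order(length, base_freqs, rng):
    """Draw a sequence whose bases are independent samples from base_freqs."""
    bases = list(base_freqs)
    weights = [base_freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def screen_random_sequences(n_candidates, length, base_freqs, score_fn, top_k, seed=0):
    """Generate random candidates and keep the top_k by score_fn (highest first)."""
    rng = random.Random(seed)     # seeded for reproducibility
    candidates = [sample_zero_order(length, base_freqs, rng)
                  for _ in range(n_candidates)]
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]


def toy_score(seq):
    """Hypothetical stand-in for a trained model: counts GATA motif occurrences."""
    return seq.count("GATA")


# Illustrative base frequencies (an assumption, not taken from the text).
freqs = {"A": 0.29, "C": 0.21, "G": 0.21, "T": 0.29}
top = screen_random_sequences(n_candidates=200, length=50, base_freqs=freqs,
                              score_fn=toy_score, top_k=8)
for seq in top:
    print(toy_score(seq), seq)

# Size of the search space for sequences of length 50.
print(4 ** 50)
```

A sequence of length L has 4^L possible realizations, so any feasible number of random draws covers a vanishing fraction of the space; this is the limitation that generative models are expected to address.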

This work complements approaches to design enhancers in or via cell culture models or via the modeling of cell type-characteristic DNA-accessibility patterns and their sequence signatures (topic modeling) and ongoing efforts to predict gene expression and 3D genome architecture from extended DNA sequences. Models to predict endogenous gene expression need to integrate the regulatory cues of multiple enhancers acting from different distances, consider distinct promoter types with enhancer-promoter compatibilities, and insulator-, silencer-, and tethering elements, together with the sequence-determinants of RNA processing and stability. It will be exciting to see these models integrate lessons from enhancer-centric approaches to further develop and move towards designing entire synthetic gene loci with complex gene expression patterns (de Almeida, 2023).

It is envisioned that this work will synergize with ongoing efforts to build comprehensive 'cell atlases' for gene expression and DNA accessibility in the fly, mouse, and human, thus providing the opportunity to design enhancers for many if not all tissues in these organisms, potentially even for aberrant tissue or cell states. In conclusion, this work not only demonstrates the remarkable progress in enhancer design made possible by deep and transfer learning and the growing datasets on enhancers and chromatin, but also sets the stage for a future where the precise design and manipulation of gene expression patterns become a reality (de Almeida, 2023).

Prediction accuracy of regulatory elements from sequence varies by functional sequencing technique

Various sequencing-based approaches are used to identify and characterize the activities of cis-regulatory elements in a genome-wide fashion. The activities of cis-regulatory elements such as enhancers, promoters, and repressors are determined by their sequence and by secondary processes such as chromatin accessibility, DNA methylation, and histone marks. This study used machine learning models to evaluate the accuracy with which cis-regulatory elements identified by various commonly used sequencing techniques can be predicted from their underlying sequence alone, in order to distinguish between cis-regulatory activity that is reflective of sequence content versus secondary processes. Models trained and evaluated on D. melanogaster sequences identified through DNase-seq and STARR-seq are significantly more accurate than models trained on sequences identified by H3K4me1, H3K4me3, and H3K27ac ChIP-seq, FAIRE-seq, and ATAC-seq. These results suggest that the activity detected by DNase-seq and STARR-seq can be largely explained by underlying DNA sequence, independent of secondary processes. Experimentally, a subset of DNase-seq and H3K4me1 ChIP-seq sequences were tested for enhancer activity using luciferase assays and compared with previous tests performed on STARR-seq sequences. The experimental data indicated that STARR-seq sequences are substantially enriched for enhancer-specific activity, while the DNase-seq and H3K4me1 ChIP-seq sequences are not. Taken together, these results indicate that the DNase-seq approach identifies a broad class of regulatory elements of which enhancers are a subset and that the associated data are appropriate for training models for detecting regulatory activity from sequence alone; that STARR-seq data are best for training enhancer-specific sequence models; and that H3K4me1 ChIP-seq data are not well suited for training and evaluating sequence-based models for cis-regulatory element prediction (Nowling, 2023).

A single DPE core promoter motif contributes to in vivo transcriptional regulation and affects cardiac function

Transcription is initiated at the core promoter, which confers specific functions depending on the unique combination of core promoter elements. The downstream core promoter element (DPE) is found in many genes related to heart and mesodermal development. However, the function of these core promoter elements has thus far been studied primarily in isolated, in vitro or reporter gene settings. tinman (tin) encodes a key transcription factor that regulates the formation of the dorsal musculature and heart. Pioneering a novel approach utilizing both CRISPR and nascent transcriptomics, this study showed that a substitution mutation of the functional tin DPE motif within the natural context of the core promoter results in a massive perturbation of Tinman's regulatory network orchestrating dorsal musculature and heart formation. Mutation of endogenous tin DPE reduced the expression of tin and distinct target genes, resulting in significantly reduced viability and an overall decrease in adult heart function. This study has demonstrated the feasibility and importance of characterizing DNA sequence elements in vivo in their natural context, and accentuates the critical impact a single DPE motif has during Drosophila embryogenesis and functional heart formation (Sloutskin, 2023).

Enhancer architecture and chromatin accessibility constrain phenotypic space during Drosophila development

Developmental enhancers bind transcription factors and dictate patterns of gene expression during development. Their molecular evolution can underlie phenotypical evolution, but the contributions of the evolutionary pathways involved remain little understood. Using mutation libraries in Drosophila melanogaster embryos, this study observed that most point mutations in developmental enhancers led to changes in gene expression levels but rarely resulted in novel expression outside of the native pattern. In contrast, random sequences, often acting as developmental enhancers, drove expression across a range of cell types; random sequences including motifs for transcription factors with pioneer activity acted as enhancers even more frequently. These findings suggest that the phenotypic landscapes of developmental enhancers are constrained by enhancer architecture and chromatin accessibility. It is proposed that the evolution of existing enhancers is limited in its capacity to generate novel phenotypes, whereas the activity of de novo elements is a primary source of phenotypic novelty (Galupa, 2023).

This study used transgenesis-based mutagenesis and de novo gene synthesis during fly embryogenesis to investigate evolutionary pathways for enhancer activity. Fly development was used to explore how novel patterns of gene expression might appear from either molecular evolution of developmental enhancers or random sequences. Notably, while reporter gene assays and minimal enhancers may not reflect the full regulatory activities of native loci, such an approach allows evaluation of a broad range of 'possible' enhancer variation in a controlled experimental setup, without associated fitness costs and allowing a broader exploration of evolution and development without the complexities and historical contingencies found in nature. Furthermore, using such an assay in a developmental model system, which generates an embryo in 24 h, regulatory activities can be assayed across ~100,000 cells of different lineage origins (Galupa, 2023).

Using this approach, it was found that most mutations in enhancers led to changes in levels of reporter gene expression, but almost entirely within their native zones of expression, similar to previous studies using transgenic mutagenesis of the Shh enhancer in murine embryos, or the E3N enhancer and the wing spot¹⁹⁶ enhancer in fly embryos. Consistent with current results, known phenotypic evolution through nucleotide mutations of standing regulatory elements seems to appear either through changes in the levels or timings of expression within native zones or the loss of regulatory activities. For example, the evolution of pigmentation spots in fly wings occurred via a specific spatial increase in the melanic protein Yellow, which is uniformly expressed at low levels throughout the developing wings of fruit flies. Evolution of other traits such as thoracic ribs in vertebrates, limbs in snakes, pelvic structures in sticklebacks, and seed shattering in rice are all associated with loss of enhancer activity due to internal enhancer mutations. Additionally, mutations have been found to occur less often in functionally constrained regions of the genome, suggesting that mutation bias may reduce the occurrence of deleterious mutations in regulatory regions (Galupa, 2023).

Consistent with these results, phenotypic novelties underlain by enhancer-associated ectopic gains of expression are reportedly due to transposon mobilization, rearrangements in chromosome topology, or de novo evolution of enhancers from DNA sequences with unrelated or nonregulatory activities (Galupa, 2023).

Previous studies have explored the potential of random DNA sequences to lead to reporter gene expression, either as enhancers or promoters, especially in cell lines of prokaryotic or eukaryotic origin. These have shown that there is a short (or sometimes null) mutational distance between random sequences and active cis-regulatory elements, which may improve evolvability. This study tested random sequences in a developmental context and found that most showed enhancer activity across several types of tissues and developmental stages. These results are consistent with a study that tested enhancer activity of all 6-mers in developing zebrafish embryos and found a diverse range of expression for ~38% of the sequences at two developmental stages. Expression driven by random sequences was observed even in the absence of motifs within their sequence for TFs with pioneering activity. Yet, when such motifs were included, nearly all sequences acted as 'strong' enhancers (leading to high levels of expression), consistent with the 'evolutionary barrier' to the formation of a novel enhancer being lower in regions that already contain motifs for DNA-binding factors, which can 'act cooperatively with newly emerging sites' (Galupa, 2023).

It is interesting to note that, despite the high potential of random sequences to be expressed during development and across cell types, expression prior to gastrulation was never observed; this was not evaluated in the zebrafish study or in other studies. This may be due to the rapid rates of early fruit fly development, in which gene expression patterns are highly dynamic, and cell-fate specifications occur within minutes. As such, there may be extensive regulatory demands placed on transcriptional enhancers, reflected in the clusters of high-affinity binding sites common across early embryonic developmental enhancers as well as their extensive conservation in function and location (Galupa, 2023).

In the future, it will be interesting to explore how regulatory demands that change across development, such as nuclear differentiation, network cross-talk, and metabolic changes, are reflected in regulatory architectures and their evolvability. The observation that most random sequences led to expression suggests that the potential of any sequence within the genome to drive expression is enormous and thus 'an important playground for creating new regulatory variability and evolutionary innovation' (Galupa, 2023).

This was further supported by the regulatory potential of the genomic sequences that were tested, containing Ubx/Hth motifs; indeed, the results from this work imply that enhancers would more likely evolve from sequences that contain or are biased toward specific motifs (e.g., GATA and Zelda). Perhaps the challenge from an evolutionary perspective has not been what allows expression, but what prevents expression; thus, mechanisms that repress 'spurious' expression might have evolved across genomes. This is in line with propositions that nucleosomal DNA in eukaryotes has evolved to repress transcription, along with transcriptional repressors and other mechanisms such as DNA methylation, as a response (at least partially) to 'the unbearable ease of expression' present in prokaryotes (Galupa, 2023).

The action of such repressive mechanisms could also explain why mutagenesis of developmental enhancers, which are subject to evolutionary selection, does not easily lead to expression outside their native patterns of expression. In sum, the findings of this study raise exciting questions about the evolution of enhancers and the emergence of novel patterns of expression that may underlie new phenotypes, suggesting an underappreciated role for de novo evolution of enhancers by happenstance. Genetic theories of morphological evolution will benefit from comparing controlled, multi-dimensional laboratory experiments with standing variation; such an integrative approach could provide the frameworks that will facilitate making of both transcriptional and evolutionary predictions (Galupa, 2023).

One limitation of this study lies in the numbers: this study has tested a significant number of enhancer variants, but it is still possible that ectopic expression would have been captured more frequently had a larger set of enhancer variants been tested. Also, in principle, a higher number of mutations per enhancer could have increased the likelihood of ectopic expression. Previous work from this lab with the E3N enhancer reported that the proportion of lines with ectopic expression indeed increased with the number of mutations (Galupa, 2023).

However, this increase plateaued around 20%-30% for lines with ~3+ mutations per enhancer. In this study, the number of mutations in the enhancer variants for twiM^PE, rho^NEE, and tin^B ranges from 1 to 7, so a number of lines with ectopic expression would have been expected to be captured. Importantly, the assay captures millions of years of variation in a controlled setting decoupled from fitness costs. It is also possible that ectopic expression might be present in developmental stages that were not analyzed. Finally, would the results be different if a different promoter had been used? This was not tested formally, but based on published literature, it is believed that using a different promoter would not have major implications for the results observed. In tests of enhancer-promoter combinations in human cells, the efficiency of enhancers has been shown to be approximately the same irrespective of the type of promoter used, and a recent combinatorial analysis of 1,000 human promoters and 1,000 human enhancers confirmed that most enhancers activate all promoters by similar amounts (Galupa, 2023).

These studies, in cell lines, could only address levels of expression, not spatial patterns, but very recently published results from the lab show that developmental promoters in fly embryos can drive a range of outputs but do not affect spatial aspects of expression, only levels (Galupa, 2023).

In-silico identification and comparison of transcription factor binding sites cluster in anterior-posterior patterning genes in Drosophila melanogaster and Tribolium castaneum

The cis-regulatory information that governs transcriptional regulation is arranged into modular pieces of a few hundred base pairs called CRMs (cis-regulatory modules), and numerous binding sites for multiple transcription factors are prominent characteristics of these modules. The present study was designed to localize transcription factor binding site (TFBS) clusters on twelve anterior-posterior (A-P) patterning genes in Tribolium castaneum and compare them to their orthologous gene enhancers in Drosophila melanogaster. Out of the twelve A-P patterning genes, six were gap genes (Kruppel, Knirps, Tailless, Hunchback, Giant, and Caudal) and six were pair rule genes (Hairy, Runt, Even-skipped, Fushi-tarazu, Paired, and Odd-skipped). The genes along with 20 kb upstream and downstream regions were scanned for TFBS clusters using the Motif Cluster Alignment Search Tool (MCAST), a bioinformatics tool that searches a set of nucleotide sequences for statistically significant clusters of non-overlapping occurrences of a given set of motifs. The motifs used in the current study were Hunchback, Caudal, Giant, Kruppel, Knirps, and Even-skipped. The results of the MCAST analysis revealed the maximum number of TFBS for Hunchback, Knirps, Caudal, and Kruppel in both D. melanogaster and T. castaneum, while Bicoid TFBS clusters were found only in D. melanogaster. The size of all the predicted TFBS clusters was less than 1 kb in both insect species. These sequences revealed more transversional sites (Tv) than transitional sites (Ti) and the average Ti/Tv ratio was 0.75 (Moudgil, 2023).
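
A Ti/Tv ratio below 1 means that transversions (purine to pyrimidine or the reverse) outnumber transitions (A<->G or C<->T). The sketch below shows how such a ratio can be counted from two aligned sequences. It assumes a gap-free pairwise alignment of equal-length sequences and skips any non-ACGT symbol; the function name and example sequences are hypothetical, and the text does not describe how the study derived its own counts.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Returns (transitions, transversions, ratio); ratio is None when there
    are no transversions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue              # identical site, gap, or ambiguous base
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    ratio = transitions / transversions if transversions else None
    return transitions, transversions, ratio


# Toy alignment with 3 transitions and 4 transversions.
print(ti_tv("AGCTACGT", "GATACGTT"))  # (3, 4, 0.75)
```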

Transcriptional coupling of distant regulatory genes in living embryos

The prevailing view of metazoan gene regulation is that individual genes are independently regulated by their own dedicated sets of transcriptional enhancers. Past studies have reported long-range gene-gene associations, but their functional importance in regulating transcription remains unclear. This study used quantitative single-cell live imaging methods to demonstrate co-dependent transcriptional dynamics of genes separated by large genomic distances in living Drosophila embryos. Extensive physical and functional associations of distant paralogous genes were found, including co-regulation by shared enhancers and co-transcriptional initiation over distances of nearly 250 kilobases. Regulatory interconnectivity depends on promoter-proximal tethering elements, and perturbations in these elements uncouple transcription and alter the bursting dynamics of distant genes, suggesting a role of genome topology in the formation and stability of co-transcriptional hubs. Transcriptional coupling is detected throughout the fly genome and encompasses a broad spectrum of conserved developmental processes, suggesting a general strategy for long-range integration of gene activity (Levo, 2022).

Gene regulation is thought to fundamentally differ in prokaryotes and eukaryotes. In the former, tightly clustered genes engaged in a common process are regulated by a shared switch located near the core promoter (e.g., bacterial operons). This type of organization facilitates coordinated transcriptional responses to different environmental stimuli. In higher eukaryotes, individual genes are regulated by multiple enhancers scattered across large genomic distances to produce complex profiles of expression. However, eukaryotic genomes abound with divergent duplicated genes (aka paralogs) that are engaged in common developmental and cellular processes and display overlapping patterns of expression in time and space. These genes are sometimes found in close linear proximity, but are more commonly separated by large distances (20 kb to 250 kb or more). This study explored the possibility that such genes are regulated by shared switches, despite their genomic separation (Levo, 2022).

A surprisingly large fraction of cell fate specification genes in the developing fly embryo are organized as pairs or triplets of distal genes that exhibit overlapping spatiotemporal patterns of expression. Micro-C chromosome conformation capture assays performed during the critical period of cell fate specification (2-3 hrs after fertilization) revealed extensive connectivity between the promoter regions of these genes. Automated analysis of whole genome Micro-C maps identified ~200 long-range focal contacts (i.e., high connectivity between noncontiguous DNA sequences), with nearly half corresponding to promoter-promoter associations (Levo, 2022).

Most of these promoter-promoter contacts correspond to paralogous genes, while a smaller number correspond to widely separated alternative promoters for individual genes. The former class of interconnected genes includes a variety of segmentation genes, such as the gap genes knirps-related (knrl)/knirps (kni), the pair-rule genes sloppy-paired 1/2, and the segment polarity genes engrailed/invected. Many dorsal-ventral patterning genes also display this organization, including Dorsocross1/2/3, thisbe/pyramus and scylla (scyl)/charybde (chrb). Interconnected paralogs are also seen for regulatory genes controlling a variety of developmental processes at later stages of the life cycle, including neurogenesis and the morphogenesis of adult appendages (e.g., Sox21/Dichaete and bric-a-brac1/2) (Levo, 2022).

This study was able to identify putative shared enhancers for over three-fourths of the inter-connected paralogs displaying overlapping patterns of expression. These enhancers reside in regions of open chromatin and map within 20kb of one of the gene pairs (or trios). In some cases multiple shared enhancers appear to function in an additive pattern to produce composite co-expression profiles, as seen for the segmentation genes slp1 and slp2. It is estimated that 30% of segmentation genes, and at least 11% of all genes showing localized expression in the early embryo, contain distant interconnected paralogs. This long-range coupling challenges the current view of eukaryotic gene regulation, whereby individual genes are controlled by their own dedicated sets of enhancers (Levo, 2022).

To explore the possibility that distant paralogs are coordinately regulated by shared enhancers, a comprehensive analysis was conducted of knrl/kni and scyl/chrb, which are regulated by two of the major patterning systems in early embryos, Bicoid (anterior-posterior) and BMP signaling (dorsoventral), respectively. They also possess both common and distinctive properties, such as similarities in overall organization but widely differing genomic distances, 74kb for knrl/kni and 235kb for scyl/chrb. To investigate co-transcriptional gene activity in time and space, this study employed live single-cell transcription imaging. Stem loops were inserted into the respective endogenous transcription units using CRISPR-targeted genome editing. Importantly, homozygous fly lines containing these stem loops are viable, suggesting little impact on the normal activities of the host genes. Simultaneous live transcription imaging in 2-3 hr embryos reveals overlapping expression patterns and concordant activities within individual nuclei (Levo, 2022).

Quantitative analysis of individual nuclei identified physical proximity of co-expressed transcription foci. Consistent with previously documented distances of ~350nm for long-range enhancer-promoter interactions, this study found that knrl and kni are separated by a mean distance of ~320nm, while the more distantly mapping scyl and chrb foci are separated by ~470nm. Nonetheless, these distances are significantly smaller than those seen for uncoupled control genes, both at the population level and for individual nuclei tracked over time (scyl/chrb vs chrb/CG11652). Strikingly, this study detected co-occurring transcriptional initiation events within a time scale of ~90 seconds for both knrl/kni (74kb) and scyl/chrb (235kb). A higher frequency of knrl and kni co-initiation events was observed when the two genes are linked in cis as compared with a trans-homolog arrangement. More generally, both gene pairs show higher frequencies of co-initiation as compared with randomized controls. These observations suggest interconnectivity in the transcriptional dynamics of distant genes (Levo, 2022).
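
The comparison with randomized controls can be illustrated with a small sketch. It is not the authors' analysis pipeline: it assumes hypothetical per-nucleus initiation times (in seconds) for two genes, counts the nuclei in which both genes initiate within a 90-second window, and compares that fraction with the one obtained after randomly re-pairing nuclei. All function names and numbers are invented for illustration.

```python
import random

def co_initiation_fraction(times_a, times_b, window=90.0):
    """Fraction of nuclei in which genes A and B initiate within `window` seconds."""
    pairs = list(zip(times_a, times_b))
    hits = sum(1 for a, b in pairs if abs(a - b) <= window)
    return hits / len(pairs)

def shuffled_control(times_a, times_b, window=90.0, n_shuffles=1000, seed=0):
    """Mean co-initiation fraction after randomly re-pairing nuclei."""
    rng = random.Random(seed)
    shuffled = list(times_b)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += co_initiation_fraction(times_a, shuffled, window)
    return total / n_shuffles

# Hypothetical initiation times (seconds) in eight nuclei
gene_a = [100, 400, 700, 1000, 1300, 1600, 1900, 2200]
gene_b = [130, 380, 760, 1050, 1700, 1620, 1850, 2500]

observed = co_initiation_fraction(gene_a, gene_b)
control = shuffled_control(gene_a, gene_b)
print(f"observed: {observed:.2f}, shuffled control: {control:.2f}")
```

In this toy data set six of the eight nuclei co-initiate within the window, whereas re-pairing nuclei at random yields a much lower fraction, which is the kind of excess over a randomized control described above.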

A combination of genome editing, Micro-C contact maps and quantitative live imaging was used to explore the basis for transcriptional co-activation of knrl/kni and scyl/chrb. Shared enhancers driving localized patterns of expression common to each gene pair were first identified; focus was placed on a shared anterior stripe enhancer located upstream of knrl and a shared dorsal midline enhancer located upstream of scyl. For the newly identified anterior stripe enhancer, a targeted deletion provides direct evidence that it regulates the distal kni gene in addition to the proximal knrl gene. Mutant embryos exhibit a loss of both expression patterns in the anterior stripe, and deficiency homozygotes are lethal (Levo, 2022).

The Micro-C maps provide sufficient resolution to distinguish the shared enhancers from the sequences directly underlying long-range focal contacts between gene pairs. The latter sequences contain a distinctive signature of transcription factors (TFs), including Trithorax-like/GAF, CLAMP, and Ph, seen across all interconnected genes. Based on the binding peaks of these TFs within distinct regions of open chromatin, it was possible to subdivide these sequences into a series of discrete elements that are hereafter designated 'tethering elements'. It is postulated that these elements contribute to physical and functional associations between the promoter regions of interconnected genes. Notably, they do not bind CTCF, although binding is detected in the vicinity of the tethering elements proximal to knrl and scyl. Additionally, tethering elements do not show enhancer activities when attached to reporter genes and tested in transgenic embryos. Targeted replacements of tethering elements (hereafter 'removal') resulted in severely diminished contacts with distal genes, yet did not significantly alter either of the corresponding TADs. Next, the transcriptional consequences of removing different tethering elements were considered, beginning with knrl/kni (Levo, 2022).

Removal of the knrl tethering elements resulted in a severe loss of knrl expression, likely due to local effects on promoter function, possibly involving previously established roles of GAF/Trl. More surprisingly, a significant reduction was also observed in kni transcription, 74kb away. A loss of kni activity in the anterior stripe is also seen upon a reciprocal removal of the kni tethering element, although expression in posterior regions governed by kni-proximal enhancers is retained. The targeted removal of the knrl tethering elements does not alter the enhancer sequence, but nonetheless causes a severe loss in viability, approaching the phenotype observed upon removing the enhancer. This phenotype is probably due to reduced kni transcription since deletion of the knrl transcription start site (TSS) produces milder effects. Moreover, diminished viability associated with a large deletion in knrl that removes the shared enhancer, tethering elements, TSS and 5' coding regions, is rescued by inserting the anterior stripe enhancer upstream of kni. This insertion also rescues the loss in transcription that occurs when the kni tethering element is removed. These observations point to a role of promoter-proximal tethering elements in tuning the co-activation of knrl/kni by the shared enhancer over large linear distances. This is supported by genetic complementation experiments, which indicate increased viability of the cis configuration of the shared enhancer and tethering elements as compared with the trans arrangement of regulatory elements (Levo, 2022).

To obtain a more detailed understanding of the nature of this long-range tuning, a quantitative analysis of kni transcription was performed in individual nuclei of live embryos upon removal of knrl tethering elements. While there is only a minor diminishment in transcription levels within active nuclei, a significant reduction was observed in the number of instantaneously active nuclei. This loss appears to be stochastic within the normal limits of the anterior stripe, arising from both a pronounced delay in the onset of kni transcription and altered transcriptional bursting dynamics, with reduced durations of active (ON) periods of Pol II release. These observations suggest that enhancer-promoter communication is less stable upon removal of promoter-proximal tethering elements. This view is strengthened by the analysis of the scyl/chrb locus, where shared enhancers work over 'vertebrate-style' distances of nearly 250kb (Levo, 2022).
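
The quantities named above (the fraction of instantaneously active nuclei, the onset of transcription, and the duration of ON periods) can be made concrete with a minimal sketch. It assumes that each nucleus has already been reduced to a binary activity trace (1 = transcription focus detected in that frame), which is a simplification of real live-imaging data; the traces and function names are hypothetical.

```python
def on_durations(trace, frame_interval=1.0):
    """Durations of consecutive runs of active (1) frames in a binary trace."""
    durations, run = [], 0
    for state in trace:
        if state:
            run += 1
        elif run:
            durations.append(run * frame_interval)
            run = 0
    if run:  # trace ended while still ON
        durations.append(run * frame_interval)
    return durations

def active_fraction(traces, frame):
    """Fraction of nuclei that are active at a given frame."""
    return sum(t[frame] for t in traces) / len(traces)

def onset_frame(trace):
    """Index of the first active frame, or None if the nucleus is never active."""
    for i, state in enumerate(trace):
        if state:
            return i
    return None

# Hypothetical binary traces for three nuclei (one entry per imaging frame)
traces = [
    [0, 1, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(on_durations(traces[0]))          # ON-period lengths for the first nucleus
print(active_fraction(traces, 3))       # instantaneously active fraction at frame 3
print([onset_frame(t) for t in traces]) # onset per nucleus
```

A delayed onset, shorter ON runs, and a lower active fraction at each frame are separate readouts, which is why a perturbation can reduce the number of active nuclei while leaving the signal within active nuclei nearly unchanged.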

The organization of tethering elements in the 5' scyl regulatory region provided an opportunity to distinguish the activities of enhancer-proximal and promoter-proximal elements. As seen for knrl/kni, removal of both tethers results in a severe loss of scyl transcription, as well as a marked reduction in chrb transcription. There is only a modest effect on the levels of chrb transcription in active nuclei, but a massive diminishment in the number of instantaneously active nuclei. Only a third of the expected number of nuclei exhibit chrb transcription throughout the one-hour interval of analysis. Active nuclei display reduced ON periods, as seen for knrl/kni, but also extended OFF periods, possibly related to the significantly larger distance separating scyl and chrb. The removal of the enhancer-proximal tether results in a selective reduction of chrb transcription without significantly altering scyl transcription. This represents a significant decoupling in the co-transcriptional dynamics of scyl and chrb expression, with a reduced number of co-active nuclei at any given timepoint. These observations lend additional support to the proposal that tethering elements contribute to coordinated expression of distant paralogs (Levo, 2022).

In summary, this study has presented evidence for coordinate regulation of distant genes by shared enhancers. Distant paralogs were shown to interact in 3D over large genomic distances through associations of discrete promoter-proximal tethering elements that underlie co-dependent transcriptional dynamics of the interconnected genes. The term 'topological operon' is proposed to highlight co-regulation by shared enhancers, evocative of the shared switches used by bacterial operons (Levo, 2022).

The co-transcriptional dynamics observed within topological operons are consistent with the occurrence of co-transcriptional hubs containing shared pools of transcriptional activators and Pol II. The large distances separating co-transcribing loci and the short timescales of co-initiation events could be manifestations of molecular crowding within shared transcriptional microenvironments. Further support stems from small deletions that impair transcription of the proximal gene and lead to an increase in the transcription of the distal gene (e.g., knrl TSS or scyl tether). These could reflect instances of promoter competition for shared but limiting transcriptional resources within a common hub (Levo, 2022).

While this study has emphasized co-activation, topological operons might also foster co-repression of interconnected genes in inactive tissues, since tethering elements often bind subunits of the PRC1 Polycomb complex. Furthermore, long-range connectivity within topological operons appears to afford a greater degree of regulatory flexibility than that permitted by polycistronic genes within bacterial operons. For example, kni is regulated in the presumptive abdomen by nearby enhancers that produce only weak and sporadic activation of knrl. Consistent with recent studies suggesting a general maintenance of long-range associations across tissues, this study found physical proximity of co-expressed transcription foci in the anterior stripe and abdominal domains. It is conceivable that even subtle changes in 3D organization are sufficient to mediate distinct modes of co-regulation in different tissues. This regulatory flexibility is also seen for other cases of long-range associations (e.g., globin and HoxD), and might reflect the greater demands imposed by complex cell types (Levo, 2022).

Topological operons account for a substantial fraction of gene activity in the early Drosophila embryo. They also account for a variety of developmental processes during later stages of the Drosophila life cycle. Many of these genes have known orthologs in vertebrates, including those regulating the patterning of the central nervous system (ac, D, en, ems), eye development (Vsx2), TOR signaling (scylla), cardiovascular development (H15) and morphogenesis of adult appendages (bab1/2) (Levo, 2022).

Several recent studies have uncovered widespread gene-gene associations in different human tissues, including distant paralogs. They share a strong correlation in chromatin modifications and are enriched for matching eQTLs, raising the possibility that they may be transcriptionally coupled as seen in this study. Identification of promoter-proximal tethering elements, distinct from enhancers, provides a new perspective for cross-regulatory influences of distant promoters. The contributions of tethering elements to long-range promoter coupling and enhancer-promoter interactions in Drosophila also provide a foundation for the characterization of comparable elements in vertebrates (Levo, 2022).

Topological operons might not be restricted to paralogous genes, and it remains to be seen whether they also interconnect unrelated genes encoding different components of common biological pathways, as seen for bacterial operons. It is anticipated that topological operons are likely to be a general feature of metazoan genomes, providing a strategy to integrate and coordinate the activities of distant regulatory genes engaged in complex cellular and developmental processes (Levo, 2022).

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers

Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood, and de novo enhancer design has been challenging. This study built a deep-learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally nonequivalent instances of the same TF motif that are determined by motif-flanking sequence and intermotif distances. These rules were evaluated experimentally, and it was demonstrated that they can be generalized to humans by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, synthetic enhancers with desired activities were designed de novo and functionally validated (de Almeida, 2022).

Deciphering the rules governing the relationship between enhancer sequence and function, typically called the cis-regulatory code of enhancers, has remained a long-standing open problem. It has proved so challenging because methods to functionally characterize large numbers of enhancers became available only a few years ago, and also because the cis-regulatory code, unlike the protein-coding genetic code, follows complex and cell type-specific sequence rules (de Almeida, 2022).

To dissect the relationship between enhancer sequence and activity for a single model cell type, a deep learning model, DeepSTARR, was built that accurately predicts enhancer activity for two different transcriptional programs directly from DNA sequence. DeepSTARR learned important TF motif types and higher-order syntax rules: different instances of the same TF motif are not functionally equivalent, and the differences are determined by motif flanks and inter-motif distances. These types of rules are also important in human enhancers and will be relevant to predict the impact of genetic variants linked to disease in the human genome (de Almeida, 2022).
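
Sequence-based convolutional networks generally take DNA as a numerical matrix rather than as text. The sketch below shows one common input representation, one-hot encoding, together with a reverse-complement helper of the kind often used to augment training data. It is a generic illustration and makes no claim about the exact preprocessing used for DeepSTARR; the function names and the handling of ambiguous bases are assumptions.

```python
BASES = "ACGT"

def one_hot(sequence):
    """Encode a DNA string as rows of [A, C, G, T]; unknown bases (e.g. N) become all zeros."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows

def reverse_complement(sequence):
    """Return the reverse complement of a DNA string (N is kept as N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(sequence.upper()))

encoded = one_hot("ACGTN")
print(encoded)
print(reverse_complement("AACG"))
```

Because the encoding keeps every position of the sequence, a model trained on it can in principle learn motif flanks and inter-motif distances rather than only motif presence.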

The discovery that relatively rare sequence features can be predictive of enhancer activity is unexpected and highlights the potential of unbiased deep learning models that are not based on over-representation. The fact that motifs are often not arranged in optimal syntax agrees with previous work that suggested that suboptimal enhancers might have evolved to allow cell type specificity. Consistent with this interpretation, optimized sequences of housekeeping enhancers were observed that operate in all cell types (de Almeida, 2022).

The results reveal an underappreciated property of enhancers: identical instances of the same TF motif with non-equivalent contributions to enhancer activity. Although the observation that only a small fraction of potential motifs throughout the genome is actually bound suggests that motif instances cannot all be equivalent, the non-equivalence of motif instances within the same enhancer is surprising. In fact, previous studies and computational models have typically considered different motif instances solely according to their PWM scores or even as equivalent. The contribution of motif instances depended on high-order motif syntax rules such as inter-motif distances that are not captured by traditional PWM models and need to be modelled within the full enhancer sequence. This is in line with the recently reported limitations of PWM models for predicting the effects of noncoding variants on TF binding in vitro and improved performance of deep learning models for the prediction of motif instances bound in vivo. Together these results suggest that motif instances need to be analysed within their cis-regulatory context, which should improve the ability to predict and interpret the impact of disease-related sequence variants that typically affect individual motif instances (de Almeida, 2022).
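
The limitation of PWM scoring mentioned above can be seen in a short sketch: a position weight matrix score depends only on the bases inside the motif window, so two identical motif instances receive the same score regardless of their flanks. The 4-bp frequency matrix below is invented for illustration and does not represent any real transcription factor.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (illustrative values only)
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # uniform background base frequency (an assumption)

def pwm_score(site):
    """Log-odds score of a site against the uniform background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PFM, site))

def scan(sequence):
    """Return (position, score) for every window of motif length."""
    width = len(PFM)
    return [(i, pwm_score(sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]

# The same motif instance (AGTA) embedded in two different flanking contexts
seq_1 = "CCCCAGTACCCC"
seq_2 = "TTGGAGTAGGTT"
best_1 = max(scan(seq_1), key=lambda hit: hit[1])
best_2 = max(scan(seq_2), key=lambda hit: hit[1])
print(best_1, best_2)
```

Both sequences yield the same best hit with the same score, so any functional difference between the two instances has to come from information outside the matrix, such as flanking sequence or distances to other motifs.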

The rules learned by DeepSTARR allowed the de novo design of synthetic S2 cell enhancers with desired activity levels, which not only demonstrates the validity of the model and its rules but also illustrates the power of this approach. Although libraries of synthetic elements have been used to explore enhancer structure, it has remained impossible to build fully synthetic sequences with specific characteristics. It is interesting how these synthetic enhancers are of similar complexity as endogenous enhancers, e.g. in terms of motif number and diversity, and that a vast number of different sequences can have similar enhancer strengths, highlighting regulatory sequence flexibility and evolutionary opportunities. It is expected that combining DeepSTARR with emerging algorithms that allow the direct generation of DNA sequences from deep learning models will provide unanticipated opportunities for the engineering of synthetic enhancers (de Almeida, 2022).

A next key challenge for the field will be to generalize such models from individual deeply characterized model cell lines to all cell types of an organism or even across species. This task is challenging because enhancers form the basis of differential gene transcription, and their activities are inherently cell-type specific. The underlying sequences and rules must therefore - by definition - also differ between cell types, at least to some extent. It is well known for example that enhancers that are active in different cell types or tissues contain different TF motifs, which enables the binding of cell type-specific TFs. Therefore, it remains unclear how and to what extent cis-regulatory rules generalize or even apply universally (de Almeida, 2022).

This study shows that differences between motif instances as well as the importance of motif flanks and distances generalize from Drosophila to human enhancers (de Almeida, 2022).

Unexpectedly, for AP-1 motifs, which were assessed in both species, the Drosophila-trained DeepSTARR model was able to predict the importance of AP-1 instances in human enhancers, and in both species ETS-AP-1 pairs synergize only at short distances but not at longer ones. Ultimately, this demonstrates that although the specific rules vary between TF motif types and motif combinations, the types of rules as well as some specific rules apply more generally. Dissecting important types of rules in model cell lines together with the wealth of genomic data across many cell types (such as those from ENCODE) should unveil the gene-regulatory information in genomes and a general cis-regulatory code (de Almeida, 2022).

Enhancers display constrained sequence flexibility and context-specific modulation of motif function

The information about when and where each gene is to be expressed is mainly encoded in the DNA sequence of enhancers, sequence elements that comprise binding sites (motifs) for different transcription factors (TFs). Most of the research on enhancer sequences has been focused on TF motif presence, whereas the enhancer syntax, that is, the flexibility of important motif positions and how the sequence context modulates the activity of TF motifs, remains poorly understood. This study explored the rules of enhancer syntax by a two-pronged approach in Drosophila melanogaster S2 cells: (1) important TF motifs were replaced by all possible 65,536 eight-nucleotide-long sequences and (2) eight important TF motif types were pasted into 763 positions within 496 enhancers. These complementary strategies reveal that enhancers display constrained sequence flexibility and the context-specific modulation of motif function. Important motifs can be functionally replaced by hundreds of sequences constituting several distinct motif types, but these are only a fraction of all possible sequences and motif types. Moreover, TF motifs contribute with different intrinsic strengths that are strongly modulated by the enhancer sequence context (the flanking sequence, the presence and diversity of other motif types, and the distance between motifs), such that not all motif types can work in all positions. The context-specific modulation of motif function is also a hallmark of human enhancers, as was demonstrated experimentally. Overall, these two general principles of enhancer sequences are important to understand and predict enhancer function during development, evolution, and in disease (Reiter, 2023).
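
The figure of 65,536 follows from the four possible bases at each of eight positions (4^8). A minimal sketch of how such a library of variants could be enumerated is shown below; the enhancer fragment and the window position are hypothetical, and the code is only meant to illustrate the combinatorics, not the experimental library design.

```python
from itertools import product

BASES = "ACGT"

def all_kmers(k):
    """Yield every DNA sequence of length k."""
    for letters in product(BASES, repeat=k):
        yield "".join(letters)

def replace_window(enhancer, start, kmer):
    """Return the enhancer with the window beginning at `start` replaced by `kmer`."""
    return enhancer[:start] + kmer + enhancer[start + len(kmer):]

kmers = list(all_kmers(8))
print(len(kmers))  # 4 ** 8

# Hypothetical enhancer fragment; the 8-nt window starting at index 4 is randomized
enhancer = "ACGTGATAAGCTTTGACG"
variants = [replace_window(enhancer, 4, kmer) for kmer in kmers]
print(variants[0], variants[-1])
```

Each variant differs from the wild-type fragment only inside the chosen window, so differences in measured activity can be attributed to the 8-nt sequence at that position.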

This study used two complementary strategies to explore the flexibility of enhancers with regard to nucleotide and motif identity at specific enhancer positions as well as the position dependence of motif activity. Even though median enhancer activity drops significantly when randomizing an 8-nt stretch at important positions, many sequence variants, including variants of the wild-type motif but also other TF motifs, can achieve strong enhancer activity. The diverse solutions at each position show that enhancers exhibit some degree of flexibility. However, as only a few hundred out of the >65,000 tested sequences work, the flexibility at any given position is constrained. Similarly, systematically pasting different motifs into hundreds of enhancer positions revealed that motif activity is strongly modulated by the enhancer sequence context. Therefore, constrained sequence flexibility and the modulation of motif function by the sequence context seem to be key features of enhancers (Reiter, 2023).

The observation that both Drosophila and human TF motifs require specific enhancer sequence contexts suggests that this is a general principle of enhancers. Even though motifs possess some intrinsic strengths, their potential to activate transcription strongly depends on the sequence context and follows certain syntax rules, including motif flanks, combinations, and distances. Although our study cannot assess the mechanistic causes for these rules, they might be related to local DNA shape or to more general enhancer DNA properties such as DNA bending. Our observation that homotypic interactions of certain motifs at close distances (e.g., GATA or ETS) are negatively associated with enhancer activity is consistent with repressive homotypic interactions between pluripotency TFs found by thermodynamic modeling; the mechanisms, however, are still unclear. Intermotif distances can impact the synergy between TFs at the level of DNA binding or after binding, such as cofactor recruitment and activation, which could explain both positive and negative TF-TF interactions. Although these syntax rules seem to be stricter for some TF motifs (e.g., GATA) and more relaxed for others (e.g., P53), our results show that motifs are not simply independent modules. Instead, they interact with all enhancer features in a highly cooperative manner, which can modulate motif activity by more than 100-fold. This is an important result that supports a model where enhancer activity is encoded through a complex interdependence between motifs and context, rather than motifs acting independently and additively. Whereas tissue- or cell type–specificity can already be predicted by motif presence-absence patterns alone, the encoding of different enhancer strengths seems to depend on more complex cis-regulatory syntax rules. The functional implications of mutations in TF motifs or elsewhere within enhancer sequences can therefore only be assessed in the context of these syntax features (Reiter, 2023).

The motif syntax rules described here agree well with the ones learned by DeepSTARR trained on genome-wide enhancer activity data and the BPNet model trained on endogenous TF binding and cooperativity, suggesting that these rules are important in wild-type enhancer sequences. As an ectopic reporter assay, STARR-seq measures the potential of sequences to act as enhancers, even if the sequences might be repressed endogenously at the chromatin level, making it a powerful tool to uncover the sequence determinants of enhancer activity. It will be interesting to explore the sequence rules and mechanisms by which chromatin modulates endogenous enhancer activities and gene expression using complementary methods. In addition, DeepSTARR also predicted with good accuracy the activity of all randomized sequence variants and of motifs pasted in different enhancer contexts. This supports the validity of computational models such as DeepSTARR and their use in in-silico-like experiments (e.g., motif pasting experiments with a larger set of TF motifs across many more genomic positions) to improve our understanding of the regulatory information encoded in enhancer sequences and the impact of mutations (Reiter, 2023).

This study shows that enhancer sequences are flexible enough for enhancer strength to be achieved by a small yet diverse set of sequence variants, and that mutations in information-poor positions have little impact on the enhancer activity in a single cell type. This flexibility allows many different sequences to achieve similar enhancer activities in a single cell type, which might be an important prerequisite for the evolution of developmental enhancers that operate under many additional constraints, for example, regarding the precise spatiotemporal control of enhancer activities. As the activity in a given cell can be achieved by many solutions, the specific solutions that fulfill additional requirements can be explored during evolution. Indeed, previous studies that have analyzed expression changes of enhancer mutations across different cell types in vivo have observed that the cell type–specific expression patterns of enhancers can change upon (minimal) sequence perturbations. The fact that enhancer strength in any given cell type and enhancer specificity across cell types and developmental time are subject to different yet overlapping sequence constraints highlights the complexity of the regulatory code. It is expected that the combination of quantitative enhancer-sequence-to-function models in individual cell types and qualitative predictions of enhancer activities across cell types will provide unprecedented progress in understanding of enhancer biology and our ability to read and write enhancer sequences (Reiter, 2023).

Enhancer of trithorax/polycomb, Corto, regulates timing of hunchback gene relocation and competence in Drosophila neuroblasts

Neural progenitors produce diverse cells in a stereotyped birth order, but can specify each cell type for only a limited duration. In the Drosophila embryo, neuroblasts (neural progenitors) specify multiple, distinct neurons by sequentially expressing a series of temporal identity transcription factors with each division. Hunchback (Hb), the first of the series, specifies early-born neuronal identity. Neuroblast competence to generate early-born neurons is terminated when the hb gene relocates to the neuroblast nuclear lamina, rendering it refractory to activation in descendent neurons. Mechanisms and trans-acting factors underlying this process are poorly understood. This study identified Corto, an enhancer of Trithorax/Polycomb (ETP) protein, as a new regulator of neuroblast competence. The GAL4/UAS system was used to drive persistent misexpression of Hb in neuroblast 7-1 (NB7-1), a model lineage for which the early competence window has been well characterized, to examine the role of Corto in neuroblast competence. Immuno-DNA fluorescence in situ hybridization (DNA FISH) was used in whole embryos to track the position of the hb gene locus specifically in neuroblasts across developmental time, comparing corto mutants to control embryos. Finally, immunostaining was used in whole embryos to examine Corto's role in repression of Hb and a known target gene, Abdominal B (Abd-B). In corto mutants, the hb gene relocation to the neuroblast nuclear lamina was found to be delayed and the early competence window was extended. The delay in gene relocation occurs after hb transcription is already terminated in the neuroblast and is not due to prolonged transcriptional activity. Further, it was found that Corto genetically interacts with Posterior Sex Combs (Psc), a core subunit of polycomb group complex 1 (PRC1), to terminate early competence. Loss of Corto does not result in derepression of Hb or its Hox target, Abd-B, specifically in neuroblasts. These results show that in neuroblasts, Corto genetically interacts with PRC1 to regulate timing of nuclear architecture reorganization and support the model that distinct mechanisms of silencing are implemented in a step-wise fashion during development to regulate cell fate gene expression in neuronal progeny (Hafer, 2022).

Cooperative binding between distant transcription factors is a hallmark of active enhancers

Enhancers harbor binding motifs that recruit transcription factors (TFs) for gene activation. While cooperative binding of TFs at enhancers is known to be critical for transcriptional activation of a handful of developmental enhancers, the extent of TF cooperativity genome-wide is unknown. This study coupled high-resolution nuclease footprinting with single-molecule methylation profiling to characterize TF cooperativity at active enhancers in the Drosophila genome. Enrichment of short micrococcal nuclease (MNase)-protected DNA segments indicates that the majority of enhancers harbor two or more TF-binding sites, and this study uncovered protected fragments that correspond to co-bound sites in thousands of enhancers. From the analysis of co-binding, this study found that cooperativity dominates TF binding in vivo at the majority of active enhancers. Cooperativity is highest between sites spaced 50 bp apart, indicating that cooperativity occurs without apparent protein-protein interactions. These findings suggest nucleosomes promote cooperativity because co-binding may effectively clear nucleosomes and promote enhancer function (Rao, 2021).

This study exploited MNase-resistant protections of chromatin to detect bound proteins at high resolution and infer the regulatory architecture of enhancers. Enhancers have been thought to have poorly positioned nucleosomes, perhaps corresponding to weak initiation of transcription within elements, but this study found that alignment of active enhancers by the factor-protected regions within them resolves chromatin features, revealing that enhancers, like active promoters, are structured and have defined nucleosome-depleted regions (NDRs). Notably, while factor-protected regions within enhancers often encompass recognizable consensus motifs for known TFs, many features and even elements lack any statistically significant motif. As the Drosophila TF repertoire has been extensively characterized, this highlights that the rules dictating factor binding in vivo remain incomplete. Other aspects of chromatin beyond the sequences directly contacted by TF DNA-binding domains must promote the recognition and effective binding of regulatory sites (Rao, 2021).

Information that guides factor binding may come from DNA conformation around binding sites. Additionally, cooperativity between multiple TFs in a regulatory element can increase affinity and specificity for weaker consensus motifs. TFs juxtaposed on a regulatory element might also enhance the affinity of each factor to DNA in vitro, but given the fast transient binding of factors in vivo, it has not been clear how widespread factor cooperativity is. While multiple TFs do bind independently at some active enhancers, 64% of active enhancers in the fly genome display substantial degrees of factor cooperativity. These cooperative interactions are not due to dimeric factors, since the cases that were identified occur between factors that bind regulatory elements that are >30 bp apart. In some cases, cooperativity occurs between factors as far apart as 140 bp. Such long-distance synergies might be due to interacting factors that bridge distant sites or due to effects of nucleosome positioning (Rao, 2021).
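One simple way to make the notion of cooperativity concrete is to compare how often two sites are bound on the same DNA molecule with what independent binding would predict. The sketch below is illustrative only: the function name `cobinding_summary`, the per-molecule boolean input, and the log2 observed/expected ratio are assumptions chosen for this example, not the statistic used by Rao (2021).

```python
from math import log2


def cobinding_summary(molecules):
    """Summarize co-occupancy of two binding sites.

    molecules: list of (a_bound, b_bound) booleans, one pair per DNA molecule
    (hypothetical input format, e.g. derived from single-molecule footprints).
    """
    n = len(molecules)
    if n == 0:
        raise ValueError("need at least one molecule")
    p_a = sum(1 for a, _ in molecules if a) / n   # fraction with site A bound
    p_b = sum(1 for _, b in molecules if b) / n   # fraction with site B bound
    observed = sum(1 for a, b in molecules if a and b) / n
    expected = p_a * p_b                          # co-binding if sites were independent
    # Positive log2 ratio: co-bound more often than independence predicts.
    ratio = log2(observed / expected) if observed > 0 and expected > 0 else None
    return {"p_a": p_a, "p_b": p_b, "observed": observed,
            "expected": expected, "log2_ratio": ratio}


# Toy data (made up): 40 co-bound, 10 A-only, 10 B-only, 40 unbound molecules.
toy = ([(True, True)] * 40 + [(True, False)] * 10
       + [(False, True)] * 10 + [(False, False)] * 40)
summary = cobinding_summary(toy)
```

In this toy case each site is bound on half of the molecules, so independent binding would give 25% co-bound molecules, whereas 40% are observed; the positive log2 ratio is the kind of signal one would read as cooperative.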

TF cooperativity correlated with nucleosome occupancy and histone turnover at active enhancers, implying antagonism between TFs and histones for DNA. In the context of chromatin, binding of multiple spaced TFs competes with nucleosome formation. In dynamic nucleosomes where DNA is being unwrapped and rewrapped across the surface of a histone octamer, binding of TFs at exposed DNA can block rewrapping of octamers. The efficiency of blocking the restoration of a nucleosome depends on the relative positioning of factor binding sites, where multiple binding sites on one side of a nucleosome are better competitors. The observation that factor cooperativity occurs predominantly between sites spaced 50 bp apart in active enhancers fits with this idea. In this line of thinking, an important aspect of factor binding site grammars may be loosely constrained arrangements of sites that primarily act to destabilize nucleosomes. As many TFs recruit chromatin remodeling enzymes to their binding sites, catalyzed displacement of nucleosomes may also contribute to cooperative occupancy of regulatory elements (Rao, 2021).

A striking but unexplained observation in single-molecule profiling of eukaryotic chromatin is that factor binding sites are not bound by a cognate factor or occluded in a nucleosome up to ~25% of the time. These observations agree well with single-molecule tracking experiments that show only a small fraction of TFs to be bound stably to chromatin and most TFs to have a short residence time on the order of seconds at stably bound sites. Future experiments to directly measure the turnover time of TFs at their binding sites by high-resolution live imaging, by techniques like SNAP-tagging and Anchor-away, can uncover the underlying basis for such high levels of unbound states. How do regulatory elements function when no TF is bound? While TFs may often be absent from a regulatory element, part of the answer may lie in that restoration of nucleosomes is slow compared to the binding and release of factors in an active regulatory element. With slow nucleosomal restoration, transient binding of factors maintains a regulatory element in an exposed configuration where factors can cycle on and off. The persistence of histone modifications on flanking nucleosomes may similarly provide a short-term memory of regulatory events when factors are not bound. In these ways, nucleosome dynamics may provide a mechanism to temper stochastic effects of transient factor binding in vivo and thus provide stable regulatory output to direct gene expression (Rao, 2021).

In this study, TF binding was determined using two orthogonal methods: MNase-seq and dSMF. Mapping of TF binding by both these methods depends on the occupancy of a site, how tightly it is bound by a TF, and how effectively the TF protects underlying DNA—factors that are common to most genomic methods that map TF binding. Pairs of TFBSs separated by distances starting at 30 bp were studied because the methods used cannot resolve individual binding events at <30 bp. Future studies could determine if cooperativity between TFs is even stronger at distances shorter than 30 bp, with and without protein-protein interactions (Rao, 2021).

An insulator blocks access to enhancers by an illegitimate promoter, preventing repression by transcriptional interference

Several distinct activities and functions have been described for chromatin insulators, which separate genes along chromosomes into functional units. This paper describes a novel mechanism of functional separation whereby an insulator prevents gene repression. When the homie insulator is deleted from the end of a Drosophila even skipped (eve) locus, a flanking P-element promoter is activated in a partial eve pattern, causing expression driven by enhancers in the 3' region to be repressed. The mechanism involves transcriptional read-through from the flanking promoter. This conclusion is based on the following. Read-through driven by a heterologous enhancer is sufficient to repress, even when homie is in place. Furthermore, when the flanking promoter is turned around, repression is minimal. Transcriptional read-through that does not produce anti-sense RNA can still repress expression, ruling out RNAi as the mechanism in this case. Thus, transcriptional interference, caused by enhancer capture and read-through when the insulator is removed, represses eve promoter-driven expression. We also show that enhancer-promoter specificity and processivity of transcription can have decisive effects on the consequences of insulator removal. First, a core heat shock 70 promoter that is not activated well by eve enhancers did not cause read-through sufficient to repress the eve promoter. Second, these transcripts are less processive than those initiated at the P-promoter, measured by how far they extend through the eve locus, and so are less disruptive. These results highlight the importance of considering transcriptional read-through when assessing the effects of insulators on gene expression (Fujioka, 2021).

Deep learning connects DNA traces to transcription to reveal predictive features beyond enhancer-promoter contact

Chromatin architecture plays an important role in gene regulation. Recent advances in super-resolution microscopy have made it possible to measure chromatin 3D structure and transcription in thousands of single cells. However, leveraging these complex data sets with a computationally unbiased method has been challenging. This study presents a deep learning-based approach to better understand to what degree chromatin structure relates to transcriptional state of individual cells. Furthermore, methods were explored to "unpack the black box" to determine in an unbiased manner which structural features of chromatin regulation are most important for gene expression state. This approach was applied to an Optical Reconstruction of Chromatin Architecture dataset of the Bithorax gene cluster in Drosophila; it was shown to outperform previous contact-focused methods in predicting expression state from 3D structure. The structural information is distributed across the domain, overlapping and extending beyond domains identified by prior genetic analyses. Individual enhancer-promoter interactions are a minor contributor to predictions of activity (Rajpurkar, 2021).

The hourglass model of evolutionary conservation during embryogenesis extends to developmental enhancers with signatures of positive selection

Inter-species comparisons of both morphology and gene expression within a phylum have revealed a period in the middle of embryogenesis with more similarity between species compared to earlier and later time-points. This "developmental hourglass" pattern has been observed in many phyla, yet the evolutionary constraints on gene expression, and the underlying mechanisms of how this is regulated, remain elusive. Moreover, the role of positive selection on gene regulation in the more diverged earlier and later stages of embryogenesis remains unknown. Using DNase-seq to identify regulatory regions in two distant Drosophila species (D. melanogaster and D. virilis), this study assessed the evolutionary conservation and adaptive evolution of enhancers throughout multiple stages of embryogenesis. This revealed a higher proportion of conserved enhancers at the phylotypic period, providing a regulatory basis for the hourglass expression pattern. Using an in silico mutagenesis approach, signatures of positive selection on developmental enhancers were detected at early and late stages of embryogenesis, with a depletion at the phylotypic period, suggesting positive selection as one evolutionary mechanism underlying the hourglass pattern of animal evolution (Liu, 2021).

Identification of genomic enhancers through spatial integration of single-cell transcriptomics and epigenomics

Single-cell technologies allow measuring chromatin accessibility and gene expression in each cell, but jointly utilizing both layers to map bona fide gene regulatory networks and enhancers remains challenging. This study generated independent single-cell RNA-seq and single-cell ATAC-seq atlases of the Drosophila eye-antennal disc and spatially integrated the data into a virtual latent space that mimics the organization of the 2D tissue using ScoMAP (Single-Cell Omics Mapping into spatial Axes using Pseudotime ordering). To validate spatially predicted enhancers, a large collection of enhancer-reporter lines was used, identifying ~85% of enhancers in which chromatin accessibility and enhancer activity are coupled. Next, enhancer-to-gene relationships were inferred in the virtual space, finding that genes are mostly regulated by multiple, often redundant, enhancers. Exploiting cell type-specific enhancers, cell type-specific effects of bulk-derived chromatin accessibility QTLs were deconvoluted. Finally, Prospero was found to drive neuronal differentiation through the binding of a GGG motif. In summary, a comprehensive spatial characterization of gene regulation is provided in a 2D tissue (Bravo González-Blas, 2020).

Cellular identity is defined by Gene Regulatory Networks (GRNs), in which transcription factors bind to enhancers and promoters to regulate target gene expression, ultimately resulting in a cell type-specific transcriptome. Single-cell technologies provide new opportunities to study the mechanisms underlying cell identity. Particularly, single-cell transcriptomics allow measuring gene expression in each cell, while single-cell epigenomics, such as single-cell ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing), serves as a read-out of chromatin accessibility. Although these technologies and computational approaches are recently evolving to include spatial information, most approaches currently target single-cell transcriptomes. It remains a challenge how to exploit single-cell epigenomic data for resolving spatiotemporal enhancer activity and GRN dynamics, both experimentally and computationally (Bravo González-Blas, 2020).

In addition, while ATAC-seq is a powerful tool for predicting candidate enhancers, not all accessible regions correspond to functionally active enhancers. For example, accessible sites can correspond to ubiquitously accessible promoters or binding sites for insulator proteins; to repressed or inactive regions due to binding of repressive transcription factors; or to primed regions that are accessible across a tissue, but become only specifically activated in a subset of cell types. Importantly, single-cell ATAC-seq has not been fully exploited to explore these aspects yet. While most scATAC-seq studies have been carried out in mammalian systems, in which enhancer testing is not trivial, Cusanovich (2018) evaluated 31 cell type-specific enhancers predicted from scATAC-seq in the Drosophila embryo, finding that ~ 74% showed the expected activity patterns (Bravo González-Blas, 2020).

Another current challenge in the field of single-cell regulatory genomics is how to integrate epigenomic and transcriptomic information. Although some experimental approaches have been developed for profiling both the epigenome and the transcriptome of the same cell, currently either the quality of the measurements, or the throughput, is still significantly lower compared to each independent single-cell assay. For example, sci-CAR (Single-cell Combinatorial Indexing Chromatin Accessibility and mRNA) or SNARE-seq (Single-Nucleus Chromatin Accessibility and mRNA Expression sequencing) on human cells achieved a median of 1,000-4,000 UMIs (Unique Molecular Identifiers) and 1,500-3,000 fragments per cell, while the coverage with non-integrative methods, such as 10x, is around 20,000 UMIs and 10,000 fragments per cell for scRNA-seq and scATAC-seq, respectively. Methods that achieve high sensitivity, such as scCAT-seq (single-cell Chromatin Accessibility and Transcriptome sequencing), are based on microwell plates rather than droplet microfluidics, making their throughput limited (Bravo González-Blas, 2020).

Given the current limitations of combined omics methods, the computational integration of independent high-sensitivity assays provides a valuable alternative. For example, Seurat and Liger have been used to integrate independently sequenced single-cell transcriptomes and single-cell epigenomes. Nevertheless, these methods require the 'conversion' of the genomic region accessibility matrix into a gene-based matrix, and how to perform such a conversion is an unresolved issue. Some studies have used the accessibility around the Transcription Start Site (TSS) as proxy for gene expression (Bravo González-Blas, 2019); others aggregate the accessibility regions that are co-accessible (i.e., correlated) with the TSS of the gene in a certain space (Pliner et al., 2018). However, promoter accessibility is not always correlated with gene expression. Furthermore, enhancers can be located very far from their target genes (upstream or downstream, up to 1 Mbp in mammalian genomes, or up to 100-200 kb in Drosophila, often with intervening non-target genes in between), and relationships between enhancers and target genes are often not one-to-one (i.e., an enhancer can have multiple targets, and a gene can be regulated by more than one enhancer) (Shlyueva et al., 2014). Enhancer-promoter interactions can also be predicted using Hi-C approaches at the bulk level (Ghavi-Helm et al., 2019); however, these methods have limited sensitivity at single-cell resolution (Bravo González-Blas, 2020).

The Drosophila third-instar larval eye-antennal disc provides an ideal biological system for the spatial modeling of gene regulation at single-cell resolution. The eye-antennal disc comprises complex, dynamic, and spatially restricted cell populations in two dimensions. The antennal disc consists of four concentric rings (A1, A2, A3, and arista), each with a different transcriptome and different combinations of master regulators. For example, both Hth and Cut regulate the outer antennal rings (A1 and A2), with additional expression of Dll in A2, while Dll, Ss, and Dan/Danr are key for the development of the inner rings (A3 and arista), among others. On the other hand, a continuous cellular differentiation process from anterior to posterior occurs in the eye disc, in which progenitor cells differentiate into neuronal (i.e., photoreceptors) and non-neuronal (i.e., cone cells, bristle, and pigment cells) cell types. This differentiation wave is driven by the morphogenetic furrow (MF). Posterior to the MF, R8 photoreceptors are specified first, and then, they sequentially recruit R2/R5, R3/R4, and R7 photoreceptors and cone cells to form hexagonally packed units called ommatidia. In summary, the heterogeneity of cell types and differentiation trajectories results in diverse GRNs, both static and dynamic, which can be modeled with a combination of experimental and computational approaches (Bravo González-Blas, 2020).

This work first generated a scRNA-seq and a scATAC-seq atlas of the eye-antennal disc. Second, taking advantage of the fact that the disc proper is a 2D tissue, these single-cell profiles were spatially mapped on a latent space that mimics the eye-antennal disc, called the virtual eye-antennal disc. Next, by exploiting publicly available enhancer-reporter data, the relationship between enhancer accessibility and activity was assessed. Third, these virtual cells, for which both epigenomic and transcriptomic data are available, were used to derive links between enhancers and target genes using a new regression approach. Fourth, a panel of 50 bulk ATAC-seq profiles across inbred lines was used to predict cell type-specific caQTLs (chromatin accessibility QTLs). Finally, these findings were used to characterize the role of Prospero in the accessibility of photoreceptor enhancers. In summary, a comprehensive characterization is provided of gene regulation in the eye-antennal disc, using a strategy that is applicable to other tissues and organisms. These results can be explored as a resource on SCope and the UCSC Genome Browser, and an R package, called ScoMAP (Single-Cell Omics Mapping into spatial Axes using Pseudotime ordering), is provided to spatially integrate single-cell omics data and infer enhancer-to-gene relationships (Bravo González-Blas, 2020).

This work presents a semi-supervised approach to map omics data into a virtual template by extracting axial information via pseudotime ordering, available as an R package called ScoMAP. The main limitations of this approach are that (1) it can currently be applied only to 1D or 2D tissues, (2) it requires a priori information about at least one landmark between the real and the virtual cells and the direction of the axis, and (3) it assumes symmetry around the axes, meaning that other gradients may be lost as cells are spread randomly in each bin. Nevertheless, the spatial gene expression atlas resulting from the mapping of scRNA-seq accurately recapitulates known gene expression patterns and allows the generation of virtual gene expression profiles for any gene, at a resolution comparable with novoSpaRc (Bravo González-Blas, 2020).
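The idea of placing cells along a spatial axis by pseudotime rank, and of spreading them randomly within each bin, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the ScoMAP R package or its API: the function `map_cells_to_axis`, its parameters, and the equal-sized rank bins are all hypothetical choices made for this example.

```python
import random


def map_cells_to_axis(pseudotime, n_bins, axis_length=1.0, seed=0):
    """Assign each cell a bin and a coordinate along one virtual axis.

    pseudotime: dict mapping cell id -> pseudotime value (hypothetical input).
    Cells are ranked by pseudotime, split into n_bins equal-sized rank bins,
    and given a random position inside their bin.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    rng = random.Random(seed)            # fixed seed keeps the sketch reproducible
    ordered = sorted(pseudotime, key=pseudotime.get)
    bin_width = axis_length / n_bins
    placed = {}
    for rank, cell in enumerate(ordered):
        bin_index = rank * n_bins // len(ordered)
        # Random offset within the bin: any gradient orthogonal to the axis,
        # or finer than one bin, is lost at this step (limitation 3 above).
        position = (bin_index + rng.random()) * bin_width
        placed[cell] = (bin_index, position)
    return placed


# Toy pseudotime values (made up) for six cells mapped onto three bins.
toy_pseudotime = {"c1": 0.05, "c2": 0.90, "c3": 0.40,
                  "c4": 0.10, "c5": 0.75, "c6": 0.55}
placed = map_cells_to_axis(toy_pseudotime, n_bins=3)
```

The landmark and axis direction required by limitation (2) correspond here to the implicit choice that low pseudotime maps to position 0; reversing the axis would require that prior knowledge.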

Whereas spatial inference has been reported based on scRNA-seq data, this work generates the first spatial map of a tissue from scATAC-seq data. This accessibility atlas effectively predicts enhancer-reporter activity for more than 700 enhancers from the Janelia FlyLight Project, with ~85% of enhancers showing matching accessibility and activity patterns. The remaining enhancers (~ 15%) are binding sites of the epithelial pioneer transcription factor Grainyhead, which primes these regions in all the epithelial cells without resulting in enhancer activity. Indeed, pioneer transcription factors are able to displace nucleosomes, resulting in an ATAC-seq signal; and despite that they are necessary, their binding is not sufficient for activity (Jacobs, 2018). Thus, enhancer accessibility can be achieved either by the binding of pioneer factors or through the cooperative binding of multiple TFs. These results highlight both the power of using scATAC-seq as a proxy of enhancer activity and the need for caution when dealing with pioneer factors (Bravo González-Blas, 2020).

The virtual map also acts as a latent space in which scATAC-seq and scRNA-seq data are available for each virtual cell. While experimental approaches for the simultaneous profiling of epigenome and transcriptome are emerging, these do not yet achieve the same throughput and sensitivity as the independent assays. Computationally, Granja (2019) has taken a similar approach, in which cells are mapped into the same latent space and for each single-cell transcriptome, the aggregate scATAC-seq profile of the closest neighbors is assigned. The resulting integrated profiles allow inferring relationships between enhancers and target genes. While Pliner (2018) has tackled this problem uniquely using scATAC-seq data, Granja (2019) used Pearson correlation between the chromatin accessibility and gene expression. This work extends this approach by also using random forest models to assess non-linear relationships. Of note, these approaches are not robust to pioneer sites, whose accessibility and activity are unpaired. For example, in the current approach a validated intronic enhancer of Atonal and Grainyhead in sca is missed, as the enhancer is ubiquitously accessible while only functional in the morphogenetic furrow, where the gene is expressed. Nevertheless, for the remaining 85% of the enhancers in which accessibility and activity are coupled, in this system, this study has been able to reconstruct novel and validated enhancer-to-target gene links (Bravo González-Blas, 2020).
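The correlation-based part of enhancer-to-gene linking can be illustrated with a small standard-library sketch. It covers only the Pearson step described above; the random forest step is not shown. The function names `pearson` and `rank_enhancer_links`, the input layout (one value per virtual cell), and the toy values are assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0                       # a constant profile carries no signal
    return sxy / sqrt(sxx * syy)


def rank_enhancer_links(gene_expression, enhancer_accessibility):
    """Score candidate enhancers for one gene across the same virtual cells.

    Returns (enhancer, r) pairs sorted by |r|; the sign of r is kept so that
    positive and negative links can be told apart.
    """
    scores = {name: pearson(acc, gene_expression)
              for name, acc in enhancer_accessibility.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy profiles (made up) over five virtual cells.
expression = [0.0, 1.0, 2.0, 3.0, 4.0]
candidates = {
    "enh_pos": [0.0, 2.0, 4.0, 6.0, 8.0],    # tracks expression
    "enh_neg": [4.0, 3.0, 2.0, 1.0, 0.0],    # anti-correlated
    "enh_flat": [1.0, 1.0, 1.0, 1.0, 1.0],   # ubiquitously accessible
}
links = rank_enhancer_links(expression, candidates)
```

The flat profile scores zero, which mirrors the point made above: a ubiquitously accessible (pioneer-primed) enhancer cannot be recovered by accessibility-expression correlation even when it is functional.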

The predicted links between enhancers and target genes support that (1) the probability of an enhancer regulating a gene decreases exponentially with the distance and the number of intervening genes in between, as also reported by others, and (2) genes are regulated by several (and in some cases redundant) enhancers, with a median of 22 enhancers linked to each gene. Indeed, Cannavo (2016) reported in the Drosophila embryo that ~64% of the mesodermal loci have redundant (or shadow) enhancers, of which ~60% contain more than one pair of shadow enhancers. In agreement, this study finds that ~80% of the genes are regulated by shadow enhancers (6,937 out of 8,307 genes), out of which ~72% are regulated by at least three shadow enhancers (4,900 out of 6,937 genes). Transcription factors are more tightly regulated, being linked with a higher number of enhancers (with an average of 13 positive links per gene) and having almost twice the number of redundant enhancers compared with non-TF genes. As abnormalities in the expression of transcription factor genes can have more severe phenotypes compared with final effector genes, having more enhancers, and redundant ones, may provide evolutionary robustness. In addition, the majority of shadow enhancers are partially redundant, meaning that they can be uniquely essential in other developmental stages or tissues, or under adverse environmental conditions (Bravo González-Blas, 2020).

Of note, almost 50% of the inferred links are negatively correlated with their target genes. While polycomb-mediated repression has been shown to reduce region accessibility, other studies suggest that, although repressed enhancers are less accessible than active enhancers, they still show accessibility compared with the non-regulatory genome (Bozek, 2019). Such an effect can be observed in the embryonic eve stripe 2 enhancer, which is active (and more accessible) in the second embryonic stripe, while repressed (and less accessible) in the rest. Meanwhile, in the eye-antennal disc, where it is neither active nor repressed, there is no accessibility. Thus, accessible regions do not only correspond to primed or active enhancers, promoters, and insulators, but also to repressed enhancers (Bravo González-Blas, 2020).

Several works have focused on the inference of GRNs from single-cell data, mostly exploiting scRNA-seq to infer co-expression patterns between TFs and potential target genes. In an attempt to reduce the number of false-positive targets due to activating cascade effects, SCENIC, which additionally evaluates the enrichment of binding sites for the TF around the TSS of the putative target gene, was introduced. On the other hand, other studies have exploited single-cell ATAC-seq to find target enhancers with binding sites for specific TFs. For example, chromVAR aggregates regulatory regions based on motif enrichment and then evaluates these modules on single-cell ATAC-seq data, while cisTopic (Bravo González-Blas, 2019) performs motif enrichment on sets of co-accessible enhancers inferred from scATAC-seq profiles (i.e., topics) to find common master regulators. However, none of these approaches incorporates knowledge about TF or target gene expression. This study has aimed to integrate all these layers (transcription factor binding sites, chromatin, and gene expression) to infer GRNs, by deriving co-expression modules between genes and transcription factors (from the scRNA-seq data) and pruning them based on the enrichment of the TF motif in the enhancers that regulate these genes (based on the enhancer-to-target gene links derived from the integration of scATAC and scRNA-seq data). Such networks de facto have enhancers, rather than genes, as nodes (i.e., TF-Enhancer-Gene networks) (Bravo González-Blas, 2020).
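The pruning logic described above, in which a co-expressed target is kept only when an enhancer linked to it carries the TF motif, can be sketched as follows. This is a deliberately simplified illustration: it replaces the statistical motif-enrichment test with plain motif presence, and the function name, the three input dictionaries, and all identifiers in the toy data are hypothetical.

```python
def build_tf_enhancer_gene_edges(coexpression_modules, gene_to_enhancers, motif_hits):
    """Prune TF-gene co-expression modules into TF-enhancer-gene edges.

    coexpression_modules: dict TF -> list of co-expressed candidate target genes
    gene_to_enhancers:    dict gene -> list of enhancers linked to that gene
    motif_hits:           dict TF -> set of enhancers containing the TF motif
    Assumption: motif presence stands in for a proper enrichment test.
    """
    edges = []
    for tf, targets in coexpression_modules.items():
        enhancers_with_motif = motif_hits.get(tf, set())
        for gene in targets:
            for enhancer in gene_to_enhancers.get(gene, []):
                if enhancer in enhancers_with_motif:
                    # The enhancer, not the gene alone, is the node the TF acts on.
                    edges.append((tf, enhancer, gene))
    return edges


# Toy inputs (made up): TF1 is co-expressed with two genes, but only geneA
# has a linked enhancer that carries the TF1 motif.
modules = {"TF1": ["geneA", "geneB"]}
linked = {"geneA": ["enh1", "enh2"], "geneB": ["enh3"]}
hits = {"TF1": {"enh1"}}
edges = build_tf_enhancer_gene_edges(modules, linked, hits)
```

Here geneB drops out of the network because none of its linked enhancers contains the motif, which is the intended effect of pruning co-expression modules that may reflect indirect, cascade-driven correlation.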

As bulk profiles may mask true biological signal (due to the proportions of the different cell types), single-cell data have been used to deconvolute cell type-specific signals from bulk RNA-seq data, making it possible to exploit large cohorts with bulk omics data, complemented with only one single-cell reference atlas. This study has investigated the impact of genomic variation on cell type-specific enhancers. For example, it revealed the relevance of Atonal binding sites for opening Johnston's organ precursor-specific regions and of the GGG motif, previously unlinked to any transcription factor, for opening photoreceptor regions. Interestingly, Atonal has been shown to be a key transcription factor for the specification of sensory neurons, and bHLH proteins have been proposed to act as pioneer transcription factors in certain contexts, such as the mammalian family member Ascl1 (Bravo González-Blas, 2020).

The importance of the GGG motif in neuronal enhancers was evident in most of the current analyses; however, its interpretation was a challenge because the binding TFs were unknown. While yeast one-hybrid (Y1H) experiments have been previously used to reverse-engineer which transcription factors can bind a motif of interest, lowly expressed TFs may be underrepresented in the cDNA library and interactions that occur in vivo may be missed (such as those dependent on post-transcriptional modifications). This study has used a novel in vivo approach, in which the changes that overexpression of potential TF candidates causes in chromatin accessibility at the bulk ATAC-seq level were evaluated. Although this strategy allows characterization of the effects of TF overexpression directly in the tissue of interest, it also has limitations, such as the limited throughput of in vivo genetic screens (one TF per experiment, compared to dozens of TFs that can be tested by Y1H or Perturb-ATAC in vitro). This requires making a stringent selection of potential candidates that can be further bounded by the existence of compatible tools, such as UAS-TF lines. In addition, the changes in chromatin may not be direct, but these effects can be partially ruled out using external data available, such as ChIP-seq (Bravo González-Blas, 2020).

This study found that the neuronal precursor transcription factor Prospero acts as the strongest binder of the GGG motif, followed by Nerfin-1 and l(3)neo38. In fact, overexpression of each of them, but especially Prospero, results in the opening of GGG regions; and all three transcription factors, especially Pros and Nerfin-1, can bind to the GGG motif. Based on the expression of these transcription factors, it is hypothesized that Nerfin-1 and l(3)neo38 are the early binders of the GGG motif, while Pros can bind to these regions in the late-born photoreceptors, where it is expressed. In fact, Pros and Nerfin-1 have been reported to share direct targets during CNS differentiation and have been found to be key regulators during the photoreceptor and retinal differentiation in other organisms, such as zebrafish, chicken, and mammals (Bravo González-Blas, 2020).

In summary, this study provides a comprehensive and user-friendly single-cell resource of the Drosophila eye-antennal disc. It is envisioned that these computational strategies and enhancer resources will be of value not only to the Drosophila community, but also to the field of single-cell regulatory genomics in general (Bravo González-Blas, 2020).

Lineage-resolved enhancer and promoter usage during a time course of embryogenesis

Enhancers are essential drivers of cell states, yet the relationship between accessibility, regulatory activity, and in vivo lineage commitment during embryogenesis remains poorly understood. This study measured chromatin accessibility in isolated neural and mesodermal lineages across a time course of Drosophila embryogenesis. Promoters, including tissue-specific genes, are often constitutively open, even in contexts where the gene is not expressed. In contrast, the majority of distal elements have dynamic, tissue-specific accessibility. Enhancer priming appears rarely within a lineage, perhaps reflecting the speed of Drosophila embryogenesis. However, many tissue-specific enhancers are accessible in other lineages early on and become progressively closed as embryogenesis proceeds. This study demonstrates the usefulness of this tissue- and time-resolved resource to definitively identify single-cell clusters, to uncover predictive motifs, and to identify many regulators of tissue development. For one such predicted neural regulator, l(3)neo38, a loss-of-function mutant was generated and an essential role for neuromuscular junction and brain development was uncovered (Reddington, 2020).

Chromatin accessibility profiling is a powerful method to map the regulatory genome (the full complement of regulatory elements engaged by transcription factors (TFs) and other regulatory proteins) in a specific cell type and biological context. Accessible regions are generally measured by the sensitivity of genomic regions to the nuclease DNaseI, e.g., 'DNase-seq', or insertion of the Tn5 transposase, e.g., 'ATAC-seq', following the displacement of canonical nucleosomes by TFs and other regulatory factors. When measured over cellular differentiation, these approaches can chart the dynamics of regulatory-element usage genome wide, infer the occupancy of DNA-binding factors en masse, and identify candidate TF drivers of cell-fate decisions (Reddington, 2020).

Although an excellent method to identify the location of regulatory elements, the precise relationship between chromatin accessibility, enhancer activity, and the timing of cell-fate decisions remains unclear. Chromatin accessibility is often equated with regulatory activity, but there are several reasons why this may not be the case. First, the precise molecular determinants of chromatin accessibility as measured by methods, such as DNase-seq and ATAC-seq, and the relative importance of different TF families, chromatin-remodeling complexes, and histone variants are not well understood. Second, chromatin accessibility per se provides no information about the identity of the bound proteins, whether they have activating or repressive activity, or the function of the regulatory region (enhancer, insulator, or origin of replication). Third, the activation of at least a subset of lineage-specific enhancers occurs via a multistep process during development, being bound in precursor cells ('priming'), but only becoming activated at subsequent developmental stages following an exchange or addition of TFs. Fourth, chromatin accessibility may reflect residual TF binding that remains in a recently diverged sister lineage. Directly connecting accessibility to enhancer activity more generally has been hampered by technical limitations: the activity of a large number of enhancers is rarely measured directly in vivo, but is rather inferred from secondary information, such as histone modifications, that are themselves imperfect correlates of enhancer activity. Therefore, it remains unclear if the timing of enhancer activity correlates with the timing of accessibility and if this relationship changes over development (Reddington, 2020).

Chromatin accessibility profiling has been applied extensively to human cell lines and primary tissues, providing a comprehensive picture of regulatory landscapes in definitive cell types and highlighting the extensive differences between them. In contrast, far less is known about how regulatory landscapes diverge during embryonic development. With few exceptions, much of what is known comes from whole embryos or dissected organs or tissues, which have limited sensitivity to detect enhancers that are active in small subsets of cells due to averaging over heterogeneous cell types (Reddington, 2020).

Single-cell methods can address some of these issues. The power of single-cell ATAC-seq was recently demonstrated by combinatorial indexing (sci-ATAC-seq) to dissect cell-type-specific signatures of open chromatin starting from whole embryos (Cusanovich, 2018). This 'shotgun' approach identified thousands of previously unidentified regulatory elements and was sufficient to annotate cell clusters and predict tissue-specific enhancer activity (Cusanovich, 2018). However, a limitation of single-cell genomics is that cell identities are assigned post hoc based on prior information, e.g., using marker genes. Having high-quality tissue- and time-resolved datasets is therefore an extremely useful resource for annotating single-cell data. In addition, given the current sparsity of single-cell epigenomics data, bulk data have greater statistical power to define cell-type and time-point-specific regulatory features de novo (Reddington, 2020).

To better characterize chromatin accessibility during the development of different embryonic cell lineages, a tissue- and time-resolved chromatin accessibility atlas was generated spanning the stages when all major cell types are specified during Drosophila embryogenesis, focusing on the mesoderm and neuronal lineages. This atlas was integrated with an extensive database of curated embryonic enhancers to systematically assess the relationship between enhancer accessibility and enhancer activity in vivo during cell-fate decisions. The high resolution of this tissue- and time-resolved data facilitated the discovery of thousands of previously unidentified tissue-specific regulatory elements. This study demonstrated the value of this resource to define cell identities in sci-ATAC-seq data (Cusanovich, 2018) and as a training set to identify sequence features that are predictive of tissue-specific regulatory-element usage. This allowed identification of many potential regulators in each lineage, one of which was characterized by genetic knockout and shown to have an essential role in neuromuscular junction and brain development (Reddington, 2020).

In this study, most l(3)neo38^CRISPR mutant embryos reached the pupal stage (74%), but then died as late (black) pupae; only 12% hatched from the pupal case compared with 90% of the control heterozygote animals. The small number of l(3)neo38^CRISPR adult flies that do hatch have severely reduced motility, are barely able to walk and unable to fly (with a 'wings-up' phenotype), and fail to reproduce, suggesting defects in the neuromuscular system (Reddington, 2020).

To further characterize these defects, third instar larval brains and neuromuscular junctions were examined via immunostaining. To assess the brain, advantage was taken of a GFP reporter (3xP3-GFP) driven by binding sites for the conserved neuronal transcription factor Pax6 present in the VasaCas9 line that was used for the CRISPR deletion. In wild type (heterozygous l(3)neo38^CRISPR/+) larval brains, GFP is highly expressed in the optic lobes (OL), with lower levels in the ventral nerve cord (VNC), similar to the Pax6 homolog, ey. Loss-of-function l(3)neo38^CRISPR brains show the reverse, with almost no GFP expression in the OL, and high expression in the VNC, especially in the posterior part (Reddington, 2020).

To examine the development of the neuromuscular junction (NMJ), homozygous loss-of-function third instar larvae were dissected and stained with anti-HRP (horseradish peroxidase, to mark neuronal membranes) and Brp (Bruchpilot, to mark synaptic active zones). This revealed neuromuscular junctions that are both smaller and less branched in l(3)neo38^CRISPR mutants. Quantifying the NMJ area at the ventral longitudinal muscles 6 and 7 in abdominal segments A3 and A4, using the HRP marker protein, revealed a significant reduction in the size of NMJs in homozygous loss-of-function mutants compared with wild-type larvae of the same age. Consistent with these neurological defects, l(3)neo38 occupies a subset of neuronal-specific distal DHS, as seen using embryonic ChIP-seq data. The nearest genes of these l(3)neo38 ChIP peaks are enriched in functions related to postembryonic development and the regulation of neuron interactions and synapses (Reddington, 2020).

Drosophila is a long-standing model organism for the study of gene regulation, chromatin biology, cell-fate decisions, and general properties of embryonic development, with many fundamental biological processes being first discovered in this system. This study extended the Drosophila toolkit by generating an extensive resource of tissue-resolved chromatin accessibility during an in vivo time course of tissue development. By isolating highly pure populations of cell nuclei in a tight developmental time course, this atlas reveals the extensive dynamics that accompanies major developmental transitions in two important lineages: the nervous system and the mesoderm/muscle system. Importantly, this significantly extends knowledge beyond the previously available data collected from whole embryos and single cells from a smaller number of time points (Cusanovich, 2018). This resource can be easily searched and visualized (Reddington, 2020).

There are multiple approaches that can achieve cell-type-resolved chromatin accessibility information from developing embryos, such as INTACT, BiTS, CaTaDA, and shotgun single-cell ATAC-seq. Each has its own strengths and weaknesses, making the choice of method dependent on technical issues. An advantage of this modified BiTS approach is that it can be applied to wild-type embryos of theoretically any species, although it is limited by the availability of high-quality antibodies against nuclear marker proteins of interest. Single-cell ATAC-seq is a very exciting alternative as it negates the need for transgenic animals or antibodies to isolate cell populations, and provides an unbiased view of cellular heterogeneity (Cusanovich, 2018). However, without prior enrichment, hundreds of thousands to potentially millions of cells are required to capture rare cell types. At present, single-cell epigenetic data (scATAC-seq or ChIP-seq) are too sparse to provide accurate quantification, or de novo detection, of enhancers from rare cell subsets, highlighting the need for detailed studies of specific tissues/cell-types to provide deep, quantitative information on regulatory regions for targeted cell populations (Reddington, 2020).

This study demonstrates the utility of this tissue- and time-resolved atlas to uncover biological insights into regulatory element usage during embryonic development. Many promoters, even those of tissue-specific genes, are open constitutively, including in tissues where the gene is not expressed. In contrast, the majority of distal sites are dynamic and tissue-specific. Thousands of these overlap characterized enhancers, and are accessible in the appropriate tissue matching the enhancers' activity, strongly suggesting that the bulk of distal elements (with high developmental variance) are new tissue-specific enhancers. Similar to promoters, this study also uncovered a subset of enhancers that are open at early embryonic stages in lineages where they are not active, and this accessibility decreases progressively as embryogenesis proceeds. This indicates that active enhancers become closed in other lineages as cells begin to differentiate. Interestingly, the majority of these are bound by ubiquitous TFs, which suggests that these TFs are not sufficient to activate the enhancer. Perhaps they boost the level of the enhancer's activity in the 'right' tissue when co-bound by lineage-specific TFs, or act as placeholders to keep these enhancers open until the embryonic stage when the lineage-specific factor is expressed (Reddington, 2020).

Ultraconserved non-coding DNA within Diptera and Hymenoptera

This study has taken advantage of the availability of the assembled genomic sequence of flies, mosquitos, ants and bees to explore the presence of ultraconserved sequence elements in these phylogenetic groups. Non-coding sequences found within and flanking Drosophila developmental genes were compared to homologous sequences in Ceratitis capitata and Musca domestica. Many of the conserved sequence blocks (CSBs) that constitute Drosophila cis-regulatory DNA, recognized by EvoPrinter alignment protocols, are also conserved in Ceratitis and Musca. Also conserved is the position but not necessarily the orientation of many of these ultraconserved CSBs (uCSBs) with respect to flanking genes. Using the mosquito EvoPrint algorithm, uCSBs shared among distantly related mosquito species were identified. Side-by-side comparison of bee and ant EvoPrints of selected developmental genes identifies uCSBs shared between these two Hymenoptera, as well as less conserved CSBs in either one or the other taxon but not in both. Analysis of uCSBs in these dipterans and Hymenoptera will lead to a greater understanding of the evolutionary origin and function of their conserved non-coding sequences and aid in discovery of core elements of enhancers (Brody, 2020).

Phylogenetic footprinting of Drosophila genomic DNA has revealed that cis-regulatory enhancers can be distinguished from other essential gene regions based on their characteristic pattern of conserved sequences. Cross-species alignments have also identified conserved non-coding sequence elements associated with vertebrate developmental genes, and sequences that are conserved among ancient and modern vertebrates (e.g., the sea lamprey and mammals). Elements conserved between disparate taxa are considered to be 'ultraconserved elements'. Previous studies have identified ultra-conserved elements in dipterans, Drosophila species and sepsids and mosquitos. Comparison of consensus transcription factor binding sites in the spider Cupiennius salei and the beetle Tribolium castaneum have been shown to be functional in transgenic Drosophila (Brody, 2020).

This study describes sequence conservation of non-coding sequences within and flanking developmentally important genes in the medfly Ceratitis capitata, the house fly Musca domestica and Drosophila genomic sequences (see Genomic regions analyzed for presence of uCSBs). The house fly and Medfly have each diverged from Drosophila for ~100 and ~120 My respectively. This analysis reveals that, in many cases, CSBs that are highly conserved in Drosophila species, as detected using the Drosophila EvoPrinter algorithm, are also conserved in Ceratitis and Musca. Additionally, the linear order of these ultraconserved CSBs (uCSBs) with respect to flanking structural genes is also maintained. However, a subset of the uCSBs exhibits inverted orientation relative to the Drosophila sequence, suggesting that while enhancer location is conserved, their orientation relative to flanking genes is not (Brody, 2020).
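
The forward versus inverted orientation of a conserved block can be illustrated with a minimal Python sketch. It assumes that exact string matching stands in for the BLASTn and EvoPrinter alignments used in the study; the function names and toy sequences are hypothetical and are not taken from the paper.

```python
# Hypothetical helper: exact string matching stands in for a real alignment.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def csb_orientation(csb: str, target: str) -> str:
    """Classify a conserved block as 'forward', 'reverse', or 'absent' in target."""
    csb, target = csb.upper(), target.upper()
    if csb in target:
        return "forward"
    if reverse_complement(csb) in target:
        return "reverse"
    return "absent"


# Toy sequences, invented for illustration only.
block = "ATTTGCATAAC"
same_strand = "GGC" + block + "TTA"
inverted = "GGC" + reverse_complement(block) + "TTA"

print(csb_orientation(block, same_strand))  # forward
print(csb_orientation(block, inverted))     # reverse
```

A real analysis would tolerate mismatches and report orientation relative to the flanking structural gene rather than to the contig strand.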

For detection of conserved sequences in mosquitos, EvoPrinter algorithms were adapted to include 22 species of Anopheles plus Culex pipiens and Aedes aegypti. Use of Anopheles species allows for the resolution of CSB clusters that resemble those of Drosophila. Comparison of Anopheles with Culex and Aedes, separated by ∼150 million years of evolutionary divergence, reveals uCSBs shared among these taxa. Although mosquitoes are considered to be Dipterans, the uCSBs conserved among mosquito species were generally not found in flies (Brody, 2020).

In addition, EvoPrinter tools were developed for sequence analysis of seven bee and thirteen ant species. Both ants and bees belong to the Hymenoptera order and have been separated by ~170 million years. Within the bees, Megachile and Dufourea are sufficiently removed from Apis and Bombus (~100 My) that only portions of CSBs are shared between species: these can be considered to be uCSBs. uCSBs are found that are shared between ant and bee species, and these are positionally conserved with respect to their associated structural genes. Finally, this study shows that ant specific and bee specific CSB clusters that are not shared between the two taxa are in fact interspersed between shared uCSBs (Brody, 2020).

A previous study of 19 consecutive in vivo tested Drosophila enhancers, contained within a 28.9 kb intragenic region located between the vvl and Prat2 genes, revealed that each CSB cluster functioned independently as a spatial/temporal cis-regulatory enhancer (Kundu, 2013). Submission of this enhancer field to the RefSeq Genome Database of Ceratitis capitata via BLASTn revealed 17 uCSBs; all 17 regions were colinear and located between the Ceratitis orthologs of Drosophila vvl and Prat2 genes. In each case the matches between Ceratitis and Drosophila corresponded to either a complete or a portion of a CSB identified by the Drosophila EvoPrinter as being highly conserved among Drosophila species (Kundu, 2013). Submission of the same Drosophila region to Musca domestica RefSeq Genome Database using BLASTn revealed 13 uCSBs that were colinearly arrayed within the Musca genome. Nine of these Ceratitis and Musca CSBs were present in both species and corresponded to CSBs contained in several of the enhancers identified in a previous study of the Drosophila enhancer field (Kundu, 2013). The conservation within one of these embryonic neuroblast enhancers, vvl-41, is shown in the following figure (Ultra-conserved sequences shared among a Drosophila ventral veins lacking enhancer and orthologous DNA within the Ceratitis capitata and Musca domestica genomes.). Each of the CSB elements in vvl-41 that are shared between Dm and Ceratitis are in the same orientation with respect to the vvl structural gene. Three-way alignments of each of the other eight uCSBs within the vvl enhancer field that are shared between Dm, Ceratitis and Musca are shown in a supplemental figure. The uCSB of vvl-49 in Ceratitis is in reverse orientation with respect to the vvl structural gene. Many of the uCSBs in Musca are in a different orientation on the contig than in Dm, indicating microinversions. 
One of the two uCSBs in Ceratitis goosecoid was in reverse orientation compared to Drosophila CSBs, while three of the four uCSBs in Musca goosecoid were in reverse orientation. One uCSB each in Ceratitis and Musca castor was in reverse orientation compared to Drosophila castor. Ten of the 15 uCSBs in the Musca wingless non-coding region were in the reverse orientation compared to the orientation in Drosophila, while all uCSBs in Ceratitis Dscam2 were in forward orientation compared to the orientation in Drosophila. It is concluded that, except for microinversions, the order and orientation of highly conserved non-coding sequences with respect to flanking genes are the same in select developmental determinants of Drosophila, Ceratitis and Musca (Brody, 2020).

Many of the non-coding regions in dipteran genomes contain uCSBs, especially in and around developmental determinants, and many of these are likely to be cis-regulatory elements such as those found in the vvl enhancer field. Another example is the prevalence of uCSBs found in the non-coding sequences associated with the Dm hth gene locus. A previous study identified an ultraconserved region in hth shared between Drosophila and Anopheles. This study has identified additional hth uCSBs shared among Dm, Ceratitis and Musca. A 55,100 bp upstream region of Dm hth, terminating just after the start of the first exon, was examined. A total of 11 CSBs were shared between the three species, 5 CSBs were shared between Dm and Ceratitis but not Musca, and 6 CSBs were shared between Dm and Musca, but not Ceratitis. Ceratitis exhibited 4 uCSBs and Musca exhibited 8 uCSBs that were in reversed orientation with respect to the Drosophila orthologous regions. Additional genes analyzed in this paper were also analyzed for association with uCSBs in Ceratitis and Musca, and these results are summarized in the table. In some cases, for example wingless in Ceratitis, the presence of uCSBs could not be verified because of the incomplete assembly of the genome, leaving coding sequences and uCSBs on different contigs. In another case, Dscam2 in Musca, no uCSBs were identified (Brody, 2020).
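
The hth counts above can be restated as set arithmetic. The Python sketch below is illustrative only: the CSB identifiers are hypothetical placeholders, and only the three counts (11, 5 and 6) come from the text.

```python
# Hypothetical CSB identifiers; only the counts (11, 5, 6) come from the text.
shared_all = {f"csb{i}" for i in range(1, 12)}        # 11 shared by all three species
ceratitis_only = {f"csb{i}" for i in range(12, 17)}   # 5 shared by Dm and Ceratitis only
musca_only = {f"csb{i}" for i in range(17, 23)}       # 6 shared by Dm and Musca only

# Every Dm CSB that has a match in each species.
ceratitis = shared_all | ceratitis_only
musca = shared_all | musca_only

print(len(ceratitis & musca))  # 11 found in both species
print(len(ceratitis))          # 16 Dm CSBs with a Ceratitis match
print(len(musca))              # 17 Dm CSBs with a Musca match
print(len(ceratitis | musca))  # 22 Dm CSBs matched in at least one species
```

Under these counts, 22 Dm hth CSBs in the region have a match in at least one of the two other fly species.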

EvoPrint analysis of Drosophila hth sequences immediately upstream and including the first exon, revealed a conserved sequence cluster associated with the transcriptional start site. Two of the longer CSBs were conserved in both Ceratitis and Musca, one shorter CSB was conserved only in Musca, and a second shorter CSB was conserved only in Ceratitis. Each of the uCSBs was in the same orientation with respect to the hth structural gene (Brody, 2020).

EvoPrinting combinations of species using A. gambiae as a reference species and multiple species from the Neocellia and Myzomyia series and the Neomyzomyia provides a sufficient evolutionary distance from A. gambiae to resolve CSBs. Phylogenetic analysis has revealed that the Anopheles species diverged from ~48 My to ~30 My ago, while Aedes and Culex diverged from the Anopheles lineage in the Jurassic or even earlier (Brody, 2020).

This study sought to identify uCSBs in selected mosquito developmental genes by comparing Anopheles species with Aedes and Culex. Non-coding sequences associated with the mosquito homolog of the morphogen wingless were examined to discover associated conserved non-coding sequences. A CSB cluster slightly more than 27,000 bp upstream of the A. gambiae wingless coding exons is shown (EvoPrint analysis of the intragenic region adjacent to the Anopheles Wnt-4 and wingless genes identifies ultra-conserved sequences shared with the evolutionarily distant Culex pipiens and Aedes aegypti genomes). CSB orientation in A. gambiae was reversed with respect to the ORF when compared to the orientations of both Culex and Aedes CSBs. It is noteworthy that this EvoPrint, carried out using multiple Anopheles, consists of a cluster of CSBs, resembling EvoPrints carried out using Drosophila species. This general pattern of CSB clusters separated by poorly conserved 'spacers' is prevalent among other developmental determinants in mosquitos. uCSBs, conserved in Culex and Aedes, coincide with CSBs revealed by EvoPrint analysis of Anopheles non-coding sequences. A supplemental figure illustrates an EvoPrinter scorecard for the non-coding wingless-associated CSB cluster described in the above figure. Scores for the first four species, all members of the gambiae complex, are similar to that of A. gambiae against itself, with subsequent scores reflecting increased divergence from A. gambiae. Culex and Aedes are distinguished from the other species by their belonging to a distinctive branch of the mosquito evolutionary tree, the Culicinae subfamily, and by their low scores against the A. gambiae input sequence. No uCSBs were detected associated with gbb or gsc, while uCSBs were readily detected associated with vvl, cas and hth. A single uCSB in Aedes cas and two uCSBs in Culex cas exhibited a reverse configuration compared to the uCSBs in Anopheles.
One uCSB in Culex vvl and no uCSBs in Aedes vvl exhibited a reverse configuration compared to the uCSB in Anopheles. Finally, all uCSBs in Culex and Aedes hth were in forward orientation compared to Anopheles. None of the uCSBs shared between Drosophila, Ceratitis and Musca were conserved in mosquitos, with the exception of a single uCSB associated with a 3'UTR (CTTCGTTTTTGCAAGAGGCCCATATAGCTCGCCAA) that is fully conserved in the Dipteran species tested. A possible explanation for this lack of conservation is the observation that mosquitos are only distantly related to other Diptera (Brody, 2020).

Bees and ants are members of the Hymenoptera Order, representing the Apoidea (bee) and Vespoidea (ant) super-families. Current estimates suggest that the two families have evolved separately for over 100 million years. To identify conserved sequences either shared by bees and ants or unique to each family, EvoPrinter alignment tools were developed for seven bee and 13 ant species and searched for CSBs that flank developmental determinants. Three approaches were employed to identify/confirm conserved elements and their positioning within bee and ant orthologous DNAs. First, EvoPrinter analysis of bee and ant genes identified conserved sequences in either bees or ants and ultra-conserved sequence elements shared by both families. Second, BLASTn alignments of the orthologous DNAs identified/confirmed CSBs that were either bee or ant specific or shared by both. Third, side-by-side comparisons of ant and bee EvoPrints and BLASTn comparisons revealed similar positioning of orthologous CSBs relative to conserved exons (Brody, 2020).

To identify conserved sequences within bee species, EvoPrints of the honey bee (Apis mellifera) genes were generated using other Apis and Bombus species. Using EvoPrints of the Dscam2 locus, clusters of conserved sequences were resolved. Dscam2 is implicated in axon guidance in Drosophila and in regulation of social immunity behavior in honeybees. The EvoPrint scorecard revealed a high score (close relationship) with the homologous region in the other two Apis species. The more distant Bombus species score lower by greater than 50%, and Habropoda represents a step down from the more closely related Bombus species. Megachile shows a significantly lower score reflecting its more distant relationship to Apis mellifera. The relaxed EvoPrint readout reveals two CSB clusters. Only one sequence cluster, the lower 3' cluster, is conserved in all six test species examined, while the 5' cluster is present in all species except Megachile. BLAST searches confirmed that the 5' cluster was absent from Megachile, a more distant species Dufourea novaeangliae, and all ant species in the RefSeq genome database. BLASTn alignments also revealed conservation of the 3' cluster in D. novaeangliae, the wasp species Polistes canadensis and two ant species, Vollenhovia emeryi and Dinoponera quadriceps (Brody, 2020).

EvoPrinter analysis of bee and ant genes that are orthologs of Drosophila neural development genes goosecoid (gsc) and castor (cas) revealed conserved non-coding DNA that is unique to either bees or ants or conserved in both. EvoPrints of the Hymenoptera orthologs identify non-coding conserved sequence clusters that contained core uCSBs shared by both ant and bee superfamilies, and these uCSBs are frequently flanked by family-specific conserved clusters. For example, analysis of the non-coding sequence upstream of the Wasmannia auropunctata (ant) cas first exon identifies both a conserved sequence cluster that contains ant and bee uCSBs and an ant-specific conserved cluster that has no counterpart found in bees. It is likely that the ant-specific cluster was deleted in bees, since BLASTn searches of Wasmannia against the European paper wasp Polistes dominula reveal conservation of a core sequence corresponding to this cluster. The combined evolutionary divergence in the gsc and cas EvoPrints, accomplished by use of multiple test species, reveals that many of the amino acid codon specificity positions are conserved while wobble positions in their ORFs are not. The lack of wobble conservation indicates that the combined divergence of the test species used to generate the prints affords near base pair resolution of essential DNA (Brody, 2020).
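
The wobble-position argument can be made concrete with a short Python sketch that tallies fully conserved columns by codon position. It assumes equal-length, in-frame, gap-free sequences, which is a simplification; the function name and the toy alignment are hypothetical and do not reproduce EvoPrinter output.

```python
from typing import List


def conservation_by_codon_position(aligned_orfs: List[str]) -> List[float]:
    """Fraction of fully conserved columns at codon positions 1, 2 and 3.

    Assumes equal-length, in-frame, gap-free sequences (a simplification).
    """
    length = len(aligned_orfs[0])
    if any(len(s) != length for s in aligned_orfs) or length % 3:
        raise ValueError("sequences must share one length divisible by 3")
    conserved = [0, 0, 0]
    n_codons = length // 3
    for i in range(length):
        column = {s[i] for s in aligned_orfs}
        if len(column) == 1:  # identical base in every species
            conserved[i % 3] += 1
    return [c / n_codons for c in conserved]


# Toy in-frame alignment, invented for illustration: only third positions vary.
toy = [
    "GCTAAAGGT",
    "GCCAAGGGC",
    "GCAAAAGGA",
]
print(conservation_by_codon_position(toy))  # [1.0, 1.0, 0.0]
```

When the third (wobble) position shows little conservation while the first two remain conserved, the set of test species has accumulated enough divergence that unconstrained bases no longer appear conserved by chance.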

Cross-group/side-by-side bee and ant comparison of their conserved DNA was performed using bee specific and ant specific EvoPrints and by BLASTn alignments (see Side-by-side comparison of conserved sequences within the bee and ant glass bottom boat loci identify clusters of conserved and species-specific sequences.). This figure highlights the conservation observed among bee and ant exons and flanking sequence of the glass bottom boat (gbb, 60A) locus of Apis mellifera EvoPrinted with four bee test species and the Wasmannia auropunctata gbb locus EvoPrinted with three ant species. Position and orientation of these CSB clusters and uCSBs is conserved. Similarly, EvoPrinting a single exon and flanking regions of the Apis mellifera homothorax locus with four bee species and generating an ant specific EvoPrint of the orthologous ant sequence of the Ooceraea biroi homothorax locus with ten other ant species, reveals CSBs that are conserved in both Apis and Ooceraea, as well as sequences that are restricted to one of the two Hymenopteran families (Brody, 2020).

This study describes the use of EvoPrinter to detect the presence of ultraconserved non-coding sequences in flies, including Drosophila species, Ceratitis and Musca, in mosquitos and in Hymenoptera species. uCSBs of the three fly taxa have, for the most part, maintained their linear order suggesting a functional constraint on the order of regulatory sequences. For mosquitos, an older taxon than that of flies and the Hymenoptera, uCSBs are found to be shared between Anopheles, Culex and Aedes. Importantly, in Hymenoptera, uCSBs were found within clusters of conserved sequences shared between ants and bees. This conservation of core sequences in enhancers suggests that these morphologically divergent taxa share common regulatory networks. These approaches to detection of uCSBs in flies, mosquitos and ants and bees will lead to a greater understanding of their evolutionary origin and the function of their conserved non-coding sequences. Knowledge of clusters of CSBs and of uCSBs is an important tool for discovery of the core elements of enhancers and their sequence extent (Brody, 2020).

In most cases both BLASTn and the EvoPrinter algorithm had similar sensitivities and gave comparable results. However, it is recommended that the two techniques be used in conjunction with one another to enhance CSB and uCSB detection. For example, by using both approaches, uCSBs were discovered that were identified by one tool but not both. The advantage of EvoPrinter is the presentation of an interspecies comparison as a single sequence, while the advantage of BLASTn is that it provides a sensitive detection of sequence homology in a one-on-one alignment. EMBOSS Needle alignment gives an even more sensitive detection of shorter sequences and is of use once BLAT or EvoPrinter has been used to discover shared CSBs and/or CSB clusters (Brody, 2020).

Transcriptional silencers in Drosophila serve a dual role as transcriptional enhancers in alternate cellular contexts

A major challenge in biology is to understand how complex gene expression patterns are encoded in the genome. While transcriptional enhancers have been studied extensively, few transcriptional silencers have been identified, and they remain poorly understood. This study used a novel strategy to screen hundreds of sequences for tissue-specific silencer activity in whole Drosophila embryos. Almost all of the transcriptional silencers that were identified were also active enhancers in other cellular contexts. These elements are bound by more transcription factors than non-silencers. A subset of these silencers forms long-range contacts with promoters. Deletion of a silencer caused derepression of its target gene. These results challenge the common practice of treating enhancers and silencers as separate classes of regulatory elements and suggest the possibility that thousands or more bifunctional CRMs remain to be discovered in Drosophila and in humans (Gisselbrecht, 2019).

This study has adapted an enhancer-fluorescence-activated cell sorting-sequencing (FACS-seq) technology for highly parallel screening of elements for enhancer activity in Drosophila embryos (Gisselbrecht, 2013) into silencer-FACS-seq (sFS) technology, which enriches for elements that tissue-specifically silence reporter gene expression. Briefly, this study generated a reporter vector, pSFSdist, which drives GFP expression under the control of an element from a library of candidate silencers, positioned at least 100 bp upstream of a strong, ubiquitous enhancer (ChIPCRM2078; Gisselbrecht, 2013). This vector contains a target sequence for a site-specific recombinase, permitting assaying all of the tested elements in the same genomically integrated context. Flies carrying single insertions from the reporter library are crossed to a strain in which the expression of the exogenous marker protein rat CD2 is driven in a tissue or cell type of interest, and the resulting informative embryos are dissociated to produce a single-cell suspension. By sorting for CD2+ cells in which GFP expression is reduced from the level driven by the strong ubiquitous enhancer in the absence of silencing activity, the study enriched for cells that contained silencers active in the cell type of interest; these silencers were then recovered and identified by high-throughput sequencing. Insertion of an element with known mesodermal silencing activity into this vector consistently yielded a larger fraction of CD2+ GFP-reduced cells than was observed when a negative control sequence (derived from Escherichia coli genomic DNA) was used (Gisselbrecht, 2019).
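
The sorting logic described above can be sketched in Python as a two-step gate: keep the CD2+ cells, then ask what fraction of them show reduced GFP. This is an illustration only. The function name, the thresholds and the intensity values are hypothetical, and real gates are set on the sorter from control samples rather than from fixed cutoffs.

```python
from typing import Iterable, Tuple


def fraction_gfp_reduced(
    cells: Iterable[Tuple[float, float]],
    cd2_threshold: float,
    gfp_threshold: float,
) -> float:
    """Fraction of CD2+ cells whose GFP signal falls below gfp_threshold.

    Each cell is a (cd2_signal, gfp_signal) pair; thresholds are hypothetical.
    """
    cd2_positive = [gfp for cd2, gfp in cells if cd2 >= cd2_threshold]
    if not cd2_positive:
        return 0.0
    reduced = sum(1 for gfp in cd2_positive if gfp < gfp_threshold)
    return reduced / len(cd2_positive)


# Invented intensities in arbitrary units, as (CD2, GFP) pairs.
silencer_cells = [(900, 120), (850, 90), (40, 800), (920, 700), (880, 110)]
control_cells = [(910, 750), (870, 690), (35, 820), (930, 95), (860, 720)]

print(fraction_gfp_reduced(silencer_cells, 500, 200))  # 0.75
print(fraction_gfp_reduced(control_cells, 500, 200))   # 0.25
```

In this toy data the known-silencer population yields a larger CD2+ GFP-reduced fraction than the negative control, which mirrors the comparison used to validate the vector.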

A library of 591 genomic elements, chosen to represent a variety of chromatin states or enhancer activity patterns, was designed to test for silencer activity in the embryonic mesoderm at stages 11-12 (5.5-7.5 h after embryo deposition). Since general features of silencers are unknown, this library was designed to test 3 main hypotheses about what kinds of sequences act as silencers in this developmental context (Gisselbrecht, 2019).

First, it was noted that 2 bifunctional CRMs had been identified previously in Drosophila that function as enhancers in one context and as silencers in other contexts. As this phenomenon is known to occur in multiple eukaryotic systems from a small number of examples and could be important in understanding the architecture of regulatory DNA, it was desirable to assess the generality of this phenomenon. Therefore, CRMs were selected from the REDfly and CAD2 databases that exhibited no or highly restricted mesodermal expression at embryonic stage 11. Elements associated with genes that show widespread mesodermal expression at this stage were filtered out. Second, a potential mechanistic signature of transcriptional silencers is the binding of well-characterized transcriptional corepressors, by analogy to the prediction of enhancers by binding of the coactivator CBP. Therefore genomic elements identified by ChIP as binding sites for the co-repressors Groucho or CtBP were selected. As Groucho has canonically been associated with long-range repression and CtBP has been associated with short-range repression, it was predicted that Groucho binding sites would be a richer source of silencer activity in this assay, in which candidate silencers were placed >100 bp upstream of the enhancer driving reporter gene activity (Gisselbrecht, 2019).

Third, genomic regions were selected that were associated with the markers of both enhancers and repressed chromatin structure in whole-mesoderm or whole-embryo experiments. It was reasoned that silencers are active regulatory elements, distinct from the silenced chromatin that results from their activity, yet must recruit factors that exert repressive functions, and therefore may show association with both classes of chromatin marks. Moreover, these 'bivalent' chromatin states may represent sequences of the above-mentioned type, that act as enhancers in one cell type and as silencers in another. In this class, two sets of sequences were included: (1) DNase I hypersensitive sites (DHSs) that colocalize with ChIP signal for the well-studied repressive chromatin mark histone H3 trimethyllysine 27 (H3K27me3) in sorted mesoderm, and (2) coincident mesodermal peaks for H3K27me3, the canonical enhancer mark histone H3 monomethyllysine 4 (H3K4me1), and histone H3 acetyllysine 27 (H3K27ac), which has been associated with active enhancers and promoters. All of the sequences identified from genome-wide ChIP methods were filtered for the absence of widespread mesodermal expression of associated nearby genes (Gisselbrecht, 2019).

Finally, 15 sequences were included for which enhancer chromosomal contact sequencing (4C-seq) data for sorted mesodermal cells were available. Three positive control sequences were included that were previously shown to have mesodermal silencing activity and two types of putative negative controls were included: broadly active mesodermal enhancers and length-matched regions of the E. coli genomic sequence (Gisselbrecht, 2019).

The library of genomic elements was screened for silencer activity in the embryonic mesoderm in 2 rounds. Testing of this library yielded a readily detectable population of mesodermal cells in which GFP expression was reduced; the elements enriched in these cells are referred to as sFS+ elements. Of the 591 sequence elements chosen for inclusion in this library, 501 were genomically integrated into transgenic flies after injection of the pooled library (Gisselbrecht, 2019).

Overlap with transcription start sites (TSSs) was identified as a highly enriched feature of sFS+ elements, which likely reveals the presence of promoter competition. Competition among promoters for association with active enhancers is one mechanism that has been proposed to account for the specificity with which enhancers target genes for activation and has been shown to restrict enhancer-driven activation of gene expression in reporter assays. Overall, the initial set of 41 'hits' that overlapped promoter regions was significantly enriched for mapped instances of the TATA box. While these are technical positives in the silencer screen, since the goal of this study was to analyze CRMs that silence gene expression by other means, any sequences that overlapped promoter regions were omitted from subsequent analyses. Moreover, many of the library elements tested overlapped other library elements; these were merged for downstream analysis. After filtering to remove elements that overlapped promoters and collapsing overlapping genomic regions, 29 of a total of 352 genomic regions tested for mesodermal silencer activity were positive in the sFS screen (Gisselbrecht, 2019).
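The collapsing of overlapping library elements into non-redundant genomic regions can be illustrated with a minimal sketch. The function name, the half-open coordinate convention, and the coordinates below are hypothetical and are not taken from the study; the sketch only shows the general interval-merging idea.

```python
from collections import defaultdict

def merge_overlapping(elements):
    """Collapse overlapping (chrom, start, end) half-open intervals into merged regions."""
    by_chrom = defaultdict(list)
    for chrom, start, end in elements:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start < current_end:
                # Overlaps the region being built: extend it.
                current_end = max(current_end, end)
            else:
                # No overlap: close the current region and start a new one.
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged

# Hypothetical library elements (not real coordinates from the screen).
library = [("2L", 100, 600), ("2L", 500, 900), ("2L", 1500, 2000), ("3R", 50, 300)]
regions = merge_overlapping(library)
print(regions)
```

In this sketch, elements that merely touch end to start are kept separate; whether such elements should be merged is a design choice that the text does not specify.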

To validate the results from the sFS screen, pure transgenic lines were generated from a subset of library elements, and then their silencer activity was assayed by FACS analysis of embryos resulting from these individual reporter strains. Next, whether the silencers detected by the sFS screen could also silence the activity of enhancers other than the strong, ubiquitous enhancer used in the sFS screen was investigated. Thus, silencing by several sFS+ elements was assessed visually by placing these elements upstream of the following mesoderm-specific enhancers and imaging the resulting GFP expression. ChIPCRM2613 is an intronic enhancer of the pan-mesodermal gene heartless, and it drives reporter gene expression throughout the presumptive mesoderm from the beginning of gastrulation (Gisselbrecht, 2013). The Mef2 I-ED5 enhancer drives expression in the fusion-competent myoblasts beginning in late stage 11. In every case examined (15 of 15), at least one of these additional enhancers showed reduced activity in the mesoderm in the context of the tested element. These results not only verify the silencing activity of these sFS+ elements but they also demonstrate that silencers are not specific for a particular enhancer (Gisselbrecht, 2019).

The resulting set of mesodermal silencers was analyzed to determine which genomic features that were explicitly sampled in the design of the element library were predictive of silencer activity. Despite the inclusion in the sFS library of ChIP peaks for transcriptional co-repressors and for a repressive chromatin mark, the only screened element types that were significantly enriched among the active mesodermal silencers were positive controls and non-mesodermal enhancers. In fact, 22 of 29 regions containing mesodermal silencers had been previously reported to have enhancer activity. Testing of the remaining 7 silencers for enhancer activity revealed that 6 of 7 also function as enhancers in the embryo: 5 of 6 were entirely non-mesodermal, while 1 showed restricted mesodermal expression. In total, 28 of 29 elements found to act as mesodermal silencers also exhibited enhancer activity in a different cellular context. Overall, >10% (26 of 200) of the previously known enhancers tested in the assay exhibited mesodermal silencer activity. This constitutes more bifunctional transcriptional regulatory elements than were previously known across all biological systems (Gisselbrecht, 2019).

A different class of bifunctional CRMs was reported recently (Erceg, 2017), in which developmental enhancers have an additional function as Polycomb response elements (PREs). PREs provide genomic binding sites for sequence-specific DNA binding proteins that recruit protein subunits of Polycomb repressive complexes and could, in principle, play a role in silencing target genes. Therefore, this study tested the hypothesis that the silencer activity of enhancers discovered in the current assay resulted from PRE activity. Only 4 of 29 regions displaying mesodermal silencer activity overlapped PREs as defined by Erceg (2017) on the basis of ChIP for the PRE-binding factors Pho and dSfmbt, versus 24 of 323 mesodermal non-silencers, indicating that PREs are not a major source of mesodermal silencers (Gisselbrecht, 2019).
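The PRE overlap counts quoted above (4 of 29 silencers versus 24 of 323 non-silencers) form a 2x2 contingency table. As an illustration only, the sketch below computes the two overlap fractions and a two-sided Fisher exact p-value using the standard library. The choice of Fisher's exact test is an assumption made for this example; the text does not state which statistical test the study used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Counts from the text: PRE-overlapping vs non-overlapping regions.
silencer_pre, silencer_total = 4, 29
nonsilencer_pre, nonsilencer_total = 24, 323

frac_silencers = silencer_pre / silencer_total           # fraction of silencers overlapping PREs
frac_nonsilencers = nonsilencer_pre / nonsilencer_total  # fraction of non-silencers overlapping PREs
p_value = fisher_exact_two_sided(
    silencer_pre, silencer_total - silencer_pre,
    nonsilencer_pre, nonsilencer_total - nonsilencer_pre,
)
print(f"{frac_silencers:.3f} vs {frac_nonsilencers:.3f}, p = {p_value:.3f}")
```

Whatever test is used, the point made in the text holds from the raw counts alone: 25 of the 29 mesodermal silencers do not overlap a PRE.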

The results suggest a view of enhancers as CRMs with distinct spatiotemporal patterns of both activation and repression. To further assess the generality of this phenomenon, the effects of a subset of the newly discovered mesodermal silencers were visualized on enhancers that are active broadly in the mesoderm at different developmental stages. This enabled simultaneous evaluation of a variety of spatiotemporal domains of silencer and enhancer activity (Gisselbrecht, 2019).

Several elements exhibited apparently uniform silencing activity across the mesoderm and at different stages. The most commonly observed temporal pattern was a lack of silencing activity at gastrulation and strong silencing during the later stages at which sFS was performed. An element, hkb_0.6kbRIRV, was also observed that silenced much more strongly earlier than later in embryonic development; this element simultaneously acted as an enhancer in its previously characterized pattern in the midgut primordia. One element, the oc otd186 enhancer, which was observed to drive expression in the head, exhibited spatially patterned silencing within the mesoderm during gastrulation but not later in development; silencing was moderate in the anterior portion of the germband, but much stronger in the posterior portion, as seen in the context of 2 different early pan-mesodermal enhancers. Two different later-acting mesodermal enhancers showed moderate, uniform silencing across the anteroposterior extent of the germband. Finally, a tested element exhibited enhancer-specific silencing activity. The lz crystal cell enhancer was a moderately weak silencer when tested on different late-acting mesodermal enhancers and a strong silencer at gastrulation in the context of ChIPCRM2613, yet it completely failed to silence activity driven during gastrulation by ChIPCRM7759. These results highlight that silencers exhibit a similarly diverse range of spatiotemporal regulatory patterns as those of enhancers (Gisselbrecht, 2019).

To investigate whether the silencer activity observed in reporter assays reflects the activity of the putative silencer in its native chromosomal context, the average mesodermal transcription was profiled in the genomic neighborhood of silencers or other functional elements. Using published RNA-seq data from sorted mesodermal cells, reads were aggregated within 500-bp windows over a 25-kb region centered on each element, representing the typical size of chromatin state domains observed in a high-resolution Drosophila Hi-C experiment. Then the results were averaged over all of the elements in a class to create a meta-profile of transcript levels surrounding each class of cis-regulatory elements. As expected, transcription near a previously published set of mesodermal enhancers (Gisselbrecht, 2013) is elevated. In contrast, transcription near silencers is below the baseline level of transcription observed near a negative control set of genomic regions. Both effects decay to background levels in the meta-profiles over a scale of ~5 kb, suggesting that silencers act within approximately the same distance range as transcriptional enhancers. The bifunctional PREs reported by Erceg (2017) are also associated with strongly elevated transcription, but this effect appears to spread more broadly on the chromatin domain scale, suggesting that bifunctional PREs may act by a mechanism that is distinct from that of the silencers identified in the sFS screen (Gisselbrecht, 2019).
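The meta-profile construction described above (500-bp windows across a 25-kb region centered on each element, averaged over all elements in a class) can be sketched as follows. The data layout (sorted read positions per chromosome, elements given as a chromosome and a center coordinate) and all values are hypothetical. A real analysis would also normalize for sequencing depth and handle strand and chromosome ends, which this sketch omits.

```python
from bisect import bisect_left

WINDOW = 500   # window size in bp, as stated in the text
SPAN = 25_000  # total region size in bp, centered on each element

def window_counts(sorted_read_positions, center):
    """Count reads in consecutive WINDOW-bp bins across SPAN bp around center."""
    region_start = center - SPAN // 2
    counts = []
    for i in range(SPAN // WINDOW):
        lo = region_start + i * WINDOW
        hi = lo + WINDOW
        # Number of read positions p with lo <= p < hi.
        counts.append(bisect_left(sorted_read_positions, hi)
                      - bisect_left(sorted_read_positions, lo))
    return counts

def meta_profile(reads_by_chrom, elements):
    """Average the per-element window counts over all elements in a class."""
    profiles = [window_counts(reads_by_chrom[chrom], center)
                for chrom, center in elements]
    return [sum(column) / len(profiles) for column in zip(*profiles)]

# Hypothetical toy data: read start positions (sorted) and two element centers.
reads_by_chrom = {"2L": [87_500, 100_000, 100_010, 112_499, 200_100]}
elements = [("2L", 100_000), ("2L", 200_000)]
profile = meta_profile(reads_by_chrom, elements)
print(len(profile), profile[25])
```

Comparing such profiles between element classes (silencers, enhancers, negative controls) is what allows the elevated or reduced transcription near each class to be read off relative to baseline.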

Next, to further demonstrate the functional importance of the silencers that were identified in the sFS screen, CRISPR-Cas9 genome editing was used to generate a Drosophila strain containing a deletion of the hkb_0.6RIRV element. This element was originally reported as an enhancer driving the expression of the gap gene huckebein (hkb) at the termini of the blastoderm embryo, and this study identified it as a mesodermal silencer in the screen. Mesodermal cells from embryos homozygous for this deletion and from wild-type control embryos were screened and hkb RNA was found to be significantly upregulated in the homozygous mutant mesoderm, which supports a role for this element in silencing its endogenous target gene during normal embryonic development (Gisselbrecht, 2019).

Various types of epigenomic features, including chromatin accessibility, post-translational modifications of histones, and occupancy by TFs and chromatin-modifying enzymes, have been associated with different categories of functional elements in the genome, such as active promoters and enhancers. However, relatively little is known about the chromatin features of active silencers. Therefore the epigenomic environment at the set of 29 mesodermal silencers was explored by assessing the enrichment or depletion of signal from various published epigenomic datasets, as compared to elements that did not display silencer activity in the sFS screen (Gisselbrecht, 2019).

It was hypothesized that since bifunctional elements are more functionally complex than CRMs that act only as enhancers, they may exhibit a more complex suite of TF interactions across various tissues. This study observed that validated silencers are strongly enriched for overlap with highly occupied target (HOT) regions, defined as exceeding a TF complexity score threshold of ~10 overlapping bound factors. Since silencing activity is likely mediated through the effects of bound sequence-specific transcriptional repressors, the set of 29 mesodermal silencers was searched for enriched combinations of evolutionarily conserved DNA binding site motif occurrences for TFs annotated as repressors. The only motif combination that was significantly enriched among silencers was a 3-way combination of the motifs for the TFs Snail, Dorsal, and Tramtrack-PF, which were found together in 12 of 29 mesodermal silencers (Gisselbrecht, 2019).

Snail is a well-known repressor of non-mesodermal genes in the developing mesoderm. Dorsal and Tramtrack have also been shown to play roles in mesodermal gene repression. Analysis of ChIP data for Snail revealed significant enrichment for Snail occupancy at silencers. To validate that this enrichment reflects Snail activity at silencers, predicted Snail binding sites were mutated in 4 silencer elements with high levels of Snail ChIP signal, and the silencer activity of the mutant was compared to wild-type sequences within whole embryos in the FACS-based reporter assay. All 4 elements showed significantly reduced silencer activity. Mutating sites for an unrelated TF, as a negative control, caused no significant reduction in silencer activity. While Snail has been well characterized as a short-range repressor acting within 150 bp, all of the Snail binding sites that were found to be required for full silencer activity are >400 bp away from the silenced enhancer in the reporter construct, indicating that Snail can act as a repressor at distances longer than those described for short-range repression (Gisselbrecht, 2019).

Finally, the evidence was examined for the direct action of Snail binding to silencers on the expression of the silencers' endogenous target genes. Since the majority of elements exhibiting silencer activity in this study were originally reported as enhancers, it was possible to identify published target genes of these bifunctional elements and examine the effect of loss of snail function on their expression. Target genes of 12 elements bound by Snail in ChIP-seq data showed significant derepression in sna mutant embryos. In contrast, no target of any of the 14 Snail-unbound elements was significantly derepressed. Therefore, it is concluded that the known role of Snail in mesodermal repression explains the activity of a large minority (41%) of the observed silencers, while the transcriptional regulators mediating silencing activity through the majority of the silencers remain to be determined (Gisselbrecht, 2019).

In an attempt to identify a 'silencer signature' that is analogous to the previously described chromatin signatures of enhancers and promoters, published ChIP data was assembled from whole embryos or from sorted mesoderm, where available, for several chromatin marks previously associated with active or repressed chromatin states, and hierarchical clustering was performed of all 352 tested genomic regions according to these histone modification chromatin profiles. As expected, clusters of elements with greater signal for the repressive marks H3K27me3 and H3K9me3 are enriched for silencers, while other clusters are depleted of silencers. It was also observed, however, that many non-silencers belonged to these clusters, and that some silencers belonged to other clusters that were instead enriched for non-silencers, suggesting that these commonly profiled chromatin marks do not constitute a general chromatin signature of silencers. Similarly, neither the Groucho nor the CtBP co-repressors were significantly enriched at silencers (Gisselbrecht, 2019).

For a subset of silencers (18 of 29), the individual FACS validation data provided a measure of the strength of silencer activity, in terms of the percentage of cells in the GFP-reduced population. Using these quantitative estimates of silencer activity, it was found that H3K27me3 and H3K9ac, a mark that previously had been associated with bivalent promoters and active enhancers, are significantly correlated with silencer strength across these 18 elements, possibly reflecting the fact that mesodermal silencers are active enhancers in other cellular contexts. No single mark or combination of marks that was tested from among the publicly available histone modification profiles accurately discriminates active silencers as a whole from other types of cis-elements (Gisselbrecht, 2019).

It has been well established that enhancers can act directly on their target promoters by looping to create direct, 3-dimensional (3D) physical contacts between genomic elements widely separated in sequence space. Such contacts have also been shown to play a role in repression by heterochromatin and at PREs. Silencers could, in principle, act directly to recruit repressive activities to regulated promoters, or alternatively by sequestering enhancers that would otherwise interact with promoters, or by other mechanisms that do not involve focal contacts to regulated elements, such as nucleating a repressive chromatin state that spreads along the chromosome (Gisselbrecht, 2019).

Therefore data was examined from assays of genomic contacts based on proximity ligation to attempt to distinguish among these hypotheses. Mesodermal silencer activity was observed in a CRM previously characterized by circular chromosome conformation capture (4C). This element makes mesodermally enriched contacts with 2 regions that overlap the promoters of genes that are not expressed in the mesoderm, suggesting the possibility that silencing may be mediated by direct silencer-promoter looping (Gisselbrecht, 2019).

To test the generality of this potential mechanism, Hi-C data was generated from sorted mesodermal and non-mesodermal cells at embryonic stages 11-12, the same developmental stages that were assayed by sFS. Mesodermally enriched contacts were observed at 1-kb resolution in each of 2 paired replicates using the chromoR package, and the results were compared to the mesodermally enriched 4C contacts previously reported. The frequency of random contacts observed in Hi-C assays is greater with closer genomic distance. Therefore, to control for such nonspecific interactions in this analysis, negative control sets of 'contact regions' were generated by reflecting each observed contact around the viewpoint region. Each Hi-C replicate showed significantly greater overlap with 4C contacts than with negative control regions, indicating that the Hi-C contacts agree with published genomic contacts (Gisselbrecht, 2019).
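The distance-matched negative control can be illustrated with a short sketch: each contact is mirrored about the midpoint of its viewpoint, so the control region has the same size and lies at the same genomic distance on the opposite side. The function name, the half-open coordinates, and the choice to discard controls that would fall off the start of the chromosome are assumptions made for this example; the text does not give these details.

```python
def reflect_contact(viewpoint_start, viewpoint_end, contact_start, contact_end):
    """Mirror a contact interval about the viewpoint midpoint.

    Returns the reflected (start, end) interval, or None if it would extend
    past the start of the chromosome (an assumption of this sketch).
    """
    twice_mid = viewpoint_start + viewpoint_end  # 2 x viewpoint midpoint
    reflected = (twice_mid - contact_end, twice_mid - contact_start)
    return reflected if reflected[0] >= 0 else None

# Hypothetical viewpoint and contact on the same chromosome.
viewpoint = (10_000, 11_000)
contact = (15_000, 16_000)
control = reflect_contact(*viewpoint, *contact)
print(control)
```

Because the control preserves distance from the viewpoint, any excess overlap of the observed contacts with an independent dataset cannot be attributed simply to the higher frequency of contacts between nearby regions.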

The features of these mesodermally enriched silencer contacts were examined, as these are potential targets of silencing activity. A list was created of these potential targets by filtering for contacts that were observed in both of the Hi-C replicates and that overlap sFS-tested elements, and then the (epi)genomic features of these regions were examined. Since the TF Snail (Sna) has previously been associated with short-range repression and 'antilooping', the features of regions contacted by Sna-bound mesodermal silencers, Sna-unbound mesodermal silencers, and elements that did not act as silencers in mesoderm were compared. Regions that made mesoderm-specific contacts to Sna-unbound mesodermal silencers are significantly enriched for overlapping TSSs, as compared to those contacting either Sna-bound silencers or non-silencers, indicating that the Sna-unbound mesodermal silencers contact promoters. This suggests that Snail-unbound silencers are targeted primarily by long-range repressors, whereas Snail-bound silencers show almost no promoter contacts, which is consistent with previous reports of short-range repression and antilooping associated with this TF (Gisselbrecht, 2019).

Next, whether the expression levels of silencer-contacted genes were consistent with silencer activity was inspected. For genes whose promoters made contact with any of the 352 tested library elements, RNA-seq data from sorted mesodermal cells was compared with the elements' histone marks that showed significant correlation with silencer activity. The level of H3K27me3 found at a library element was significantly anticorrelated with the mesodermal expression of genes contacted by that element, supporting the model of contact-based repression by silencers (Gisselbrecht, 2019).

Attempts were made to test the alternate model that silencers directly contact enhancers that would otherwise be active. Thus, the contacted regions were examined for overlap with CRMs that have been reported to act as enhancers, according to the REDfly database. Each of the 3 sets of library elements (Sna-bound silencers, Sna-unbound silencers, and non-silencers) was separately tested, and no significant enrichment or depletion of CRM contact was observed in any of the sets. It was further reasoned that direct action by silencers on enhancers would result in enrichment of the enhancer mark H3K4me1 in regions that contact silencers. In this scenario, this enrichment should be apparent in histone mark data from whole embryos and across a broad range of time points, reflecting their enhancer activity in non-mesodermal tissues and/or other developmental stages. Instead, a significant depletion of H3K4me1 was observed at Sna-unbound silencer contacts versus non-silencer contacts, which does not support the model of silencers interacting directly with enhancers. The results support models in which distinct classes of transcriptional silencers act by antilooping or by acting directly on the promoters of repressed genes. While the existence of silencers that may sequester enhancers from contacting promoters cannot be ruled out, the results do not support this alternate mechanism (Gisselbrecht, 2019).

This study has developed a highly parallel reporter assay carried out in whole, developing animals to identify a set of transcriptional silencers on the basis of their tissue-specific function. Analysis of RNA-seq data indicated that genes located near these silencers in their endogenous context are expressed at lower levels. Deletion of 1 of these elements at its native genomic locus by CRISPR-Cas9 genome editing demonstrated the importance of that element for the proper expression level of its target gene. This study also integrated a wide variety of data types from previously published datasets, including ChIP of histone modifications and specific factors, with newly generated tissue-specific 3D chromosomal physical interaction data to assess enriched features of the set of tissue-specific silencers and has explored potential mechanisms (Gisselbrecht, 2019).

Many enhancers were found in fact to be bifunctional elements, capable of up- and downregulating gene expression in different cellular contexts. While this phenomenon has been observed previously in studies of individual regulatory elements, the extent of CRM bifunctionality had not been appreciated. It is important to note that many CRMs that failed to show silencing activity in the screen are known enhancers that are not active in the tissue tested. Silencers are therefore an identifiable set of active elements, distinct from 'quenched' or inactive enhancers that neither activate nor repress gene expression (Gisselbrecht, 2019).

While prior studies have found histone modifications associated with enhancer activity, this study suggests that despite the extensive genome-scale ChIP profiling studies by numerous investigators and consortia, the available chromatin profiling data are not sufficient to identify silencers. This is possibly explained by the existence of various silencer classes. Alternatively, there are dozens of chromatin marks that have not been characterized extensively that may mark silencers. Expanded efforts in profiling larger sets of tissue-specific chromatin marks may reveal a signature of active silencers. Similarly, it was surprising that co-repressor occupancy was not a good predictor of silencers. One potential explanation is that many of these elements may be silencers in other cell types or at other developmental stages than were assayed in this study, since co-repressor ChIP data were generated in whole embryos across a broad range of ages. Another possibility is that different subclasses of CRMs with silencer activity may be endowed with subclass-specific chromatin and/or TF signatures. The list of 29 silencers discovered by the sFS assay provides a training set that can be used for the further study of regulatory features that govern silencing (Gisselbrecht, 2019).

Enriched Snail binding was observed at a subset of mesodermal silencers. Snail is a well-characterized short-range repressor protein acting in the mesoderm, and it has been proposed to act by preventing regulated elements from looping to promoters. The current results are consistent with this general model; however, the effects of Snail repression spread over hundreds of base pairs and into neighboring regulatory elements in the reporter assay, in contrast to previously reported limits of short-range repression. Thus, the results indicate that Snail can act by different modes of repression, which had not been observed previously (Gisselbrecht, 2019).

This study provides evidence supporting a model of silencer activity in which a subset of silencers makes direct 3D contacts with the promoters of regulated genes. These physical interactions are important to consider when interpreting genome-wide maps of chromosome conformation. Not all promoter-interacting regions will act as enhancers, and it will be necessary to develop approaches that integrate a wide range of genomic data types to identify and functionally characterize cis-regulatory elements, including distinguishing those acting as enhancers versus silencers (Gisselbrecht, 2019).

It has recently been shown that many developmental enhancers also act as PREs. Despite some common features, including evidence for looping to target promoters, this set of bifunctional enhancer elements is nearly distinct from the elements this study has characterized that act as both enhancers and silencers, and appears to act by distinct mechanisms. It was previously reported that a Drosophila insulator element has a second role in mediating long-range enhancer-promoter interactions. It is suggested that a taxonomy of regulatory elements as enhancers, silencers, insulators, and so forth is likely an oversimplification, and that it is more useful to think generally of CRMs, which can activate and repress, recruit chromatin modifiers and remodelers, and/or structure the 3D genome in a context-sensitive fashion (Gisselbrecht, 2019).

It has been estimated that there may be >50,000 enhancers in the D. melanogaster genome. This study has detected mesodermal silencer activity in >10% of tested non-mesodermal enhancers. If these elements are representative of the broader enhancer population, then this result suggests that there may be thousands of such bifunctional elements across a range of tissues in Drosophila; since many of the elements that were tested could be silencers in a cell type that was not examined or at a later developmental stage, these numbers are likely even higher. The sFS approach could be adapted in future studies to screen for bifunctional elements in mammals (Gisselbrecht, 2019).

These results suggest that most, if not all, silencers are also enhancers in a different cell type. CRM bifunctionality complicates the understanding of how gene regulation is specified in the genome and how it is read out in different cell types. The observation that the vast majority of complex trait- and disease-associated variants identified from genome-wide association studies (GWASs) map to noncoding sequences, most of which occur within DNase I hypersensitive sites, emphasizes the importance of understanding these elements. The characterization of bifunctional elements will help in elucidating how precise gene expression patterns are encoded in the genome and aid in the interpretation of cis-regulatory variation (Gisselbrecht, 2019).

Architecture of promoters of house-keeping genes in polytene chromosome interbands of Drosophila melanogaster

This is the first study to investigate the molecular-genetic organization of polytene chromosome interbands located on both molecular and cytological maps of Drosophila genome. The majority of the studied interbands contained one gene with a single transcription initiation site; the remaining interbands contained one gene with several alternative promoters, two or more unidirectional genes, and "head-to-head" arranged genes. In addition, intricately arranged interbands containing three or more genes in both unidirectional and bidirectional orientation were found. Insulator proteins, ORC, P-insertions, DNase I hypersensitive sites, and other open chromatin structures were situated in the promoter region of the genes located in the interbands. This area is critical for the formation of the interband, an open chromatin region in which gene transcription and replication are combined (Zykova, 2019).

Boundaries mediate long-distance interactions between enhancers and promoters in the Drosophila Bithorax complex

Drosophila bithorax complex (BX-C) is one of the best model systems for studying the role of boundaries (insulators) in gene regulation. Expression of three homeotic genes, Ubx, abd-A, and Abd-B, is orchestrated by nine parasegment-specific regulatory domains. These domains are flanked by boundary elements, which function to block crosstalk between adjacent domains, ensuring that they can act autonomously. Paradoxically, seven of the BX-C regulatory domains are separated from their gene target by at least one boundary, and must "jump over" the intervening boundaries. To understand the jumping mechanism, the Mcp boundary was replaced with Fab-7 and Fab-8. Mcp is located between the iab-4 and iab-5 domains, and defines the border between the set of regulatory domains controlling abd-A and Abd-B. When Mcp is replaced by Fab-7 or Fab-8, they direct the iab-4 domain (which regulates abd-A) to inappropriately activate Abd-B in abdominal segment A4. For the Fab-8 replacement, ectopic induction was only observed when it was inserted in the same orientation as the endogenous Fab-8 boundary. A similar orientation dependence for bypass activity was observed when Fab-7 was replaced by Fab-8. Thus, boundaries perform two opposite functions in the context of BX-C: they block crosstalk between neighboring regulatory domains, but at the same time actively facilitate long distance communication between the regulatory domains and their respective target genes (Postaka, 2018).

Boundaries flanking the Abd-B regulatory domains must block crosstalk between adjacent regulatory domains but at the same time allow more distal domains to jump over one or more intervening boundaries and activate Abd-B expression. While several models have been advanced to account for these two paradoxical activities, replacement experiments argued that both must be intrinsic properties of the Abd-B boundaries. Thus Fab-7 and Fab-8 have blocking and bypass activities in Fab-7 replacement experiments, while heterologous boundaries including multimerized dCTCF sites and Mcp from BX-C do not. One idea is that Fab-7 and Fab-8 are simply 'permissive' for bypass. They allow bypass to occur, while boundaries like multimerized dCTCF or Mcp are not permissive in the context of Fab-7. Another is that they actively facilitate bypass by directing the distal Abd-B regulatory domains to the Abd-B promoter. Potentially consistent with an 'active' mechanism that involves boundary pairing interactions, the bypass activity of Fab-8 and to a lesser extent Fab-7 is orientation dependent (Postaka, 2018).

The studies reported here tested these two models further. For this purpose the Mcp boundary was used for in situ replacement experiments. Mcp defines the border between the regulatory domains that control expression of abd-A and Abd-B. In this location, it is required to block crosstalk between the flanking domains iab-4 and iab-5, but it does not need to mediate bypass. In this respect, it differs from the boundaries that are located within the set of regulatory domains that control either abd-A or Abd-B, as these boundaries must have both activities. If bypass were simply passive, insertion of a 'permissive' Fab-7 or Fab-8 boundary in either orientation in place of Mcp would be no different from insertion of a generic 'non-permissive' boundary such as multimerized dCTCF sites. Assuming that Fab-7 and Fab-8 can block crosstalk out of context, they should fully substitute for Mcp. In contrast, if bypass in the normal context involves an active mechanism in which more distal regulatory domains are brought to the Abd-B promoter, then Fab-7 and Fab-8 replacements might also be able to bring iab-4 to the Abd-B promoter in a configuration that activates transcription. If they do so, then this process would be expected to show the same orientation dependence as is observed for bypass of the Abd-B regulatory domains in Fab-7 replacements (Postaka, 2018).

Consistent with the idea that a boundary located at the border between the domains that regulate abd-A and Abd-B need not have bypass activity, it was found that multimerized binding sites for the dCTCF protein fully substitute for Mcp. Like the multimerized dCTCF sites, Fab-7 and Fab-8 are also able to block crosstalk between iab-4 and iab-5. In the case of Fab-7, its blocking activity is incomplete and there are small clones of cells in which the mini-y reporter is activated in A4. In contrast, the blocking activity of Fab-8 is comparable to the multimerized dCTCF sites and the mini-y reporter is off throughout A4. One plausible reason for this difference is that Mcp and the boundaries flanking Mcp (Fab-4 and Fab-6) utilize dCTCF as does Fab-8, while this architectural protein does not bind to Fab-7 (Postaka, 2018).

Importantly, in spite of their normal (or near normal) ability to block crosstalk, both boundaries still perturb Abd-B regulation. In the case of Fab-8, the misregulation of Abd-B is orientation dependent just like the bypass activity of this boundary when it is used to replace Fab-7. When inserted in the reverse orientation, Fab-8 behaves like multimerized dCTCF sites and it fully rescues the Mcp deletion. In contrast, when inserted in the forward orientation, Fab-8 induces the expression of Abd-B in A4 (PS9), and the misspecification of this parasegment. Unlike classical Mcp deletions or the Mcp^PRE replacement described in this study, expression of the Abd-B gene in PS9 is driven by iab-4, not iab-5. This conclusion is supported by two lines of evidence. First, the mini-y reporter inserted in iab-5 is off in PS9 cells indicating that iab-5 is silenced by PcG factors as it should be in this parasegment. Second, the ectopic expression of Abd-B is eliminated when the iab-4 regulatory domain is inactivated (Postaka, 2018).

These results, taken together with previous studies, support a model in which the chromatin loops formed by Fab-8 inserted at Mcp in the forward orientation bring the enhancers in the iab-4 regulatory domain in close proximity to the Abd-B promoter, leading to the activation of Abd-B in A4 (PS9). In contrast, when Fab-8 is inserted in the opposite orientation, the topology of the chromatin loops formed by the ectopic boundary is not compatible with productive interactions between iab-4 and the Abd-B promoter. Moreover, it would appear that boundary bypass for the regulatory domains that control Abd-B expression is not a passive process in which the boundaries are simply permissive for interactions between the regulatory domains and the Abd-B promoter. Instead, it seems to be an active process in which the boundaries are responsible for bringing the regulatory domains into contact with the Abd-B gene. It also seems likely that the bypass activity of Fab-8 (and also Fab-7) has a built-in preference, namely that it is targeted for interactions with the Abd-B gene. This idea would fit with transgene bypass experiments, which showed that both Fab-7 and Fab-8 interacted with an insulator-like element upstream of the Abd-B promoter, AB-I, while the Mcp boundary did not (Postaka, 2018).

Similar conclusions can be drawn from the induction of Abd-B expression in A4 (PS9) when Fab-7 is inserted in place of Mcp. Like Fab-8, this boundary inappropriately targets the iab-4 regulatory domain to Abd-B. Unlike Fab-8, Abd-B is ectopically activated when Fab-7 is inserted in both the forward and reverse orientations. While the effects are milder in the reverse orientation, the lack of pronounced orientation dependence is consistent with experiments in which Fab-7 was inserted at its endogenous location in the reverse orientation. Unlike Fab-8, only very minor iab-6 bypass defects were observed. In addition to the activation of Abd-B in A4 (PS9), the Fab-7 Mcp replacements also alter the pattern of Abd-B regulation in more posterior segments. In the forward orientation, A4 and A5 are transformed towards an A6 identity, while A6 is also misspecified. Similar though somewhat less severe effects are observed in these segments when Fab-7 is inserted in the reverse orientation. At this point the mechanisms responsible for these novel phenotypic effects are uncertain. One possibility is that pairing interactions between the Fab-7 insert and the endogenous Fab-7 boundary disrupt the normal topological organization of the regulatory domains in a manner similar to that seen in boundary competition transgene assays. An alternative possibility is that Fab-7 targets iab-4 to the Abd-B promoter not only in A4 (PS9) but also in cells in A5 (PS10) and A6 (PS11). In this model, Abd-B would be regulated not only by the domain that normally specifies the identity of the parasegment (e.g., iab-5 in PS10), but also by interactions with iab-4. This dual regulation would increase the levels of Abd-B, giving the weak GOF phenotypes. Potentially consistent with this second model, inactivating iab-4 in the Mcp^F8 replacement not only rescues the A4 (PS9) GOF phenotypes but also suppresses the loss of anterior trichomes in the A6 tergite (Postaka, 2018).

Coactivator condensation at super-enhancers links phase separation and gene control

Super-enhancers (SEs) are clusters of enhancers that cooperatively assemble a high density of the transcriptional apparatus to drive robust expression of genes with prominent roles in cell identity. This study demonstrates that the SE-enriched transcriptional coactivators BRD4 and MED1 form nuclear puncta at SEs that exhibit properties of liquid-like condensates and are disrupted by chemicals that perturb condensates. The intrinsically disordered regions (IDRs) of BRD4 and MED1 can form phase-separated droplets, and MED1-IDR droplets can compartmentalize and concentrate the transcription apparatus from nuclear extracts. These results support the idea that coactivators form phase-separated condensates at SEs that compartmentalize and concentrate the transcription apparatus, suggest a role for coactivator IDRs in this process, and offer insights into mechanisms involved in the control of key cell-identity genes (Sabari, 2018).

Phase separation of fluids is a physicochemical process by which molecules separate into a dense phase and a dilute phase. Phase-separated biomolecular condensates, which include the nucleolus, nuclear speckles, stress granules, and others, provide a mechanism to compartmentalize and concentrate biochemical reactions within cells. Biomolecular condensates produced by liquid-liquid phase separation allow rapid movement of components into and within the dense phase and exhibit properties of liquid droplets such as fusion and fission. Dynamic and cooperative multivalent interactions among molecules, such as those produced by certain intrinsically disordered regions (IDRs) of proteins, have been implicated in liquid-liquid phase separation (Sabari, 2018).

Enhancers are gene regulatory elements bound by transcription factors (TFs) and other components of the transcription apparatus that function to regulate expression of cell type-specific genes. Super-enhancers (SEs) -- clusters of enhancers that are occupied by exceptionally high densities of transcriptional machinery -- regulate genes with especially important roles in cell identity. DNA interaction data show that enhancer elements in the clusters are in close spatial proximity with each other and the promoters of the genes that they regulate, consistent with the notion of a dense assembly of transcriptional machinery at these sites. This high-density assembly at SEs has been shown to exhibit sharp transitions of formation and dissolution, forming as the consequence of a single nucleation event and collapsing when concentrated factors are depleted from chromatin or when nucleation sites are deleted. These properties of SEs led to the proposal that the high-density assembly of biomolecules at active SEs is due to phase separation of enriched factors at these genetic elements. This study has provided experimental evidence that the transcriptional coactivators BRD4 and MED1 (a subunit of the Mediator complex) form condensates at SEs. This establishes a new framework to account for the diverse properties described for these regulatory elements and expands the known biochemical processes regulated by phase separation to include the control of cell-identity genes (Sabari, 2018).

SEs regulate genes with prominent roles in healthy and diseased cellular states. SEs and their components have been proposed to form phase-separated condensates, but with no direct evidence. This study demonstrates that two key components of SEs, BRD4 and MED1, form nuclear condensates at sites of SE-driven transcription. Within these condensates, BRD4 and MED1 exhibit apparent diffusion coefficients similar to those previously reported for other proteins in phase-separated condensates in vivo. The IDRs of both BRD4 and MED1 are sufficient to form phase-separated droplets in vitro, and the MED1-IDR facilitates phase separation in living cells. Droplets formed by MED1-IDR are capable of concentrating transcriptional machinery in a transcriptionally competent nuclear extract. These results support a model in which transcriptional coactivators form phase-separated condensates that compartmentalize and concentrate the transcription apparatus at SE-regulated genes and identify SE components that likely play a role in phase separation (Sabari, 2018).

SEs are established by the binding of master TFs to enhancer clusters. These TFs typically consist of a structured DNA-binding domain and an intrinsically disordered transcriptional activation domain. The activation domains of these TFs recruit high densities of many transcription proteins, which, as a class, are enriched for IDRs. Although the exact client-scaffold relationship between these components remains unknown, it is likely that these protein sequences mediate weak multivalent interactions, thereby facilitating condensation. It is proposed that condensation of such high-valency factors at SEs creates a reaction crucible within the separated dense phase, where high local concentrations of the transcriptional machinery ensure robust gene expression (Sabari, 2018).

The nuclear organization of chromosomes is likely influenced by condensates at SEs. DNA interaction technologies indicate that the individual enhancers within the SEs have exceptionally high interaction frequencies with one another, consistent with the idea that condensates draw these elements into close proximity in the dense phase. Several recent studies suggest that SEs can interact with one another and may also contribute in this fashion to chromosome organization. Cohesin, an SMC (structural maintenance of chromosomes) protein complex, has been implicated in constraining SE-SE interactions because its loss causes extensive fusion of SEs within the nucleus. These SE-SE interactions may be due to a tendency of liquid-phase condensates to undergo fusion (Sabari, 2018).

The model whereby phase separation of coactivators compartmentalizes and concentrates the transcription apparatus at SEs and their regulated genes raises many questions. How does condensation contribute to regulation of transcriptional output? A study of RNA Pol II clusters, which may be phase-separated condensates, suggests a positive correlation between condensate lifetime and transcriptional output. What components drive formation and dissolution of transcriptional condensates? These studies indicate that BRD4 and MED1 likely participate, but the roles of DNA-binding TFs, RNA Pol II, and regulatory RNAs require further study. Why do some proteins, such as HP1a, contribute to phase-separated heterochromatin condensates and others contribute to euchromatic condensates? The rules that govern partitioning into specific types of condensates have begun to be studied and will need to be defined for proteins involved in transcriptional condensates. Does condensate misregulation contribute to pathological processes in disease, and will new insights into condensate behaviors present new opportunities for therapy? Mutations within IDRs and misregulation of phase separation have already been implicated in a number of neurodegenerative diseases. Tumor cells have exceptionally large SEs at driver oncogenes that are not found in their cell of origin, and some of these are exceptionally sensitive to drugs that target SE components. How is it possible to take advantage of phase separation principles established in physics and chemistry to more effectively improve understanding of this form of regulatory biology? Addressing these questions at the crossroads of physics, chemistry, and biology will require collaboration across these diverse sciences (Sabari, 2018).

The degree of enhancer or promoter activity is reflected by the levels and directionality of eRNA transcription

Gene expression is regulated by promoters, which initiate transcription, and enhancers, which control their temporal and spatial activity. However, the discovery that mammalian enhancers also initiate transcription questions the inherent differences between enhancers and promoters. This study investigated the transcriptional properties of predominantly mesodermal enhancers during Drosophila embryogenesis using characterized developmental enhancers. While the timing of enhancer transcription is generally correlated with enhancer activity, the levels and directionality of transcription are highly varied among active enhancers. To assess how this impacts function, a dual transgenic assay was developed to simultaneously measure enhancer and promoter activities from a single element in the same embryo. Extensive transgenic analysis revealed a relationship between the direction of endogenous transcription and the ability to function as an enhancer or promoter in vivo, although enhancer RNA (eRNA) production and activity are not always strictly coupled. Some enhancers (mainly bidirectional) can act as weak promoters, producing overlapping spatio-temporal expression. Conversely, bidirectional promoters often act as strong enhancers, while unidirectional promoters generally cannot. The balance between enhancer and promoter activity is generally reflected in the levels and directionality of eRNA transcription and is likely an inherent sequence property of the elements themselves (Mikhaylichenko, 2018).

Through the integration of information on transcription initiation in the noncoding genome [using deeply sequenced CAGE (Shiraki, 2003) and PRO-cap (Mahat, 2016)] with that of developmental enhancer activity (using hundreds of in vivo characterized embryonic enhancers), this study assessed the general properties of Drosophila eRNA. The results indicate that the general features of eRNA are highly conserved from flies to humans, including the level and orientation of eRNA transcription and the relative positioning of the INR motif. During the course of this study, 56 transgenic lines were generated to functionally assess regulatory elements with different eRNA properties for both enhancer and promoter activity. The results uncovered a number of intriguing features suggesting that there is a continuum of enhancer and promoter functions matching the continuum of endogenous transcription (Mikhaylichenko, 2018).

Comparing endogenous enhancer transcription with endogenous enhancer activity in transgenic embryos revealed a very strong global correlation between both the timing (developmental stage) and place (tissue) of enhancer activity. This is consistent with similar global comparisons in cell culture models and suggests a mechanistic link to TF occupancy or some other property of enhancer function. However, active enhancers were observed that have a wide range of eRNA levels, with many active enhancers having very low or undetectable eRNA at the stages when the enhancer is active. Similarly low levels of eRNA may also occur in other species; 35% of putative C. elegans enhancers do not overlap transcription initiation clusters (TICs), while 60% of intergenic putative mouse enhancers do not contain eRNA, as reported in one study. While these percentages may be overestimated due to the inclusion of elements that are not enhancers, nearly a third (20%-33%) of nontranscribed regulatory regions demonstrated enhancer activity in a luciferase assay. In the context of this study, all elements were confirmed embryonic enhancers, and this study carefully matched the stage of enhancer activity to the stage of eRNA detection. Active embryonic enhancers therefore are transcribed in a broad range, with the highly transcribing enhancers producing several orders of magnitude more transcripts than those with the weakest transcription, suggesting that eRNA production and enhancer activation can be uncoupled (at least for a subset of enhancers). For enhancers with very weak transcription, eRNAs are likely to be present only sporadically or in a minority of cells, suggesting that their continued presence is unlikely to be essential for these enhancers' function, although the act of transcription might be (Mikhaylichenko, 2018).

The presence of Pol II and the basal transcriptional machinery at enhancers and their ability to transcribe eRNAs question whether there is an inherent difference between an enhancer and a promoter, with some proposing a unified architecture between the two. To disentangle both activities, a new dual transgenic assay was developed that can measure enhancer and promoter activity at the same genomic location in the same embryos such that the timing as well as tissue specificity of both activities can be directly compared. Transgenic assays have the advantage of being able to measure regulatory activity at the endogenous levels of TFs and within a consistent chromatinized context, two properties that have a major impact on both enhancer and promoter activity. The readout (in situ hybridization) provides both spatial and temporal information at single-cell resolution, although it is difficult to derive quantitative information on activity, a clear disadvantage compared with in vitro reporter assays (Mikhaylichenko, 2018).

This study tested 27 regulatory elements (20 in both orientations) from different genomic locations and with different transcriptional properties for both enhancer and promoter activity. The results indicate that highly transcribed developmental enhancers can function as weak promoters in vivo. The spatial pattern of promoter activity was generally a subset of the tissues in which the enhancer was active, indicating that both activities can occur in the same cells from the same element. This promoter function depended largely on the orientation in which the element was inserted, matching the direction of enhancer transcription in its endogenous location: Bidirectional elements (both enhancers and promoters) can generally function as promoters in both orientations, while unidirectional elements have orientation-dependent activity. This indicates that promoter activity has intrinsic directionality and suggests the presence of directional sequence motifs within enhancer elements. In keeping with this, bidirectional mammalian promoter regions contain separate motifs that promote transcription in either direction (Core, 2014; Duttke, 2015); the current results point to a similar sequence-based determinant of enhancer directionality in Drosophila, supported by the presence of potential 'pairs' of INR motifs within bidirectional enhancers at the two points of maximal divergent transcription. Intragenic enhancers have been shown previously to act as alternative promoters, regulating unidirectional transcription in the direction of the host gene's expression to produce lncRNAs that are abundant, stable (polyadenylated), and spliced (Kowalczyk, 2012). In the case of the intergenic enhancers examined in this study, there is no evidence that they produce stable long transcripts. 
Standard strand-specific poly(A)+ RNA-seq did not detect any RNA at the vast majority of these enhancers, suggesting that Pol II elongation is fundamentally different at intergenic versus intragenic enhancers (Mikhaylichenko, 2018).

Recent high-throughput studies indicate that the same sequences can function as both promoters and enhancers in vitro, although gene promoters generated more promoter activity compared with distal elements (Nguyen, 2016; van Arensbergen, 2017). While the current results also show that the same sequences can harbor both activities, some key differences were uncovered. Tested elements that overlap a gene's main promoter, while acting as strong promoters both endogenously and in the promoter assay, do not possess enhancer activity (at least for four of the five elements examined). In contrast, some alternative gene promoters have an intriguing dual functionality, being able to act with seemingly equal strength as strong enhancers and promoters at the same stage in the same tissues. Using luciferase assays, Li (2012) found that strong and weak promoters have different enhancer activities with an inverse relationship between the two functions. In the context of embryonic development, the current results generally agree with this: Strong promoters (the main genes' promoters) generally have no detectable enhancer activity, while 'strong' (highly active) intergenic enhancers have weak (or not detectable) promoter activity (at least for the ones that were tested). This indicates that developmental enhancers and gene promoters generally have different intrinsic properties (Mikhaylichenko, 2018).

However, this study also found interesting intermediate cases between the two, which suggests a relationship between the directionality of eRNA transcription and the ability to function as an enhancer or promoter in vivo. When bidirectionally transcribed, alternative gene promoters can function as both strong promoters and enhancers in vivo in both orientations. In contrast, when unidirectionally transcribed, the element can generally function only as a promoter (and, in a few cases, as an enhancer) in an orientation-dependent manner matching its direction of transcription. One interesting example of the latter is an ~400-bp DNase 1 hypersensitivity site (DHS element) that overlaps the promoter of the twist gene and is transcribed in a unidirectional manner. This element can function as both a promoter and an enhancer but, interestingly, can perform both functions in only one orientation and is inactive in the other. These results suggest that some enhancers may have evolved to drive proximal orientation-dependent activation, possessing strong intrinsic promoter potential but lacking the ability to act more distally in an orientation-independent manner. To summarize, bidirectional cis-regulatory elements (either enhancers or promoters) can often function as both enhancers and promoters (although to different degrees) in an orientation-independent manner. In contrast, unidirectional elements generally function only in the orientation in which they are transcribed (Mikhaylichenko, 2018).

Taken together, these results suggest a continuum of functions that mirrors the continuum of eRNA directionality and levels of transcription at cis-regulatory elements. This spans from gene promoters that have high levels of unidirectional transcription and function mainly as orientation-dependent promoters (with little or no enhancer function) to elements with bidirectional (high level) transcription giving both promoter and enhancer orientation-independent activity (alternative promoters) to more distal elements with low levels of asymmetric or bidirectional transcription, which function mainly as enhancers, with a subset having weak orientation-independent promoter activity (Mikhaylichenko, 2018).
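
The continuum of transcription levels and directionality described above can be illustrated with a simple per-element score. The following is a minimal sketch and not the analysis used in the study: it assumes that strand-specific initiation counts (for example from CAGE) are already available for each element, and it uses the common score (F - R) / (F + R). The function names, the 0.8 cutoff, and the minimum read count are illustrative assumptions.

```python
def directionality(forward: int, reverse: int) -> float:
    """Return (F - R) / (F + R): -1 is reverse-only, +1 is forward-only, 0 is balanced."""
    total = forward + reverse
    if total == 0:
        raise ValueError("no reads on either strand")
    return (forward - reverse) / total


def classify(forward: int, reverse: int,
             cutoff: float = 0.8, min_reads: int = 5) -> str:
    """Label an element from its strand-specific counts.

    cutoff and min_reads are arbitrary illustrative thresholds,
    not values taken from the study.
    """
    if forward + reverse < min_reads:
        return "untranscribed"
    score = directionality(forward, reverse)
    return "unidirectional" if abs(score) >= cutoff else "bidirectional"


if __name__ == "__main__":
    # Hypothetical counts for three elements (forward, reverse).
    for name, counts in {"elementA": (100, 2),
                         "elementB": (40, 35),
                         "elementC": (1, 0)}.items():
        print(name, classify(*counts))
```

Under these assumptions, an element with strongly one-sided counts is labeled unidirectional, one with comparable counts on both strands bidirectional, and one with too few reads is left unclassified rather than forced into either group.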

Bidirectionality has been suggested to be the ground state of transcription (Jin, 2017), and enhancer transcription may reflect this, serving as a source of evolutionary novelty. The finding that some TF-binding sites may possess the ability to initiate transcription suggests that selection for enhancer activity could allow promoter activity to arise as a by-product. If the presence of low-level promoter activity either as a consequence of selection for enhancer activity or simply due to the relative nonspecificity of the transcriptional machinery is common in enhancer elements, then eRNA could be exploited by evolution for other purposes in transcriptional regulation, including coactivator activity (e.g., activation of CBP [Bose, 2017] or TF trapping at enhancers [Sigova, 2015]). Alternatively, transcribed enhancers may have evolved from promoters, where a promoter was duplicated and became separated from its target gene over evolutionary time. Gradually, the sequence features leading to strong promoter activity would become more degenerate, while the element may gain more TF-binding sites. Although there is currently no evidence of this, it would fit with the promoter and enhancer activity that was observed in this study and with the fact that some species do not have distal enhancers but rather regulate gene expression by TF binding very close to the promoter. Although very speculative, alternative promoters may represent an intermediate state (from an evolutionary perspective) between promoters and enhancers. A previous study proposed that developmental enhancers evolve from inducible-type promoters (Arenas-Mena 2017). Of the elements that were tested in this study, main gene promoters appear to have evolved to drive proximal orientation-dependent activation, possessing strong intrinsic promoter potential. 
At the other extreme, distal enhancers possess weak promoter potential but seem to have specialized toward a distal orientation-independent mode of action, a function achieved, presumably, through acquiring binding sites for a set of factors distinct from promoters. Distal enhancers themselves represent a heterogeneous population of elements with variable transcriptional properties. The coexistence of the two functions opens many questions: How can the same regulatory element facilitate enhancer and promoter function? Can one function be perturbed independently of the other? A preliminary answer to the latter is suggested in this study: Enhancer function was unaffected by changing orientation, while some promoter activity was lost, suggesting separate directional sequence determinants for these promoters' activity (Mikhaylichenko, 2018).

Mutational analysis of a Drosophila neuroblast enhancer governing nubbin expression during CNS development

While developmental studies of Drosophila neural stem cell lineages have identified transcription factors (TFs) important to cell identity decisions, currently only an incomplete understanding exists of the cis-regulatory elements that control the dynamic expression of these TFs. Previous studies have identified multiple enhancers that regulate the POU-domain TF paralogs nubbin and pdm-2 genes. Evolutionary comparative analysis of these enhancers reveals that they each contain multiple conserved sequence blocks (CSBs) that span TF DNA-binding sites for known regulators of neuroblast (NB) gene expression in addition to novel sequences. This study functionally analyzes the conserved DNA sequence elements within a NB enhancer located within the nubbin gene and highlights a high level of complexity underlying enhancer structure. Mutational analysis has revealed CSBs that are important for enhancer activation and silencing in the developing CNS. Adjusting the number and relative positions of the TF binding sites within these CSBs alters enhancer function (Ross, 2018).

A previous enhancer-reporter transgene survey identified an enhancer (denoted as nub-46) that recapitulated nub expression during embryonic cephalic lobe and VNC NB lineage development (Ross, 2015). As an initial step to functionally characterize the nub-46 enhancer, its conserved sequence blocks were identified by comparative evolutionary analysis using 12 Drosophila species, including D. melanogaster, D. simulans, D. sechellia, D. yakuba, D. erecta, D. ananassae, D. persimilis, D. pseudoobscura, D. willistoni, D. virilis, D. mojavensis, and D. grimshawi. This analysis revealed that nub-46 is made up of 11 CSBs. While many of its conserved elements are novel, a CSB, denoted as 'C', containing two adjacent 9-mer sequences (TAAAAATTG and CATAAAAAA) corresponds to the DNA-binding site motifs for Cas (Ross, 2018).
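
Locating the two 9-mers within a candidate sequence is a simple string search on both strands. The sketch below is illustrative only: the two motif strings are those quoted above, while the function names and the toy sequence (the two motifs placed back to back between arbitrary flanking bases) are assumptions and do not reproduce the real nub-46 sequence.

```python
CAS_MOTIFS = ("TAAAAATTG", "CATAAAAAA")  # the two 9-mers quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motifs(seq: str, motifs=CAS_MOTIFS):
    """Return sorted (start, strand, motif) hits, 0-based, scanning both strands.

    A palindromic motif would be reported once per strand; neither of the
    two motifs used here is palindromic.
    """
    seq = seq.upper()
    hits = []
    for motif in motifs:
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((start, strand, motif))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy sequence, not the real enhancer.
    toy = "GG" + "TAAAAATTG" + "CATAAAAAA" + "CC"
    for hit in find_motifs(toy):
        print(hit)
```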

The nub-46 enhancer-reporter transgene expression is dynamic during embryonic CNS development. Transient nub-46 activation is observed at the cellular blastoderm stage, followed by progressive NB reactivation during embryonic neurogenesis. At stage 9, nub-46 regulates transgene reporter expression in several NBs per ventral cord hemisegment, and enhancer activity is detected in a subset of cephalic lobe NBs. Later in CNS development enhancer/reporter expression is detected in additional cephalic lobe and ventral cord NB lineages. After embryonic stage 13, nub-46 cis-regulatory activity is downregulated in both the brain and ventral cord (Ross, 2018).

To delimit the boundaries of the nub-46 enhancer, both 5' and 3' deletions were generated of the full nub-46 enhancer CSB cluster and the in vivo cis-regulatory activity of these truncated fragments was examined via enhancer-reporter transgenes. This analysis revealed that the centrally located CSBs were sufficient for embryonic CNS expression. However, compared to the full-length enhancer, a reduced enhancer/reporter activity was observed for the core that contains elements 'C' through 'I'. These findings demonstrate that the core fragment consists of activator and repressor sequences required for its wild-type spatial and temporal regulatory dynamics (Ross, 2018).

Given that Cas is a negative regulator of pdm gene expression in embryonic NBs, it was predicted that the putative Cas DNA-binding motifs within nub-46 are required to deactivate enhancer activity. Expression of nub-46 enhancer activity partially overlaps endogenous Cas protein expression in stage 13 embryos. To determine whether the putative Cas binding motifs function as Cas binding sites, the regulatory activity of a nub-46 deletion that lacks a 40 bp conserved region containing the two Cas motifs was examined. Deletion of the Cas DNA-binding sites triggers ectopic enhancer activity in the cephalic lobes during stage 13, suggesting that the 'C' CSB functions as a repressor element during cephalic lobe development. Interestingly, no significant ectopic enhancer activity was observed in the developing VNC. Therefore, removal of the nub-46 'C' CSB does not completely account for the repressive action of Cas on the nub-46 enhancer, especially in the VNC, and other direct or indirect effects of Cas action on nub should be considered (Ross, 2018).

While the 'C' element may contain repressor DNA-binding sites, it remained unknown how the nub-46 enhancer is activated in the embryonic CNS. To address this question, the effects of internal deletions within the nub-46 enhancer were further examined. Each of the 10 remaining CSBs was individually removed. Enhancers with these individual deletions were tested in two independent transgenic lines. The wild-type control enhancer activity was tested under the same conditions and at the same time as the deletion mutants. It was observed that nub-46 variants lacking either the 'B' (AGAACGCAAT) element or 'E' (CTACCTGAG) element displayed only a modest reduction in enhancer activity compared to the wild-type. Surprisingly, it was found that singular removal of other CSBs had only subtle effects on enhancer activity during embryonic NB lineage development, suggesting that these CSBs may be either required at later time points or are functionally redundant (Ross, 2018).

Given that other cis-regulatory enhancers contain a combination of repeat and unique sequence elements, it was hypothesized that nub-46 activation may result from a complex set of multiple inputs. Indeed, self-alignment of conserved sequences within nub-46 revealed that the enhancer is made up of 11 distinct repeat and palindromic elements. Upon closer inspection, seven of the 10 repeat elements were found to be located within the 'C' element, and that many of these repeats were also found in CSBs 'D,' 'H,' and 'I' of the enhancer core (Ross, 2018).

Next, whether the repeat elements within the core are required for enhancer activation was assessed. Loss of the 'C' element does not significantly affect onset of enhancer-reporter expression during embryonic VNC development. Among the six repeats identified within the 'C' element, nearly all are present in the 'D,' 'H,' and 'I' elements, and it was speculated that these may compensate for the loss of repeats in the nub-46 [C]^- mutant. To test this hypothesis, the core was truncated to exclude the 'C' element (denoted as [C]^-), and then all remaining elements containing repeats (the 'D,' 'H,' and 'I' elements) were removed from the core enhancer (referred to as [CDHI]^-). Surprisingly, removal of these CSBs had little or no effect on enhancer activity. One possible explanation for the lack of any significant effect of element 'C' (and other elements containing repeat sequences) on enhancer activation is that activator sequences are located within elements lacking repeats (elements 'E,' 'F,' and 'G'). To investigate whether the 'E' (CTACCTGAG), 'F' (GGGGTGTCAAATACCAGC), and 'G' (TACCGTA) elements are required for enhancer activation, all three elements were removed from the enhancer ([CEFG]^-), and it was observed that deletion of these resulted in complete loss of reporter activity, suggesting that 'E,' 'F,' and 'G', containing only unique sequences, are required to activate reporter expression. To determine whether a subset of these elements is necessary for enhancer function, the effect of different combinations of internal deletions on cis-regulatory activity during embryonic neurogenesis was tested. While removal of any one of the 'E,' 'F,' or 'G' elements alone had little or no effect on enhancer function, the combined loss of 'E' and 'F' compromised core activity. Notably, however, increased enhancer activity was identified with loss of 'F' and 'G', whereas loss of all three non-repeat elements disrupted enhancer function. Individual deletion of non-repeat CSBs produced a minor reduction in enhancer activity within brain lineages (Ross, 2018).
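The combinatorial deletion series described above can be enumerated programmatically. The following is a minimal sketch under stated assumptions: the 'E,' 'F,' and 'G' sequences are those quoted in the text, but the spacer, the element order, the construct labels, and the function names are hypothetical placeholders, since the full core sequence is not given here. No experimental outcomes are encoded.

```python
from itertools import combinations

# Element sequences as quoted in the text.
ELEMENTS = {
    "E": "CTACCTGAG",
    "F": "GGGGTGTCAAATACCAGC",
    "G": "TACCGTA",
}
# Hypothetical placeholder for the flanking and intervening core sequence.
SPACER = "NNNNN"


def build_construct(deleted=frozenset()):
    """Assemble a toy core, omitting any element named in `deleted`."""
    pieces = [SPACER]
    for name, seq in ELEMENTS.items():
        if name not in deleted:
            pieces.append(seq)
        pieces.append(SPACER)
    return "".join(pieces)


def deletion_series():
    """Return {label: sequence} for every non-empty combination of deletions."""
    names = sorted(ELEMENTS)
    series = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            label = "[" + "".join(combo) + "]^-"  # illustrative label only
            series[label] = build_construct(frozenset(combo))
    return series


for label, seq in deletion_series().items():
    print(label, seq)
```

Three elements yield seven deletion constructs (three single, three double, one triple), which is why such a series can be tested exhaustively.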

Given that all three elements lacking repeat sequences are essential for enhancer function, it was next asked whether enhancer function is modified by the multiplicity of these sequences. To explore this, core enhancers were synthesized that contain three copies of a single element, with that element substituted into the positions of the other two non-repeat elements. Construct expression was also examined during multiple stages of CNS development. When the 'F' and 'G' elements were replaced with 'E' elements, increasing the number of 'E' elements to three, higher enhancer activity was observed within subsets of NBs compared to the wild-type during stage 11. By stage 13, higher levels of enhancer activity were also observed throughout the CNS. Notably, ectopic expression was also observed within putative PNS lineages during stage 14. It should be noted that additional co-localization experiments using cell lineage markers would be needed to substantiate the ectopic expression. Increasing the number of 'F' elements also altered core enhancer activity, but the effect was limited to a subset of lateral VNC NBs and dorso-anterior cephalic lobe cells during early stage 12. These differences were not apparent at stage 13 and stage 14. Increasing the number of 'G' elements resulted in diminished expression at all three stages examined (Ross, 2018).

The principal findings of this study are the identification of a core sequence within the nub-46 NB enhancer that is sufficient to recapitulate the embryonic expression pattern of nubbin and that novel non-repeated conserved sequences are required for enhancer activity. This study has delimited the target of Cas repression to a CSB containing two adjacent 9-mer sequences corresponding to the TF DNA-binding motif for Cas in CSB 'C.' Nevertheless, the possibility still exists that Cas is not the only repressor of nub-46 during embryonic CNS development (Ross, 2018).

Activator CSBs were also identified that contain uniquely represented sequences within the enhancer, suggesting that the enhancer may be regulated by as yet uncharacterized TF activators that play a role in the temporal regulation of nubbin. These data suggest that multiple copies of either 'E' or 'F' can function as an activator within the enhancer core. While previous studies have suggested that clusters of repeat regulatory sequences are an important aspect of enhancer regulation, this study points to unique non-repeated motifs as targets of transcriptional activators. While these initial observations revealed altered expression outside the spatial/temporal boundaries of nub-46 activity, further experiments using cell-type specific markers are needed to confirm this ectopic expression (Ross, 2018).

Transcription factors activate genes through the phase-separation capacity of their activation domains

Gene expression is controlled by transcription factors (TFs) that consist of DNA-binding domains (DBDs) and activation domains (ADs). The DBDs have been well characterized, but little is known about the mechanisms by which ADs effect gene activation. This study, carried out in murine embryonic stem cells, reports that diverse ADs form phase-separated condensates with the Mediator coactivator. For the OCT4 and GCN4 TFs, this study shows that the ability to form phase-separated droplets with Mediator in vitro and the ability to activate genes in vivo are dependent on the same amino acid residues. For the estrogen receptor (ER), a ligand-dependent activator, it was shown that estrogen enhances phase separation with Mediator, again linking phase separation with gene activation. These results suggest that diverse TFs can interact with Mediator through the phase-separating capacity of their ADs and that formation of condensates with Mediator is involved in gene activation (Boija, 2018).

Regulation of gene expression requires that the transcription apparatus be efficiently assembled at specific genomic sites. DNA-binding transcription factors (TFs) ensure this specificity by occupying specific DNA sequences at enhancers and promoter-proximal elements. TFs typically consist of one or more DNA-binding domains (DBDs) and one or more separate activation domains (ADs). While the structure and function of TF DBDs are well documented, comparatively little is understood about the structure of ADs and how these interact with coactivators to drive gene expression (Boija, 2018).

The structure of TF DBDs and their interaction with cognate DNA sequences has been described at atomic resolution for many TFs, and TFs are generally classified according to the structural features of their DBDs. For example, DBDs can be composed of zinc-coordinating, basic helix-loop-helix, basic-leucine zipper, or helix-turn-helix DNA-binding structures. These DBDs selectively bind specific DNA sequences that range from 4 to 12 bp, and the DNA binding sequences favored by hundreds of TFs have been described. Multiple TF molecules typically bind together at any one enhancer or promoter-proximal element. For example, at least eight different TF molecules bind a 50-bp core component of the interferon (IFN)-β enhancer (Boija, 2018).

Anchored in place by the DBD, the AD interacts with coactivators, which integrate signals from multiple TFs to regulate transcriptional output. In contrast to the structured DBD, the ADs of most TFs are low-complexity amino acid sequences not amenable to crystallography. These intrinsically disordered regions (IDRs) have therefore been classified by their amino acid profile as acidic, proline, serine/threonine, or glutamine rich or by their hypothetical shape as acid blobs, negative noodles, or peptide lassos. Remarkably, hundreds of TFs are thought to interact with the same small set of coactivator complexes, which include Mediator and p300. ADs that share little sequence homology are functionally interchangeable among TFs; this interchangeability is not readily explained by traditional lock-and-key models of protein-protein interaction. Thus, how the diverse ADs of hundreds of different TFs interact with a similar small set of coactivators remains a conundrum. Recent studies have shown that the AD of the yeast TF GCN4 binds to the Mediator subunit MED15 at multiple sites and in multiple orientations and conformations. The products of this type of protein-protein interaction, where the interaction interface cannot be described by a single conformation, have been termed 'fuzzy complexes'. These dynamic interactions are also typical of the IDR-IDR interactions that facilitate formation of phase-separated biomolecular condensates (Boija, 2018).

It has recently been proposed that transcriptional control may be driven by the formation of phase-separated condensates, and it was demonstrated that the coactivator proteins MED1 and BRD4 form phase-separated condensates at super-enhancers (SEs). This study reports that diverse TF ADs phase separate with the Mediator coactivator. The embryonic stem cell (ESC) pluripotency TF OCT4, the estrogen receptor (ER), and the yeast TF GCN4 form phase-separated condensates with Mediator and require the same amino acids or ligands for both activation and phase separation. It is proposed that IDR-mediated phase separation with coactivators is a mechanism by which TF ADs activate genes (Boija, 2018).

The results described in this study support a model whereby TFs interact with Mediator and activate genes by the capacity of their ADs to form phase-separated condensates with this coactivator. For both the mammalian ESC pluripotency TF OCT4 and the yeast TF GCN4, it was found that the AD amino acids required for phase separation with Mediator condensates were also required for gene activation in vivo. For ER, it was found that estrogen stimulates the formation of phase-separated ER-MED1 droplets. ADs and coactivators generally consist of low-complexity amino acid sequences that have been classified as IDRs, and IDR-IDR interactions have been implicated in facilitating the formation of phase-separated condensates. It is proposed that IDR-mediated phase separation with Mediator is a general mechanism by which TF ADs effect gene expression, and evidence is provided that this occurs in vivo at SEs. It is suggested that the ability to phase separate with Mediator, which would employ the features of high valency and low affinity characteristic of liquid-liquid phase-separated condensates, operates alongside an ability of some TFs to form high-affinity interactions with Mediator (Boija, 2018).

The model that TF ADs function by forming phase-separated condensates with coactivators explains several observations that are difficult to reconcile with classical lock-and-key models of protein-protein interaction. The mammalian genome encodes many hundreds of TFs with diverse ADs that must interact with a small number of coactivators, and ADs that share little sequence homology are functionally interchangeable among TFs. The common feature of ADs, the possession of low-complexity IDRs, is also a feature that is pronounced in coactivators. The model of coactivator interaction and gene activation by phase-separated condensate formation thus more readily explains how many hundreds of mammalian TFs interact with these coactivators (Boija, 2018).

Previous studies have provided important insights that prompted an investigation of the possibility that TF ADs function by forming phase-separated condensates. TF ADs have been classified by their amino acid profile as acidic, proline rich, serine/threonine rich, glutamine rich, or by their hypothetical shape as acid blobs, negative noodles, or peptide lassos. Many of these features have been described for IDRs that are capable of forming phase-separated condensates. Evidence that the GCN4 AD interacts with MED15 in multiple orientations and conformations to form a 'fuzzy complex' is consistent with the notion of dynamic low-affinity interactions characteristic of phase-separated condensates. Likewise, the low complexity domains of the FET (FUS/EWS/TAF15) RNA-binding proteins can form phase-separated hydrogels and interact with the RNA polymerase II C-terminal domain (CTD) in a CTD phosphorylation-dependent manner; this may explain the mechanism by which RNA polymerase II is recruited to active genes in its unphosphorylated state and released for elongation following phosphorylation of the CTD (Boija, 2018).

The model described in this study for TF AD function may explain the function of a class of heretofore poorly understood fusion oncoproteins. Many malignancies bear fusion-protein translocations involving portions of TFs. These abnormal gene products often fuse a DNA or chromatin-binding domain to a wide array of partners, many of which are IDRs. For example, MLL may be fused to 80 different partner genes in AML, the EWS-FLI rearrangement in Ewing's sarcoma causes malignant transformation by recruitment of a disordered domain to oncogenes, and the disordered phase-separating protein FUS is found fused to a DBD in certain sarcomas. Phase separation provides a mechanism by which such gene products result in aberrant gene expression programs; by recruiting a disordered protein to the chromatin, diverse coactivators may form phase-separated condensates to drive oncogene expression. Understanding the interactions that compose these aberrant transcriptional condensates, their structures, and behaviors may open new therapeutic avenues (Boija, 2018).

Boundaries mediate long-distance interactions between enhancers and promoters in the Drosophila Bithorax complex

Different enhancer classes in Drosophila bind distinct architectural proteins and mediate unique chromatin interactions and 3D architecture

Genome-wide studies have identified two enhancer classes in Drosophila that interact with different core promoters: housekeeping enhancers (hkCP) and developmental enhancers (dCP). It is hypothesized that the two enhancer classes are occupied by distinct architectural proteins, affecting their enhancer-promoter contacts. It was determined that both enhancer classes are enriched for RNA Polymerase II, CBP, and architectural proteins, but there are also distinctions. hkCP enhancers contain H3K4me3 and exclusively bind CAP-H2, Chromator, DREF and Z4, whereas dCP enhancers contain H3K4me1 and are more enriched for Rad21 and Fs(1)h-L. Additionally, the interactions of each enhancer class were mapped utilizing a Hi-C dataset with <1 kb resolution. Results suggest that hkCP enhancers are more likely to form multi-TSS interaction networks and be associated with topologically associating domain (TAD) borders, while dCP enhancers are more often bound to one or two TSSs and are enriched at chromatin loop anchors. The data support a model suggesting that the unique architectural protein occupancy within enhancers is one contributor to enhancer-promoter interaction specificity (Cubenas-Potts, 2017).

This study characterizes the protein occupancy, chromatin interactions and architecture profiles for the two enhancer classes found in Drosophila. Each enhancer class has distinct H3K4 methylation states, is bound by both common and distinct architectural proteins, and is involved in distinct types of chromatin interactions. First, it was established that hkCP enhancers exclusively bind CAP-H2, Chromator, DREF and Z4, while dCP enhancers do not and are preferentially enriched for but not exclusively bound by Fs(1)h-L and Rad21. In addition, hkCP enhancers are more likely than dCP enhancers to associate with multiple TSSs, which promotes a higher transcriptional output. Finally, hkCP enhancers preferentially associate with topologically associating domain (TAD) borders, whereas dCP enhancers are enriched at chromatin loop anchors present inside TADs. Interestingly, enhancers activated by both core promoters exhibit more hkCP enhancer-like characteristics, indicating that enhancers responsive to both CPs may represent an intermediate between the distinctive hkCP and dCP enhancers. Altogether, these results provide strong correlative evidence, supporting a model suggesting that architectural proteins are critical regulators of enhancer-promoter interaction specificity and that the interactions between enhancers and promoters significantly contribute to the generation of 3D chromatin architecture (Cubenas-Potts, 2017).

The importance of architectural proteins in regulating enhancer-promoter interactions in Drosophila is supported by the observation that the vast majority of architectural protein sites present in the genome correspond to enhancers and promoters. Historically, architectural proteins were identified as insulators, which were functionally demonstrated to block enhancer-promoter interactions. The insulator function of architectural proteins correlates with their enrichment at TAD borders. However, several lines of evidence, including ChIA-PET analysis of CTCF- and cohesin-mediated interactions in mammals, suggest that these architectural proteins help mediate long range contacts among regulatory sequences. In Drosophila this study observed that nearly all of the Group 1 and Group 2 architectural protein sites are associated with enhancers or promoters defined by STARR-seq, TSSs or CBP peaks, suggesting that architectural proteins help mediate enhancer-promoter interactions. Notably, Group 3 architectural proteins include the classic insulator proteins CTCF, CP190, Mod(mdg4) and SuHw, and at least 25% of their peaks cannot be explained by enhancers or promoters. It is interesting to speculate that the non-enhancer-promoter sites may be involved in more classical insulator functions or contributing to the chromatin architecture of inactive regions of the genome (Cubenas-Potts, 2017).

The conclusion that architectural proteins are critical regulators of the specificity between enhancers and promoters is supported by two main lines of evidence. First, the current results demonstrate a strong correlation between each enhancer class and distinct architectural protein subcomplexes. Functional evidence supporting this conclusion comes from mutational analyses of the DRE motif in the distinct enhancer classes, which likely recruits DREF and the other hkCP enhancer associated architectural proteins. Zabidi (2015) demonstrated that the tandem DRE motif alone was sufficient to enhance expression of the housekeeping core promoter and that mutation of DRE motifs within an hkCP enhancer reduced its promoter interactions in a luciferase assay. Furthermore, addition of a DRE motif to a dCP enhancer changed its promoter specificity. Because DREF and potentially BEAF-32 bind to the DRE motif, these results strongly support a model suggesting that the differential occupancy of CAP-H2, Chromator, DREF and Z4 in the two enhancer classes is a critical regulator of their specific interactions with the core promoter types. However, the data cannot discount the notion that unique transcription factor binding at proximal TSSs also contributes to the specificity of enhancer-promoter interactions. Although hkCP enhancer identity is most highly correlated with CAP-H2, Chromator, DREF and Z4 localization, these four architectural proteins are not found in isolation within hkCP enhancers. BEAF-32 and CP190 are also strongly enriched in hkCP enhancers, which are also associated with high occupancy APBSs and TAD borders. Thus, the full architectural protein complement at hkCP enhancers is far more complex than the four hkCP-specific architectural proteins. In addition, architectural proteins that are truly unique to dCP enhancers were not detected. Because dCP enhancers exhibit higher cell type specificity, it cannot be discounted that there are additional dCP enhancers present in the Drosophila genome that were not identified by STARR-seq and were thus excluded from this analysis. From these studies, it is unclear if the enrichment of Fs(1)h-L and Rad21, particularly because Fs(1)h-L and Rad21 are present in hkCP enhancers at lower levels, or the absence of BEAF-32, CAP-H2, Chromator, CP190, DREF and Z4 truly distinguishes the architectural protein complexes found at dCP enhancers. In the future, careful biochemical analyses will be required to gain a comprehensive understanding of the complete organization of architectural protein subcomplexes associated with each enhancer class (Cubenas-Potts, 2017).

hkCP enhancers are associated with multi-TSS chromatin interactions and TAD borders. The promoter-clustering by hkCP enhancers results in a dose-dependent increase in transcriptional output for the interacting genes. Thus, one likely molecular mechanism by which hkCP enhancers promote robust transcriptional activation is by increasing the local concentration of RNA Polymerase II and general transcription factors (GTFs) by bringing multiple TSSs into close proximity. It is interesting that the hkCP enhancers, which form promoter clusters, are associated with TAD borders. It is speculated that the hkCP enhancer interactions involve inter-TAD contacts within the A-type compartment, indicative of the formation of transcription factories. From this analysis, it is unclear if the hkCP enhancers alone are sufficient for the formation of the 3D interactions or the neighboring TSSs and their associated transcription factors are also contributing to these contacts. It is hypothesized that the genes recruited to the factories contain the housekeeping promoter motifs (DRE, Ohler 1, Ohler 6 and TCT) and that the hkCP enhancer residents CAP-H2, Chromator, DREF and Z4 are critical to the formation of these 3D contacts (Cubenas-Potts, 2017).

dCP enhancers are more likely to be present within TADs and are enriched on the subTAD-like chromatin loop anchors. dCP enhancers do not form promoter clusters, but are more likely to interact with individual TSSs. One possible explanation for this observation is that the genes interacting with dCP enhancers require the binding of sequence-specific transcription factors, and increasing the concentration of GTFs and RNA polymerase II is not an effective mechanism to promote transcriptional output. The chromatin loop association is consistent with dCP enhancers forming a strong contact with a single TSS. However, it is acknowledged that dCP enhancers are likely one of multiple molecular mechanisms contributing to chromatin loop formation. Surprisingly, the chromatin loops that were observed in Drosophila are distinct from the chromatin loops described in humans. A recent study reported approximately 10,000 chromatin loops in the genome of GM12878 lymphoblastoid cells, but this study detected only 458 chromatin loops in Drosophila utilizing a similar method. The reason why there are so few chromatin loops in Drosophila compared to humans is unclear. It is possible that chromatin loops represent a more precise level of architecture within TADs between specific enhancers and promoters in mammals, but because TADs are significantly smaller in flies (median size 32.5 kb compared to 880 kb in mice), the chromatin loops are not as prominent or easily detected in the Drosophila genome. Notably, it appears that the chromatin loops are generated by different architectural proteins in the two species. The chromatin loops in humans are anchored by convergent CTCF motifs, while the results presented in this study demonstrate that the chromatin loop anchors in Drosophila are depleted of CTCF. Because the chromatin loops in Drosophila show a strong enrichment for Fs(1)h-L, a Brd4 homolog, and the architectural proteins Rad21, Nup98, TFIIIC and Mod(mdg4), it is possible that a combination of transcription and architectural proteins is required for chromatin loop formation in flies, which may be different from mammals. Altogether, it is clear that dCP enhancers are involved in individual contacts with TSSs and are likely one mechanism by which chromatin loops form in Drosophila (Cubenas-Potts, 2017).

Surprisingly, only ~20% and ~12.5% of all hkCP enhancer and ~7.5% and ~8.5% of dCP enhancer interactions involve a TSS or enhancer on the opposite anchor, respectively. The biological significance of the enhancer to non-TSS association is unclear. One possible explanation is that current methods for identifying statistically significant interactions are not sufficiently robust and that many of the enhancer to non-TSS interactions are not representative of biologically significant contacts. However, it cannot be discounted that the non-TSS interactions mediated by enhancers are real and the biological significance of these contacts remains to be determined. Throughout this analysis, the patterns of TSS interactions were compared with each enhancer class instead of drawing conclusions about the absolute number of TSSs bound per enhancer, minimizing the impact of any non-specific interactions within the data. Additional molecular studies of the various types of enhancer interactions (enhancer to promoter, enhancer to non-TSS, etc.) will be required to evaluate the biological contributions of each (Cubenas-Potts, 2017).

This study found that the functional differences between enhancers that activate housekeeping versus developmental genes are reflected in their chromatin and architectural protein composition, and in the type of interactions they mediate. hkCP enhancers are marked by H3K4me3, associate with TAD borders, and mediate large TSS-clustered interactions to promote robust transcription. This class of enhancers contains the architectural proteins CAP-H2, Chromator, DREF and Z4. In contrast, dCP enhancers are marked by H3K4me1, associate with chromatin loop anchors and are more commonly associated with single TSS-contacts. dCP enhancers are depleted of the hkCP-specific architectural proteins and show an enrichment for Fs(1)h-L and Rad21. The results support a model suggesting that the unique occupancy of architectural proteins in the distinct enhancer classes is a key contributor to the types of interactions that enhancers can mediate genome-wide, ultimately affecting enhancer-promoter specificity and 3D chromatin organization. In the future, further characterization of the broadly defined housekeeping and developmental enhancers into smaller subclasses may yield additional levels of regulation and formation of unique architectural protein and transcription factor protein complexes as key mediators of long range chromatin contacts (Cubenas-Potts, 2017).

Dual functionality of cis-regulatory elements as developmental enhancers and Polycomb response elements

Developmental gene expression is tightly regulated through enhancer elements, which initiate dynamic spatio-temporal expression, and Polycomb response elements (PREs), which maintain stable gene silencing. These two cis-regulatory functions are thought to operate through distinct dedicated elements. By examining the occupancy of the Drosophila pleiohomeotic repressive complex (PhoRC) during embryogenesis, extensive co-occupancy was revealed at developmental enhancers. Using an established in vivo assay for PRE activity, it was demonstrated that a subset of characterized developmental enhancers can function as PREs, silencing transcription in a Polycomb-dependent manner. Conversely, some classic Drosophila PREs can function as developmental enhancers in vivo, activating spatio-temporal expression. This study therefore uncovers elements with dual function: activating transcription in some cells (enhancers) while stably maintaining transcriptional silencing in others (PREs). Given that enhancers initiate spatio-temporal gene expression, reuse of the same elements by the Polycomb group (PcG) system may help fine-tune gene expression and ensure the timely maintenance of cell identities (Erceg, 2017).

While enhancers initiate spatio-temporal transcriptional activity, PREs maintain a previously determined transcriptional state of their target genes, thus leading to transcriptional memory. PREs are generally thought to be dedicated solely to gene silencing and not to contain enhancer-like features to activate gene expression. This study presents evidence to the contrary, that both functions can be encoded in the same cis-regulatory element, depending on the cellular context. This is not a rare event; almost 25% of PhoRC occupancy is at developmental enhancers. Of the 16 elements that this study tested experimentally (either enhancers for PRE activity or PREs for enhancer activity), nine have dual function, being sufficient to activate transcription in a specific spatio-temporal pattern and mediate PcG-dependent silencing in vivo (Erceg, 2017).

These dual elements have interesting implications for transcriptional regulation during embryonic development. First, at the level of PcG protein recruitment, this subset of enhancers is highly enriched in the Pho motif, which distinguishes them from other developmental enhancers. This suggests that the recruitment of Pho to PhoRC enhancers is direct via sequence-specific DNA binding, consistent with an instructive model of recruitment, although other factors are likely involved. PcG proteins and developmental TFs bind in close proximity to each other within the same element (a single DNase hypersensitive site), raising the possibility of direct interplay between the two. The results indicate that the activity of PhoRC-bound enhancers is dominated by tissue-specific TFs that activate transcription in some cells while being dominated by a functional PcG complex in other cells. Is this due to mutually exclusive occupancy of developmental TFs and PcG proteins in different tissues, or do they compete functionally at these elements? The dramatic derepression of enhancer activity in different cell types upon PcG protein removal suggests that other tissue-specific TFs must occupy these enhancers in the PcG silenced cell. This has interesting implications for enhancer activity, as it is well known that TFs bind to thousands of sites (tens of thousands in mammalian cells), but only a subset of associated target genes changes expression when the TF is removed. This has led to the general assumption that the majority of binding events is nonfunctional or neutral. These data suggest that at least a subset of this embryonic occupancy can be functional if not actively antagonized by the presence of PcGs (Erceg, 2017).

Second, enhancer-mediated Polycomb recruitment has interesting implications for the mechanism of PcG-mediated silencing. The current models suggest that PcG proteins silence transcription mainly by silencing a gene's promoter, in keeping with PcG recruitment to CpG islands in vertebrates, or by coordinating a three-dimensional repressive topology, where the entire gene's locus is silenced. In either mode, a gene's promoter would not be permissive to enhancer activation. The data suggest that there may be a third mode of very local silencing at an individual enhancer, leaving the promoter and the rest of the gene's regulatory landscape open for activation by other enhancers, as was observed at the prat2 locus. This would allow for much more fine-tuning of silencing in individual tissues and stages. It also suggests that PcG proteins could play a more dynamic role, similar to a 'standard' transcriptional repressor at enhancers (Erceg, 2017).

Third, this may have broader implications for cell fate decisions during rapid developmental transitions. When multipotent cells become specified into different lineages, a specific transcriptional program often needs to be activated in one cell while being repressed in other cells from the same progenitor population. Having active enhancers in the precursor cells remain accessible to directly recruit the PcG complexes would ensure that these enhancers become silenced in a timely manner. Conversely, having maternally deposited PcG proteins already bound to enhancers early in development may serve as placeholders to ensure that these dual elements remain open and available for TFs to activate at the appropriate developmental stage. Interestingly, in the majority of the tested cases, PcG proteins and developmental TFs use these dual elements to regulate the same target gene, the vast majority of which are key developmental regulators of cell identity (Erceg, 2017).

The identification of PREs in other species has remained a key challenge, with only a handful of PREs identified in mammals and plants to date. In mammals, the PcG system is recruited to inactive CpG islands, with few specific sequence features. Although there are mammalian homologs of the Drosophila Pho and dSfmbt proteins, Yin Yang 1 (YY1) and SFMBT, respectively, the conservation of PhoRC as a complex and its involvement in mammalian PcG silencing remain unclear. It is proposed that such dual enhancers/PREs will also exist in mammals, although, given this apparent lack of conservation of YY1 function, their mechanism of PcG recruitment may have diverged (Erceg, 2017).

cis-regulatory analysis of the Drosophila pdm locus reveals a diversity of neural enhancers

One of the major challenges in developmental biology is to understand the regulatory events that generate neuronal diversity. During Drosophila embryonic neural lineage development, cellular temporal identity is established in part by a transcription factor (TF) regulatory network that mediates a cascade of cellular identity decisions. Two of the regulators essential to this network are the POU-domain TFs Nubbin and Pdm-2, encoded by adjacent genes collectively known as pdm. The focus of this study is the discovery and characterization of cis-regulatory DNA that governs their expression. Phylogenetic footprinting analysis of a 125 kb genomic region that spans the pdm locus identified 116 conserved sequence clusters. To determine which of these regions function as cis-regulatory enhancers that regulate the dynamics of pdm gene expression, this study tested each for in vivo enhancer activity during embryonic development and postembryonic neurogenesis. The screen revealed 77 unique enhancers positioned throughout the noncoding region of the pdm locus. Many of these activated neural-specific gene expression during different developmental stages and many drove expression in overlapping patterns. Sequence comparisons of functionally related enhancers that activate overlapping expression patterns revealed that they share conserved elements that can be predictive of enhancer behavior. To facilitate data accessibility, the results of this analysis are catalogued in cisPatterns, an online database of the structure and function of these and other Drosophila enhancers. These studies reveal a diversity of modular enhancers that most likely regulate pdm gene expression during embryonic and adult development, highlighting a high level of temporal and spatial expression specificity. In addition, clusters of functionally related enhancers were discovered throughout the pdm locus. 
A subset of these enhancers share conserved elements including sequences that correspond to known TF DNA binding sites. Although comparative analysis of the nubbin and pdm-2 encoding sequences indicates that these two genes most likely arose from a duplication event, only partial evidence of sequence duplication between their enhancers was found, suggesting that after the putative duplication their cis-regulatory DNA diverged at a higher rate than their coding sequences (Ross, 2015).

This study found 41 enhancers that directed embryonic expression, an overlapping set of 46 that activated larval expression, and another overlapping set of 46 that activated expression in the adult CNS. While many of these enhancers were activated only in the nervous system, a subset activated reporter gene expression outside of the nervous system, including in larval appendages and in the trachea. Roughly a third of the tested CSCs did not exhibit any detectable cis-regulatory activity in the nervous system. Since this study focused on identifying neural enhancers, the possibility exists that some or all of the CSCs that lack nervous system activity may regulate gene expression in larval and adult tissues that were not examined (Ross, 2015).

There are other online resources of documented enhancers in the Drosophila genome, namely, FlyLight and Vienna Tiles. While these cis-regulatory libraries provide useful information, their coverage of the pdm locus is not complete. For example, FlyLight analysis did not detect 14 enhancers that flank the nub transcribed sequence. These include those located upstream of the nub long transcript (nub-12 and nub-13), in its first intron (nub-28), second exon (nub-32a), second intron (nub-32b, nub-32c, nub-33, nub-36, nub-40b, nub-41, nub-42, nub-44, and nub-45a), and third intron (nub-49b). The FlyLight library also lacks seven pdm-2 enhancers: one located in the upstream region (pdm2-21), two within the second intron (pdm2-27 and pdm2-28), and four in the downstream region (pdm2-45, pdm2-46, pdm2-47 and pdm2-48). Vienna Tiles also provides only partial coverage of the pdm locus, omitting the following 11 pdm locus enhancers: nub-58a, nub-58b, pdm2-13, pdm2-17, pdm2-21, pdm2-22, pdm2-23a, pdm2-31b, pdm2-32, pdm-33, and pdm2-48. While the Vienna Tiles database provides information on embryonic and adult enhancers, it does not supply information on cis-regulatory activity during larval development. In addition, based on the current analysis, most of the reporter transgenes in these two libraries contain multiple enhancers. For example, the Vienna Tiles enhancer denoted VT6436 is made up of two embryonic enhancers (nub-28 and nub-29) (Ross, 2015).

Analysis of the pdm locus enhancers identified four functionally related enhancers (nub-46, nub-49b, pdm2-34, and pdm2-37a) that activated expression during NB lineage development. The nub-46 and pdm2-34 enhancers are both located in the third intron of the nub and pdm-2 long transcript, respectively, whereas nub-49b and pdm2-37a are positioned immediately 5' to the transcriptional start site of their respective short isoform. While the nub-46 and pdm2-34 enhancers drove overlapping but nonidentical expression during embryonic and larval NB lineage development, nub-49b and pdm2-37a regulated similar expression patterns during postembryonic NB lineage development. Analysis of nub-46 and pdm2-34 revealed that these enhancers share multiple conserved DNA elements, albeit in largely unique configurations. Although these observations suggest these enhancers are related, additional studies are needed to further resolve subtle differences between their regulatory activities (Ross, 2015).

Comparative analysis of the nub and pdm-2 coding sequences revealed that their sequence relationship was mostly limited to the exons that encode their POU domains and homeodomains. In contrast, no evidence of collinearity was detected within their noncoding regions, suggesting that they have diverged at a faster rate than the coding sequences. Only one pdm ortholog was found in the mosquito, whereas the medfly and housefly carry both genes. Given this observation and accounting for the divergence of Drosophila from these distant Diptera, the pdm duplication event may have occurred in the Dipteran line between 100 and 260 million years ago (Ross, 2015).

Given the presence of the pdm genes in the medfly and housefly genomes, it was asked whether some or all of the Drosophila CSCs could also be identified in these distant species. Submitting the D. melanogaster genomic sequences surrounding nub and pdm-2 to BLAST searches against the medfly and housefly genomes revealed sequences conserved in the three Dipteran species within several pdm locus CSCs (see Three-way alignment of ultraconserved sequences in conserved sequence clusters identified in Drosophila, housefly, and medfly). These sequences were typically found within the longest conserved sequence blocks (CSBs) of the CSCs. For example, a 48 bp sequence within the pdm2-26 CSC is conserved in all drosophilids as well as in the medfly and housefly (see The pdm2-26 enhancer contains ultraconserved sequences detected in multiple Diptera) (Ross, 2015).
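As a minimal sketch of how conserved blocks might be located in a three-way alignment, the following Python function reports runs of alignment columns that are identical and gap-free in every species. The toy sequences, the minimum-length threshold, and the function name are hypothetical; Ross (2015) relied on BLAST searches and its own conservation criteria, which are not reproduced here.

```python
def conserved_blocks(aligned, min_len=6):
    """Return (start, end, sequence) for runs of gap-free columns identical in all rows.

    `aligned` is a list of equal-length, pre-aligned sequences; `end` is exclusive.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    blocks, start = [], None
    # Iterate one column past the end so a block touching the last column is closed.
    for i in range(length + 1):
        column = {s[i] for s in aligned} if i < length else set()
        identical = len(column) == 1 and "-" not in column
        if identical and start is None:
            start = i
        elif not identical and start is not None:
            if i - start >= min_len:
                blocks.append((start, i, aligned[0][start:i]))
            start = None
    return blocks


# Hypothetical toy alignment (not real pdm2-26 sequence).
toy = [
    "ACGTTTGACCATAAGC-TT",
    "ACGATTGACCATAAGCATT",
    "TCGTTTGACCATAAGC-TA",
]
blocks = conserved_blocks(toy, min_len=6)
print(blocks)
```

Raising `min_len` makes the criterion stricter; a real analysis would also need to tolerate occasional mismatches, which this sketch does not.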

These studies revealed that two-thirds of the CSCs function as cis-regulatory enhancers that regulate gene expression in a diverse array of spatiotemporal aspects, which taken together reflect pdm expression domains. These observations suggest that the pdm genes are dynamically regulated by multiple cis-regulatory modules, and that these enhancers are more amenable to evolutionary restructuring than their protein encoding exons. This is in agreement with recent reviews on the evolution of Dipteran enhancers highlighting the flexibility of enhancers to maintain their function after loss and/or gain of TF DNA binding sites. Also consistent with these observations, functionally related enhancers were found within the pdm locus that share conserved sequences, albeit in different arrangements and orientations (Ross, 2015).

From a mechanistic perspective, these observations suggest that enhancer behavior can be predicted based on the combination of the conserved elements shared among functionally related enhancers. Similar observations have been made by others. Hierarchical clustering analysis of shared conserved sequences revealed that pdm SOG enhancers may be grouped based on shared elements that are for the most part not present within other pdm locus CSCs. A similar analysis of adult median neurosecretory cell (mNSC) enhancers revealed that they grouped together, as evidenced by sharing of conserved sequence elements, which were largely absent in non-mNSC CSCs within the pdm locus. While further work is required to determine whether these shared elements are important for enhancer activity, these findings suggest a level of structural complexity in the presence and clustering of enhancers that requires further analysis. To construct a better representation of enhancer structure and thus cis-regulatory prediction, one would ideally prefer to use a larger training set of enhancers to improve the accuracy of prediction. These approaches will be addressed in future studies (Ross, 2015).
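The idea of grouping enhancers by shared conserved elements can be illustrated with a small clustering sketch. It assumes each enhancer is represented as a set of element labels, uses Jaccard distance, and merges clusters by average linkage until a distance cutoff is exceeded. The enhancer names, element labels, distance measure, and cutoff are all hypothetical choices for illustration; the source does not specify the clustering parameters used.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of conserved-element identifiers."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster(elements_by_enhancer, max_distance=0.6):
    """Average-linkage agglomerative clustering; merging stops above max_distance."""
    clusters = [frozenset([name]) for name in sorted(elements_by_enhancer)]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(
            jaccard_distance(elements_by_enhancer[x], elements_by_enhancer[y])
            for x, y in pairs
        )
        return total / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it if it is close enough.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(c1, c2) > max_distance:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


# Hypothetical enhancers and element labels, for illustration only.
toy = {
    "enhA": {"e1", "e2", "e3"},
    "enhB": {"e1", "e2", "e4"},
    "enhC": {"e7", "e8"},
    "enhD": {"e7", "e8", "e9"},
    "enhE": {"e5"},
}
groups = cluster(toy)
print(groups)
```

In this toy input, enhancers that share most of their elements fall into the same group, while an enhancer with no shared elements remains on its own.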

One of the principal findings of this study is the discovery of 77 enhancers that exhibit a remarkably diverse range of cis-regulatory activities during embryonic and postembryonic development. The biological significance of this enhancer diversity most likely reflects the diversity of the developmental programs in which these transcription factors participate. Functionally related enhancers that share multiple conserved DNA sequences were also identified, and these enhancers could be classified using hierarchical clustering techniques. In addition, this analysis has revealed that the collinearity between the pdm genes is predominantly confined to their POU domain and homeodomain exons, suggesting that their noncoding sequences are diverging at a faster rate than their coding sequences. These results should provide further insight into the regulatory logic that controls cis-regulatory function and thus gene regulation (Ross, 2015).

TBP, Mot1, and NC2 establish a regulatory circuit that controls DPE-dependent versus TATA-dependent transcription

The RNA polymerase II core promoter is a structurally and functionally diverse transcriptional module. RNAi depletion and overexpression experiments revealed a genetic circuit that controls the balance of transcription from two core promoter motifs, the TATA box and the downstream core promoter element (DPE). In this circuit, TBP activates TATA-dependent transcription and represses DPE-dependent transcription, whereas Mot1 and NC2 block TBP function and thus repress TATA-dependent transcription and activate DPE-dependent transcription. This regulatory circuit is likely to be one means by which biological networks can transmit transcriptional signals, such as those from DPE-specific and TATA-specific enhancers, via distinct pathways (Hsu, 2008).

The RNA polymerase II core promoter comprises the sequences that direct the initiation of transcription. Although it has often been presumed that the core promoter is a generic entity, current evidence indicates that there is considerable diversity in core promoter structure and function. Hence, the core promoter is a regulatory element (Hsu, 2008 and references therein).

This study focuses on the relation between two core promoter motifs: the downstream core promoter element (DPE) and the TATA box. The TATA box is the most ancient core promoter motif, as it is conserved from archaebacteria to humans. It has a consensus of TATAWAAR, where the upstream T nucleotide is typically located at about -31 or -30 relative to the A+1 in the Initiator (Inr) element. The DPE appears to be conserved among metazoans. It is strictly located from +28 to +33 relative to the A+1 in the Inr, and has a consensus of RGWYVT in Drosophila (Hsu, 2008).
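The two consensus sequences and their positions can be turned into a simple motif check. The sketch below expands the IUPAC codes used above (W = A/T, R = A/G, Y = C/T, V = A/C/G) into regular expressions and tests a sequence at the stated positions relative to the A+1. The function names and the synthetic promoter sequences are hypothetical, and treating -31/-30 as the only allowed TATA positions is a simplifying assumption, since the text describes that placement as typical rather than strict.

```python
import re

# IUPAC nucleotide codes needed for the two consensus sequences.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "V": "[ACG]"}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


TATA = consensus_to_regex("TATAWAAR")
DPE = consensus_to_regex("RGWYVT")


def classify_core_promoter(seq, plus_one_index):
    """Report TATA/DPE matches in `seq`, where seq[plus_one_index] is the A+1.

    Promoter coordinates have no position 0: +1 is the start site and -1 is
    the base immediately upstream of it.
    """
    def index_of(pos):
        return plus_one_index + (pos - 1 if pos > 0 else pos)

    has_tata = False
    for upstream_t in (-31, -30):
        i = index_of(upstream_t)
        if i >= 0 and TATA.fullmatch(seq[i:i + 8]):
            has_tata = True
    j = index_of(28)  # the DPE spans +28 to +33
    has_dpe = bool(DPE.fullmatch(seq[j:j + 6]))
    return {"TATA": has_tata, "DPE": has_dpe}


# Synthetic sequences for illustration; the A+1 sits at index 40 in both.
tata_promoter = "G" * 9 + "TATAAAAG" + "C" * 23 + "A" + "C" * 40
dpe_promoter = "C" * 40 + "A" + "C" * 26 + "AGACGT" + "C" * 5

print(classify_core_promoter(tata_promoter, 40))
print(classify_core_promoter(dpe_promoter, 40))
```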

Both the TATA box and DPE are binding sites for the TFIID basal transcription factor, but TFIID appears to have distinct modes of binding to the two core promoter motifs. The TBP subunit of TFIID binds to the TATA box, whereas the TAF6 and TAF9 subunits of TFIID are in close proximity to the DPE. In addition, the DNase I footprinting patterns on TATA-containing versus DPE-containing promoters are different. In particular, TFIID footprints of DPE-dependent core promoters exhibit a periodic 10-bp DNase I digestion pattern that suggests an extended, close interaction of TFIID from the Inr through the DPE (Hsu, 2008 and references therein).

There are differences in the functional properties of DPE-dependent versus TATA-dependent core promoters. For instance, an enhancer-trapping analysis in Drosophila revealed the existence of DPE-specific as well as TATA-specific transcriptional enhancers. It was also found that a set of factors (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNA polymerase II, PC4, and Sp1) that is sufficient for transcription of promoters containing both TATA and DCE (downstream core element) motifs is not able to transcribe a DPE-dependent promoter. In that case, DPE-dependent transcription was additionally found to require casein kinase II (CKII) and Mediator. In other studies, NC2 (also known as Dr1-Drap1), which was originally identified as a repressor of TATA-dependent transcription, was found to activate transcription from five different DPE-dependent core promoters in reactions performed with a nuclear extract. With a purified transcription system, however, NC2 activation of a DPE-dependent core promoter was not observed (Hsu, 2008).

To determine the nature of the factors that promote DPE-dependent versus TATA-dependent transcription, the properties of key transcription factors were investigated by RNAi depletion, overexpression, and chromatin immunoprecipitation (ChIP) analyses with multiple DPE-dependent and TATA-dependent promoters. The new findings reveal a regulatory circuit that controls the balance between DPE-dependent and TATA-dependent transcription (Hsu, 2008).

This study used cultured Drosophila cells as the experimental system to investigate DPE versus TATA function. Two sets of reporter constructs were created that contain either TATA or DPE motifs driving a luciferase reporter gene. The DPE-dependent and TATA-dependent promoters in each set were identical, except for the sequences at the positions of the DPE and TATA motifs, and had comparable transcriptional activities (Hsu, 2008).

The effects of several transcription factors on DPE versus TATA transcription were investigated by RNAi depletion analysis. The transcription factors were selected on the basis of their fundamental importance as well as their potential role in DPE-dependent transcription. First, RNAi depletion of each target factor was carried out, and then one-half of the cells was transfected with the DPE-dependent reporter construct and the other half with the TATA-dependent reporter. The resulting transcription levels were assessed by measurement of the luciferase activities relative to those in mock RNAi controls (Hsu, 2008).
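The normalization described here amounts to simple arithmetic, sketched below under stated assumptions: each reporter's luciferase signal after depletion is divided by its mock RNAi control, and the two relative activities are then compared as a ratio. The function names and all numbers are hypothetical and are not data from Hsu (2008).

```python
def relative_activity(knockdown, mock):
    """Luciferase signal after RNAi depletion as a fraction of the mock RNAi control."""
    if mock <= 0:
        raise ValueError("mock control signal must be positive")
    return knockdown / mock


def dpe_to_tata_ratio(dpe_kd, dpe_mock, tata_kd, tata_mock):
    """Ratio above 1: the depletion spares the DPE reporter relative to the TATA reporter."""
    return relative_activity(dpe_kd, dpe_mock) / relative_activity(tata_kd, tata_mock)


# Hypothetical luciferase readings (arbitrary units), for illustration only.
ratio = dpe_to_tata_ratio(dpe_kd=900.0, dpe_mock=1000.0,
                          tata_kd=200.0, tata_mock=1000.0)
print(round(ratio, 2))
```

A ratio near 1 would indicate no differential effect between the two reporters, which is how a general requirement (as opposed to a motif-specific one) would appear in this kind of comparison.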

Depletion of TBP sharply decreases TATA-dependent transcription, but has little effect on DPE-dependent transcription. This effect was observed with a distinct and independent set of DPE-dependent and TATA-dependent reporter constructs as well as with a different nonoverlapping dsRNA probe for TBP. Consistent with the ability of TFIIA to promote TBP binding to DNA, depletion of TFIIA reduces TATA transcription more than DPE transcription with two different sets of reporter constructs. In contrast, no differential DPE versus TATA effects were seen upon RNAi depletion of TAF4 (which is essential for the structural integrity of TFIID), TFIIB, CKIIα, a PC4-like protein, subunits of Mediator (Med17, Med24), or subunits of the SAGA/TFTC complex (Gcn5, Spt3, Ada2b) (Hsu, 2008).

Thus, these findings indicate that TBP and, to a lesser extent, TFIIA have a key role in discriminating between DPE- versus TATA-dependent transcription. The stronger effect of TBP relative to TFIIA is consistent with an auxiliary function of TFIIA, such as its ability to increase the binding of TBP to the TATA box. Because depletion of TBP did not adversely affect DPE-dependent transcription, the possibility was considered that DPE-dependent transcription might involve a factor, such as SAGA/TFTC, that lacks TBP. Therefore the effect of depletion of three SAGA/TFTC subunits (Gcn5, Spt3, and Ada2b) was tested, but no substantial decrease was seen in DPE-dependent transcription or any differential DPE versus TATA effects. Thus, it appears unlikely that SAGA/TFTC is important for DPE-dependent transcription. Lastly, upon depletion of CKII, Mediator, PC4-like, TAF4, and TFIIB, a decrease was observed in both DPE-dependent and TATA-dependent transcription. These results are consistent with a more general transcriptional function rather than a DPE-specific or TATA-specific activity for these factors (Hsu, 2008).

NC2 has been previously found to be a DPE-specific transcriptional activator. With a different biochemical system, however, NC2-mediated enhancement of DPE transcription was not observed. Therefore attempts were made to clarify these apparently contrasting results by RNAi analysis of NC2 with DPE versus TATA reporter gene systems. NC2 comprises two subunits, NC2α (Drap1) and NC2β (Dr1). Upon RNAi depletion of either NC2α or NC2β, a more substantial decrease was seen in DPE- relative to TATA-dependent transcription with two different sets of reporter genes as well as with two different dsRNAs. These results therefore indicate that NC2 promotes DPE-dependent transcription relative to TATA-dependent transcription in cultured cells (Hsu, 2008).

Next, the effects of Mot1 (also known as BTAF1 and Hel89B) on DPE versus TATA transcription were tested. Like NC2, Mot1 antagonizes TBP function. NC2 represses TATA-dependent transcription by blocking the association of TBP with other factors such as TFIIA and TFIIB. Mot1 is an ATPase that removes TBP from DNA by an ATP-dependent mechanism. Genetic studies in Saccharomyces cerevisiae suggest that NC2 and Mot1 have related functions. NC2 and Mot1 bind to overlapping regions in the yeast genome and form a complex with TBP and DNA. In addition, although NC2 and Mot1 are often thought to be repressive, a positive function for these factors has been observed in vitro and in vivo (Hsu, 2008 and references therein).

It was observed that RNAi depletion of Mot1 has a stronger detrimental effect on DPE-dependent than TATA-dependent transcription. This effect was seen with two different sets of reporter genes as well as with two independent nonoverlapping dsRNA fragments. Thus, like NC2, Mot1 promotes DPE- relative to TATA-dependent transcription (Hsu, 2008).

To investigate the relationship between TBP, NC2, and Mot1 in the regulation of core promoter activity, different combinations of these factors were codepleted and the resulting effects upon DPE versus TATA transcription were determined. Codepletion of both NC2α and Mot1 preferentially decreases DPE relative to TATA transcription to an extent that is similar to that seen upon depletion of either NC2α or Mot1 alone. These results suggest that NC2 and Mot1 promote DPE-dependent transcription via the same pathway. In contrast, when TBP + Mot1 or TBP + NC2α were codepleted, nearly the same effect on DPE versus TATA transcription was seen as that seen upon depletion of TBP alone. These findings suggest that TBP is downstream from NC2 and Mot1 in the pathway that regulates DPE versus TATA transcription. Thus, NC2 and Mot1 appear to modulate DPE versus TATA transcription by acting via TBP (Hsu, 2008).

To complement the RNAi depletion studies, the effects of overexpression of TBP, Mot1, or NC2 were investigated in S2 cells. In these experiments, TBP, Mot1, or NC2 expression vectors were cotransfected along with the DPE-dependent or TATA-dependent reporter constructs. Overexpression of TBP increases TATA-dependent transcription and decreases DPE-dependent transcription. Conversely, overexpression of Mot1 increases DPE-dependent transcription and decreases TATA-dependent transcription. Overexpression of both subunits of NC2 decreases TATA-dependent transcription, but has little effect on DPE-dependent transcription. Consistent with the two NC2 subunits functioning together in a complex, overexpression of NC2α alone or NC2β alone has no effect on DPE-dependent or TATA-dependent transcription. In addition, a parallel set of overexpression experiments was carried out with TBP, Mot1, and NC2 with a different set of DPE-dependent and TATA-dependent reporter genes, and nearly identical results were obtained. These findings further demonstrate that TBP favors TATA relative to DPE transcription, whereas Mot1 and NC2 favor DPE relative to TATA transcription (Hsu, 2008).

To examine the functions of TBP, Mot1, and NC2 in a more natural context, the effects of RNAi depletion of TBP, Mot1, or NC2 upon transcription of endogenous DPE- or TATA-containing genes were tested in Drosophila Kc cells. In these experiments, secondary/late ecdysone-responsive genes, which are activated upon ecdysone induction, were employed. In this manner, it was possible to characterize the requirements for TBP, Mot1, and NC2 for transcriptional activation (Hsu, 2008).

Many genes in Drosophila are activated by the steroid hormone 20-hydroxyecdysone (20HE). A list of genes that were induced by 20HE in Drosophila Kc cells was obtained. From this list, secondary/late-response genes were identified with DPE + Inr motifs (CG9511, CG16876, Glut1) or TATA + Inr motifs (Obp99c, CG4500) in their core promoters. The 20HE induction of these genes in Kc cells was confirmed by using real-time RT-PCR. In addition, the transcription start sites of each of these genes were verified by primer extension analysis of mRNA isolated from Kc cells (Hsu, 2008).

The RNAi analysis of the endogenous secondary/late-response genes was carried out as follows: TBP, TAF4, NC2α, and Mot1 were each individually depleted by RNAi in Kc cells for 4 d, and then the ecdysone-responsive genes were induced with 20HE for 24 h. The total RNA was isolated, and the transcript levels of the selected genes were determined by real-time RT-PCR. It was observed that depletion of TBP decreases transcription of the TATA-containing promoters and increases transcription of the DPE-containing promoters. Thus, these results suggest not only that TBP activates TATA-dependent promoters, but also that it represses DPE-dependent promoters. Conversely, it was found that depletion of Mot1 or NC2α decreases transcription of DPE-containing promoters and increases transcription of TATA-containing promoters. These findings suggest a positive function of Mot1 and NC2 at DPE-dependent promoters and a negative function at TATA-containing promoters. RNAi depletion of TAF4 causes a substantial decrease in transcription from both DPE-containing and TATA-containing promoters. These results further support the conclusion that TAF4 is required for both DPE-dependent and TATA-dependent transcription (Hsu, 2008).

The RNAi depletion analysis with the endogenous genes leads to nearly the same conclusions as the experiments with the transfected luciferase reporter genes. Both sets of experiments indicate that TBP favors TATA-dependent relative to DPE-dependent transcription, and that Mot1 and NC2 favor DPE-dependent relative to TATA-dependent transcription. However, it is useful to note two distinctions. First, TBP depletion results in an increase in transcription from endogenous DPE-containing genes, but does not alter transcription from transfected DPE-dependent reporter genes. Second, depletion of Mot1 or NC2α causes an increase in transcription from endogenous TATA-containing genes, but results in a slight decrease in transcription from transfected TATA-dependent reporter genes. The analysis of the endogenous genes is likely to provide a more accurate representation of TBP, Mot1, and NC2 activity than the studies with the transfected genes, because the endogenous genes are in their natural context at the normal copy number and the experiments with the endogenous genes do not involve the extra transfection procedure. Thus, the findings from the analysis of the endogenous genes suggest a repressive function of TBP at DPE-dependent promoters as well as a repressive function of Mot1 and NC2 at TATA-dependent promoters (Hsu, 2008).

The secondary/late ecdysone-responsive genes were further characterized by ChIP analysis with TBP and RNA polymerase II (Rpb3 subunit), for which ChIP-quality antibodies were available. With the TATA-containing CG4500 promoter, there is increased ChIP signal for both TBP and Rpb3 in the promoter region upon 20HE induction. In the control/reference TATA-containing hsp70 promoter, an increase in ChIP of TBP and Rpb3 was also observed in the promoter region. By comparison, with the DPE-containing Glut1 and CG16876 promoters, there is increased ChIP of Rpb3 in the promoter region upon 20HE induction; however, the ChIP signal for TBP does not increase under the same conditions. The absence of an increased ChIP signal for TBP with the DPE-containing promoters does not necessarily indicate that TBP is not present at the promoter; for instance, it is possible that TBP may be in an altered configuration that masks the accessibility of the antibodies. Yet, whether or not TBP is in close proximity to the DPE-containing promoters, these results show that there are differences in the nature of the interaction of TBP with TATA-containing versus DPE-containing promoters (Hsu, 2008).

It is also relevant to note that secondary/late-response genes were chosen in these studies, because secondary/late genes are more likely than primary/early-response genes to be in a naïve state prior to ecdysone induction. To test this notion, RNAi depletion analyses were carried out with two primary/early-response genes, E74A and E75B, both of which contain DPE motifs. With these genes, no change was observed in transcription upon RNAi depletion of TBP, TAF4, Mot1, or NC2α. Moreover, ChIP analysis further revealed that both TBP and RNA polymerase II (Rpb3 subunit) are present at the promoters prior to ecdysone induction. Therefore, it appears likely that these primary/early-response genes exist in a preactivated state that does not require the subsequent action of factors such as TFIID, Mot1, or NC2 (Hsu, 2008).

The RNAi depletion and overexpression data reveal a regulatory circuit with the following properties: TBP activates TATA-dependent transcription and represses DPE-dependent transcription; then, Mot1 and NC2 act to block both the activating and repressive functions of TBP. In this model, there are opposing forces that alter the balance between DPE and TATA transcription. A decrease in TBP or an increase in Mot1/NC2 favors DPE transcription, whereas an increase in TBP or a decrease in Mot1/NC2 favors TATA transcription. Importantly, the functions of Mot1 and NC2 are dependent on TBP. In addition, the proposed circuit is consistent with the known antagonistic relationship between TBP and NC2 as well as between TBP and Mot1 (Hsu, 2008).
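The qualitative logic of this circuit can be captured in a toy calculation. The sketch below is purely illustrative: the functional forms, the arbitrary units, and the lumping of Mot1 and NC2 into a single `antagonist` level are assumptions made here to reproduce the stated directions of effect, and are not a model proposed or fitted by Hsu (2008).

```python
def circuit_outputs(tbp, antagonist):
    """Toy outputs for the TBP / Mot1-NC2 circuit (arbitrary units).

    `antagonist` lumps Mot1 and NC2 together. All functional forms are
    assumptions chosen only to reproduce the qualitative directions.
    """
    active_tbp = tbp / (1.0 + antagonist)    # Mot1/NC2 block TBP function
    tata = active_tbp / (1.0 + active_tbp)   # TBP activates TATA-dependent transcription
    dpe = 1.0 / (1.0 + active_tbp)           # TBP represses DPE-dependent transcription
    return {"TATA": tata, "DPE": dpe}


baseline = circuit_outputs(tbp=1.0, antagonist=1.0)
less_tbp = circuit_outputs(tbp=0.2, antagonist=1.0)
less_antagonist = circuit_outputs(tbp=1.0, antagonist=0.2)

# With no TBP, changing Mot1/NC2 has no effect, mirroring their dependence on TBP.
no_tbp_a = circuit_outputs(tbp=0.0, antagonist=1.0)
no_tbp_b = circuit_outputs(tbp=0.0, antagonist=0.2)

for label, out in [("baseline", baseline), ("less TBP", less_tbp),
                   ("less Mot1/NC2", less_antagonist)]:
    print(label, {k: round(v, 3) for k, v in out.items()})
```

In this sketch, lowering TBP shifts the balance toward DPE output and lowering the antagonist shifts it toward TATA output, which matches the direction of the effects observed with the endogenous genes.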

How might TBP repress DPE-dependent transcription? Two possible explanations are suggested. (1) In the absence of a TATA box, TBP might interfere with the proper assembly of the transcription initiation complex. (2) There may be an essential DPE-directed transcription factor that is inhibited by TBP. It is possible that DPE-mediated transcription does not directly involve TBP; there is substantial evidence of RNA polymerase II-mediated transcription occurring in the absence of TBP (Hsu, 2008 and references therein).

It was also considered whether either of the TBP-related factors, TRF1 and TRF2, is used instead of TBP at DPE-containing promoters. To this end, the effect of depleting TRF1 or TRF2 upon the expression of DPE-containing versus TATA-containing endogenous genes was examined. TRF1, which is largely involved in RNA polymerase III transcription in Drosophila, has little or no effect on transcription of DPE-containing or TATA-containing genes. TRF2 is important for both DPE-mediated and TATA-mediated transcription. The effect of TRF2 is similar to that of TAF4, which appears to contribute to both DPE-dependent and TATA-dependent transcription. Neither TRF1 nor TRF2 exhibits an opposite effect on DPE-mediated versus TATA-mediated transcription, as do TBP, Mot1, and NC2. In addition, a genome-wide ChIP analysis of TRF2 did not reveal an association of TRF2 with DPE-containing genes. Thus, at the present time, there is no evidence suggesting a specific link between either TRF1 or TRF2 and DPE-mediated or TATA-mediated transcription (Hsu, 2008).

In conclusion, the analysis of TBP, Mot1, and NC2 in the context of DPE-containing versus TATA-containing promoters has revealed a regulatory circuit that controls the balance between DPE-mediated versus TATA-mediated transcription. This circuit may be a key means by which DPE or TATA specificity of transcriptional enhancers is achieved. In the future, it will be interesting and important to build upon this core circuit to identify the connections and mechanisms by which biological networks use DPE and TATA specificity to increase the number of pathways by which signals can be transmitted (Hsu, 2008).

The same transcriptional activator (MTF-1) requires different coactivator subunits depending on the context of the core promoter

Cells often fine-tune gene expression at the level of transcription to generate the appropriate response to a given environmental or developmental stimulus. Both positive and negative influences on gene expression must be balanced to produce the correct level of mRNA synthesis. To this end, the cell uses several classes of regulatory coactivator complexes including two central players, TFIID and Mediator (MED), in potentiating activated transcription. Both of these complexes integrate activator signals and convey them to the basal apparatus. Interestingly, many promoters require both regulatory complexes, although at first glance they may seem to be redundant. RNA interference (RNAi) was used in Drosophila cells to selectively deplete subunits of the MED and TFIID complexes to dissect the contribution of each of these complexes in modulating activated transcription. The robust response of the metallothionein genes to heavy metal was used as a model for transcriptional activation by analyzing direct factor recruitment in both heterogeneous cell populations and at the single-cell level. Intriguingly, it was found that MED and TFIID interact functionally to modulate transcriptional response to metal. The metal response element-binding transcription factor-1 (MTF-1) recruits TFIID, which then binds promoter DNA, setting up a 'checkpoint complex' for the initiation of transcription that is subsequently activated upon recruitment of the MED complex. The appropriate expression level of the endogenous metallothionein genes is achieved only when the activities of these two coactivators are balanced. Surprisingly, it was found that the same activator (MTF-1) requires different coactivator subunits depending on the context of the core promoter. Finally, the stability of multi-subunit coactivator complexes can be compromised by loss of a single subunit, underscoring the potential for combinatorial control of transcription activation (Marr, 2006).

There are four known metallothionein genes in Drosophila: MtnA, MtnB, MtnC, and MtnD. Of these, the best characterized is the MtnA gene, which produces a transcript of ~600 bases in length, bearing one intron. All of the regulatory elements required for robust response to heavy metals, including copper, lie within 500 bp of the transcription start site. The gene is controlled by a single activator, metal response element-binding transcription factor 1 (MTF-1), which binds two adjacent metal response elements (MRE) 50 bp upstream of the TATA-box (Zhang, 2001). Quantitative PCR (qPCR) analysis of the endogenous gene in Drosophila S2 cells shows that the gene is highly induced (~250-fold) after a short exposure to copper. The total amount of stable MtnA mRNA approximates the level of the abundant transcript for the ribosomal subunit Rp49. Primer extension analysis confirms that transcriptional activation of the endogenous MtnA gene originates from a unique start site overlapping the core promoter. The transcript accumulates linearly for ~12 h, thus measurements in this time window likely reflect relative levels of transcription of the MtnA gene. Importantly, induction at the endogenous chromosomal locus is easily assayed in order to measure physiologically relevant transcriptional activation in the context of native chromatin. Taken together, these properties establish the endogenous MtnA gene as a useful model for studying transcriptional mechanisms governing an inducible gene (Marr, 2006).
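For readers unfamiliar with how a qPCR fold induction is commonly derived, the following sketch applies the widely used 2^-ΔΔCt convention, which assumes roughly 100% amplification efficiency. The source does not state which quantification method was used, and the threshold-cycle (Ct) values and the use of a single reference transcript below are hypothetical, chosen only so that the result lands in the same range as the reported induction.

```python
def fold_induction(ct_target_induced, ct_ref_induced,
                   ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt convention (assumes ~100% PCR efficiency)."""
    delta_induced = ct_target_induced - ct_ref_induced
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_induced - delta_control)


# Hypothetical Ct values for illustration; not data from Marr (2006).
fold = fold_induction(ct_target_induced=20.0, ct_ref_induced=18.0,
                      ct_target_control=28.0, ct_ref_control=18.0)
print(fold)
```

Each cycle of difference corresponds to a twofold change under this convention, so an eight-cycle shift in the normalized target signal corresponds to a 256-fold change.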

Using chromatin immunoprecipitation (ChIP), it was found that the sequence-specific DNA-binding protein MTF-1 is specifically recruited to the MtnA promoter region in response to copper. Curiously, when the ChIP signal at the promoter region was compared to that at a region 1 kb downstream, a significant amount of MTF-1 was found to be present on the promoter even in the absence of added copper. Under these conditions, little transcription is detected from this gene. As a preliminary experiment to investigate a potential functional interaction between TFIID and MED, it was first asked whether the two complexes are both recruited in a signal-dependent manner to the MtnA gene. Using ChIP, it was found that both TBP and the TAFs are efficiently recruited to the promoter region in response to copper. In addition, the MED17, MED24, MED26, and MED27 subunits of MED are all recruited to the promoter region in response to copper treatment. Consistent with the high level of induction, RNAPII occupancy at the MtnA promoter is also increased in response to heavy metal treatment. Thus, both core coactivator complexes and RNAPII are efficiently recruited to the promoter region upon induction and resultant binding of MTF-1 to the MREs (Marr, 2006).

Because the ChIP assay is limited to measuring the response in a heterogeneous population of cells, a transgenic model system was established in Drosophila S2 cells in order to visualize the response at the single-cell level. Such an approach has proved useful in understanding transcription factor dynamics in vivo. By selecting for stably transfected MtnA firefly luciferase reporters, a concatenated transgenic locus was generated in a clonal line of S2 cells. The transgenic locus was assayed for dependence on copper using a luciferase assay. Importantly, transcription initiates at a unique site that maps to the correct start site of the MtnA core promoter. With this substantial increase in gene number (~2000) at the integrated transgenic locus, it should now be possible to visualize direct recruitment of specific transcription factors to the MtnA promoter within a single cell (Marr, 2006).

As expected, in the absence of heavy metal, MTF-1 is predominantly cytoplasmic; however, in agreement with ChIP data, some MTF-1 can be detected at the transgenic cluster even in the absence of a metal stimulus. Thus, antibody labeling of MTF-1 provides a useful marker for the subnuclear location of the transgene cluster in both induced and uninduced cells. Notably, the locus is not undergoing transcription (as detected by RNA FISH) in the absence of heavy metal induction despite the presence of some MTF-1 at the transgene cluster. Upon copper induction, MTF-1 vacates the cytoplasm and accumulates selectively at the transgenic locus. Under these same conditions, TBP is also actively recruited to this cluster. Consistent with not only TBP but holo-TFIID complex recruitment, it was found that TAF2 also accumulates at the transgene. Likewise MED components recruited to the transgene were detected using antibodies against MED26. As expected, RNAPII is recruited to the cluster in a copper-dependent manner consistent with the transcriptional induction of the transgene under these conditions. In contrast, TBP-related factor 1 (TRF1), a subunit known to be a key component of the RNA polymerase III core promoter recognition complex, is not recruited to the transgene. This negative control helps rule out the possibility that the tandemly reiterated transgene is simply nonspecifically attracting transcription factors (Marr, 2006).

Having established by two independent methods that both TFIID and MED complexes are recruited to the MtnA promoter in an activator-dependent manner, their role in potentiating transcriptional activation of the endogenous MtnA gene was investigated. The efficient technique of RNAi in Drosophila S2 cells was used to knock down expression of TFIID and MED subunits. In addition, the activator MTF-1 was knocked down to ascertain the extent of the activator's role in induction. After treatment with copper, total RNA was purified from dsRNA-treated and untreated S2 cells and assayed by two independent methods. First, primer extension analysis was performed on equivalent amounts of total RNA. This assay revealed accurate transcription from one distinct core promoter start site. Next, qPCR normalized to the Rp49 mRNA was used to confirm that there is little or no global disturbance of RNAPII transcription (Marr, 2006).
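The normalization of qPCR data to a reference transcript such as Rp49 can be sketched as follows. This is a minimal illustration that assumes the common delta-delta-Ct calculation with perfect amplification efficiency; the source does not state which quantification method was used, and the function name and Ct values are hypothetical.

```python
def fold_induction(ct_target_treated, ct_ref_treated,
                   ct_target_control, ct_ref_control,
                   efficiency=2.0):
    """Relative expression of a target gene by the delta-delta-Ct method.

    Each Ct of the target (e.g. MtnA) is first normalized to the Ct of a
    reference transcript (e.g. Rp49) from the same sample; the treated
    sample is then compared to the control sample.
    efficiency=2.0 assumes the product doubles every cycle (an assumption).
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return efficiency ** (-delta_delta)


# Hypothetical Ct values, for illustration only (not data from the study):
# the target crosses threshold 8 cycles earlier after copper treatment,
# while the reference is unchanged.
fold = fold_induction(ct_target_treated=18.0, ct_ref_treated=16.0,
                      ct_target_control=26.0, ct_ref_control=16.0)
print(fold)  # 2**8 = 256.0
```

With these made-up inputs an 8-cycle shift corresponds to a 256-fold change, the same order of magnitude as the ~250-fold induction reported for the endogenous gene. The same function applied to an RNAi-treated versus untreated pair would give the relative activity after knockdown.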

Not surprisingly, depletion of MTF-1 severely reduced transcriptional activation from the MtnA promoter, confirming the central role of this activator. RNAi directed against TBP also had a dramatic inhibitory effect. The MtnA promoter is <10% as active when TBP levels are severely depleted. Surprisingly, knockdown of multiple TAFs had little apparent effect on the ability of MTF-1 to activate MtnA. Indeed, depletion of the TAFs actually stimulated (1.5- to 2-fold) production of RNA. With the exception of TAF11, a reduction of individual TAFs resulted in a remarkably uniform response. The reason for this uniformity became apparent when the stability of the TFIID complex was examined in the RNAi-treated cells. The overall stability of the holo-TFIID complex appears to be coupled to the stability of certain individual TAFs. In the most dramatic example, RNAi-targeted reduction of TAF4 leads to the concomitant loss of TAF1, TAF5, TAF6, and TAF9, as well as a detectable reduction in TBP. Interestingly, TAF2 and TAF11 are largely unaffected by depletion of TAF4. Similar results are observed for the other TAFs as well. When the transcript levels of the TAFs were measured after RNAi treatment, it was clear that the loss of stability occurs at the protein level, since the transcript levels for nontargeted TAFs are unaffected. For example, when TAF4 is targeted, only the TAF4 transcript is depleted (Marr, 2006).

In contrast to the TAFs, RNAi reduction of MED subunits gave striking but variable effects on the ability of MTF-1 to activate transcription from the MtnA promoter. Unlike TFIID, the response is far from uniform. For example, dsRNA directed against MED23 has little effect on induction of MtnA, while loss of MED17, the Drosophila SRB4 homolog, has a strong inhibitory effect. The lack of a uniform response in the MED RNAi led to a further investigation of the potential differential response upon depletion of MED subunits at related promoters activated by MTF-1. As discussed above, Drosophila has four metallothionein genes that respond to heavy metals. Three of these (MtnA, MtnB, and MtnD) are active in S2 cells, and all three are specifically activated by the same factor, MTF-1. All three Mtn genes were examined in a single experiment using qPCR. First, it was confirmed that all three promoters require MTF-1 for induction. Remarkably, distinct differential requirements were found for MED subunits depending on the promoter. For example, MED13, a subunit of the larger MED complex (ARC-L) thought to play a repressive role in transcription, is not essential for MtnA induction but was found to be important for both MtnB and MtnD activation by MTF-1. The opposite specificity was seen with the MED26 subunit, a component of the smaller MED complex (CRSP) thought to play predominantly a coactivator role in transcription: MED26 is required for full induction of the MtnA promoter but is dispensable for MTF-1 activation of the MtnB and MtnD promoters. Thus, these experiments reveal a remarkable example of differential dependence on cofactor composition even though all three promoters tested use the same activator. Apparently, the precise role of individual MED subunits depends on the promoter context and structure, despite the absence of any evidence of direct binding of DNA by the MED complex (Marr, 2006).

To help rule out nonspecific effects on transcription, such as a change in the concentration of free RNA polymerase, representative targets from TFIID and MED were tested in a transient transfection assay in which the effect can be normalized to a second promoter. In these experiments, TAF4 and MED17 were chosen as representative targets, since depletion of TAF4 compromises much of the TFIID complex and MED17 is likely a component of the core MED complex. The transient transfection data are largely consistent with the data generated at the endogenous locus and at the transgene (Marr, 2006).

The data presented above suggest that activation of the MtnA gene requires specific MED subunits, and at the same time the TAFs appear to be playing a potential negative regulatory role. Because it is clear that the TAFs are specifically recruited in S2 cells to the MtnA promoter in a copper-dependent manner by MTF-1, whether TFIID recruitment can occur in the absence of the MED complex was examined. To achieve this, RNAi directed against MED17 was used, which results in an almost complete loss of MED activity. Surprisingly, TFIID is still efficiently recruited to the MtnA gene. ChIP experiments confirmed that TBP and TAF2 are still actively (and likely directly) recruited to the endogenous MtnA gene by MTF-1 even when the gene is transcriptionally inactive as measured by qPCR analysis. The MtnA luciferase transgene system was used to investigate this relationship at the single-cell level. Without any RNAi, TBP, TAF2, and RNAPII were all recruited to the transgene. In agreement with the ChIP data above, even in the absence of MED activity, after MED17 depletion, TBP and TAF2 are nevertheless efficiently recruited to the transgene. In contrast, no RNAPII can be detected at the transgene consistent with the loss of transcription activation. Apparently, TFIID is recruited to the promoter, but the promoter is not active in supporting transcription. Importantly, recruitment of this 'inactive TFIID' is dependent on the activator MTF-1. In the absence of MTF-1, no TFIID or RNAPII is recruited to the transgene (Marr, 2006).

This perplexing result of recruiting an apparently 'inactive' TFIID prompted an examination of what happens when both TAFs and MEDs are depleted. Remarkably, when both the TAFs and the MED complex are depleted and 'removed' from the MtnA promoter, MTF-1-dependent activation of transcription is restored to ~95% of the level of untreated cells, which is well above the inhibited level observed when the MEDs alone are depleted. In humans and Drosophila, TAFs can be subunits of other complexes such as TFTC and STAGA, so it is possible that the functional interaction analyzed is not TFIID-specific. To test this, specific subunits of these other complexes were targeted to determine if they would have a similar ability to rescue the MED knockdown. Unlike depletion of the TFIID subunits, RNAi against dAda2b, dGCN5, dSPT3, and dTRA1 was unable to rescue the loss of the MED subunits. Taken together, these findings suggest that the functional relationship revealed by these experiments with the MtnA promoter most likely does involve some regulatory transaction between TFIID and MED (Marr, 2006).

The requirement for coactivator complexes mediating transcriptional responses to activators has been well documented. However, by using an inducible Drosophila gene as a model system, a previously unknown functional interaction has been uncovered between two coactivator complexes, TFIID and MED. In the absence of TAFs, the cell responds inappropriately to a metal stimulus. The cell synthesizes 50%–200% more mRNA from the MtnA gene than it does in the presence of the TAFs. The data suggest that at this gene, TFIID is recruited in an inactive state, a state that impedes initiation of transcription. It is believed that this sets up a checkpoint early in the initiation process to meter the RNA synthesis. The MED complex must be recruited to get past this checkpoint. It is postulated that the MED complex likely modifies TFIID, converting it to an active state. This could be accomplished either through one of the known enzymatic activities of MED, phosphorylating (cdk8) or ubiquitylating (MED8) TFIID subunits, or through some, as yet undetected, chaperone-like function that remodels TFIID into an active conformation. Not surprisingly then, in the absence of MED subunits the cell cannot mount an appropriate response to environmental signals. In fact, depletions of many of the MED subunits lead to <20% of the normal amount of mRNA. Unlike the uniform response to depletion of TAFs, the response to depletion of MEDs is much less uniform. One possibility is that the MED complex is more functionally and structurally diverse than TFIID. Indeed, alternative subcomplexes of MED have been purified biochemically, whereas no such subcomplexes of TFIID have been reported (Marr, 2006).
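The checkpoint logic proposed here can be summarized as a small lookup function. This is only a qualitative sketch: the function and argument names are hypothetical, and the returned values are rough figures taken from the text (1.5-fold is the lower end of the reported 1.5- to 2-fold increase, and 0.2 is the upper bound of the reported <20%), not a fitted model.

```python
def mtna_relative_output(mtf1=True, tafs=True, med=True):
    """Approximate MtnA induction relative to untreated cells (1.0 = normal).

    Encodes the qualitative outcomes described in the text:
    - no activator: no detectable activation
    - TAFs and MED intact: appropriate output
    - MED depleted: recruited TFIID stays 'inactive' and blocks initiation
    - TAFs depleted: the checkpoint is missing, so output overshoots
    - both depleted: activation is largely restored
    """
    if not mtf1:
        return 0.0
    if tafs and med:
        return 1.0
    if tafs and not med:
        return 0.2   # upper bound of the reported <20%
    if med and not tafs:
        return 1.5   # lower end of the reported 1.5- to 2-fold increase
    return 0.95      # both depleted: ~95% of untreated level


for tafs in (True, False):
    for med in (True, False):
        print(f"TAFs={tafs!s:5} MED={med!s:5} -> {mtna_relative_output(tafs=tafs, med=med)}")
```

The point of the table is the non-additive pattern: removing MED alone is strongly inhibitory, yet removing the TAFs as well largely rescues the defect, which is what suggests that TFIID acts as a block that MED must relieve.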

By analysis of three different Mtn genes, all of which are dependent on the same single activator, it was found, surprisingly, that there is a differential requirement of specific MED subunits at the three Mtn promoters. This is taken as evidence that, depending on the precise arrangement of cis elements and promoter context, the same activator can require different mediator subunits or modules to transmit its signals to the basal apparatus (Marr, 2006).

Interestingly, the kinase module of the MED complex, previously linked with repression functions, is required for efficient activation at two of the promoters. This result, combined with the finding that at the MtnA promoter the TAFs have a repressive regulatory influence on transcription initiation, underscores the difficulty in assigning black and white functions to the coactivator complexes. It is likely that both TFIID and MED interpret multiple inputs from cellular signals and act either positively or negatively depending on the signals received as well as the specific promoter context. As such, the complexes may better be viewed as coregulators since they can play either a positive or negative role in the process of modulating gene expression. For example, only when both TFIID and MED are intact do Drosophila S2 cells produce the appropriate amounts of MtnA mRNA. In contrast, when either coactivator complex is disrupted, aberrant levels of transcription are seen. However, when both coactivator complexes are depleted, a significant level of metal inducible activation is actually restored. Presumably, in this 'stripped down' system, some portion of the remaining TBP pool can mediate transcription. Curiously, in the absence of TAFs but with a full complement of MEDs, there is also an aberrant level of transcription consistent with the notion that there is some finely tuned codependence between the TBP/TAF complex and the MED complex at this promoter (Marr, 2006).

The results also reinforce the notion that the activator is the primary determinant of the transcriptional response. The MTF-1 depletion experiments were the most detrimental to mRNA induction. In the absence of MTF-1, there is no detectable activation of the Mtn genes. In contrast, there is some residual transcription of MtnA even when either the MEDs or TBP are largely depleted from the Drosophila cells. This remaining activity could be due to incomplete depletion, or it could indicate alternative mechanisms of activation that are activator-dependent but can partially bypass the requirement for the coregulator complexes (Marr, 2006).

In the course of testing the requirement for TAFs in activated transcription, the codependent stability of the TFIID complex was discovered. Particularly striking is the finding that TAF4 depletion destabilizes most of the other TAFs and, to some extent, even TBP. Therefore, the TAF depletion experiments most likely reflect a loss of holo-TFIID rather than just the loss of individual subunits. It is worth noting that metazoan organisms contain multiple variants of TAF4: TAF4b in vertebrates and No-hitter in Drosophila. Both of these have been implicated in tissue-specific gene expression. It is conceivable that substitution of this keystone TAF can provide a mechanism to change the entire coregulator profile of TFIID (Marr, 2006).

One intriguing question this work raises is: Why would an activator recruit an inactive TFIID complex to the promoter? There are several previously described cases in which TFIID occupancy at a promoter does not strictly correlate with transcriptional activity. However, in most of these cases the genes being examined were either in a repressed or an unstimulated state. In contrast, the current studies were designed to specifically measure the role of coactivator complexes such as TFIID and MED in the context of an active gene MtnA upon metal stimulation. The ability to deplete MED activity under these conditions revealed the unexpected finding that although TFIID is dynamically recruited to the MtnA promoter, TFIID is mainly held in an 'inactive' state until the second cofactor complex, MED, is recruited. Perhaps this recruitment of an 'inactive' TFIID is a more common phenomenon that can only be detected in special circumstances and may represent a previously unappreciated control mechanism in transcription activation. If the activator first recruits TFIID, then subsequently recruits MED, and there is a requirement for additional factors to potentiate the secondary recruitment of coregulator assemblies, then this provides a potential checkpoint for fine-tuning the control of gene expression. Alternatively, since the cell invests a significant amount of energy in making a high level of transcript, requirement of continued stimulation (i.e., activator bound at the promoter) for mRNA production would provide the most economical use of resources (Marr, 2006).

Occupancy of the Drosophila hsp70 promoter by a subset of basal transcription factors diminishes upon transcriptional activation

The presence of general transcription factors and other coactivators at the Drosophila hsp70 gene promoter in vivo has been examined by polytene chromosome immunofluorescence and chromatin immunoprecipitation at endogenous heat-shock loci or at a hsp70 promoter-containing transgene. These studies indicate that the hsp70 promoter is already occupied by TATA-binding protein (TBP) and several TBP-associated factors (TAFs), TFIIB, TFIIF (RAP30), TFIIH (XPB), TBP-free/TAF-containing complex (GCN5 and TRRAP), and the Mediator complex subunit 13 before heat shock. After heat shock, there is a significant recruitment of the heat-shock transcription factor, RNA polymerase II, XPD, GCN5, TRRAP, and Mediator complex subunit 13 to the hsp70 promoter. Surprisingly, upon heat shock, there is a marked diminution in the occupancy of TBP, six different TAFs, TFIIB, and TFIIF, whereas there is no change in the occupancy of these factors at ecdysone-induced loci under the same conditions. Hence, these findings reveal a distinct mechanism of transcriptional induction at the hsp70 promoters, and further indicate that the apparent promoter occupancy of the general transcriptional factors does not necessarily reflect the transcriptional state of a gene (Lebedeva, 2005).

An inverse correlation was observed between factor occupancy and transcriptional activation. In the absence of heat shock, it was found that TBP, TAFs, TFIIB, TFIIF, TFIIH, TFTC, and Mediator are present at the hsp70 promoter region. These results are similar to previous observations in which the basal factors have been found to be present at transcriptionally inactive promoters. Surprisingly, however, the apparent occupancy of TBP, several TAFs, TFIIB, and TFIIF significantly decreases upon transcriptional activation. These results could be due to some of the following scenarios: (1) upon activation, the undetected factors are present but adopt a conformation that renders them refractory to polytene chromosome staining and to ChIP analysis; (2) the factors that are not detected are indeed absent and do not participate in the ongoing transcription of the genes; or (3) the factors are present only transiently at the actively transcribed promoter and thus exhibit lower average occupancy upon polytene chromosome staining and ChIP analysis (Lebedeva, 2005).

The first scenario requires that TBP, several TAFs, TFIIB, and TFIIF simultaneously become essentially invisible to polytene immunostaining as well as to ChIP analysis upon transcriptional activation of hsp70 and other heat-shock genes. The observed effects are not a consequence of the heat shock treatment, because these factors are observed at ecdysone-responsive genes that have been subjected to heat shock. Moreover, for several factors (TBP, TAF1, and TAF10), the immunostaining was repeated with two different polyclonal antibodies that were raised against different epitopes, and identical results were obtained after heat-shock treatment. Furthermore, histone H3 K14 acetylation was detected at the hsp70 promoter after heat shock. Thus, the conditions allow the access of antibodies to proteins that are in close proximity to hsp70 promoter DNA. Thus, given that these experiments involve the use of many highly specific polyclonal antibodies and that the effect is observed with multiple polypeptides and is not a consequence of the heat-shock treatment, the first model appears to be unlikely (Lebedeva, 2005).

In the second scenario, TBP, several TAFs, TFIIB, and TFIIF do not participate in the ongoing transcription of heat-shock genes after heat induction. For instance, the factors required for transcription reinitiation may be a subset of those that participate in the first round of transcription. In fact, biochemical studies in yeast have shown that some, but not all, GTFs remain at the promoter after initiation and form a platform for the assembly of subsequent reinitiation complexes. This subset of factors includes TBP, TAF5, TFIIA, TFIIH, TFIIE, and Mediator, but not TFIIB or TFIIF. In accord with those results, this study found that TFIIH (XPB subunit) and Mediator (MED13), but not TFIIB or TFIIF, remain at the hsp70 promoter after heat induction. In contrast, the apparent occupancy of TFIID (TBP, TAF1, and several other TAFs) is significantly reduced upon heat shock. Thus, for the second scenario to be correct, TBP and several TAFs must be dispensable for transcription reinitiation from heat-induced hsp70 promoters (Lebedeva, 2005).

In the third scenario, the average occupancy of the basal transcription factors at the hsp70 promoters is higher in the inactive gene than in the transcriptionally induced gene. This situation could occur if the basal transcription factors are in a static complex at the inactive hsp70 promoter and in a rapid cycling state of preinitiation-complex assembly and disassembly at the transcriptionally active hsp70 promoter. More specifically, in vivo data in the context of the third scenario suggest that TBP, several TAFs, TFIIB, and TFIIF make a transition from a static state to a rapidly cycling state upon heat-shock induction (Lebedeva, 2005).
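The intuition behind the third scenario can be illustrated with a two-state calculation. This is a minimal sketch that assumes a factor simply alternates between a bound and an unbound state at the promoter; the function name and the residence and vacancy times are hypothetical and were not measured in the study.

```python
def mean_occupancy(residence_s, vacancy_s):
    """Time-averaged fraction of time a factor is bound at the promoter.

    residence_s: mean time the factor stays bound per binding event (seconds)
    vacancy_s:   mean time the site stays empty between events (seconds)
    """
    return residence_s / (residence_s + vacancy_s)


# Hypothetical numbers for illustration only.
static_complex = mean_occupancy(residence_s=600.0, vacancy_s=10.0)   # long-lived binding
rapid_cycling = mean_occupancy(residence_s=5.0, vacancy_s=10.0)      # fleeting binding

print(f"static:  {static_complex:.2f}")
print(f"cycling: {rapid_cycling:.2f}")
```

Under these assumptions a factor that rebinds just as often but dissociates quickly spends far less time on the DNA, so a population-averaged assay such as ChIP or polytene immunostaining would report lower apparent occupancy even though the factor still participates in every round of initiation.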

It should be considered that the latter two scenarios might appear to be inconsistent with in vivo KMnO₄ footprinting data, which suggest that TFIID binds to the Drosophila hsp70 promoters both before and after heat shock. In this regard, it should be noted that ChIP (as well as immunofluorescence) and footprinting experiments yield distinct types of information. ChIP provides data regarding the occupancy of a particular factor at a specific DNA sequence but does not indicate how the factor interacts with DNA or if the factor is biochemically active. Moreover, in some instances, specific DNA-bound factors may not be detectable by ChIP (although, as discussed above, it is unlikely that multiple subunits of a protein complex, such as TFIID, would be invisible in a ChIP assay with multiple polyclonal antibodies). In vivo footprinting, however, shows that a factor is bound to a specific DNA sequence but does not indicate exactly what factor is bound to that sequence. Therefore, the models and data are not necessarily contradictory. For example, it is possible that the factor that is responsible for the TATA footprint in the induced gene is not TBP or TFIID but rather another protein, such as a TBP-related factor, or a TFTC/STAGA-type complex. Alternatively, an induced hsp70 promoter might not contain the complete TFIID complex but rather only a subcomplex or TBP alone that is in a ChIP-invisible state, possibly hidden under other proteins, such as the polymerase. At the present time, however, the resolution of these issues will require the development of more sophisticated assays for the analysis of the functions of transcription factors in vivo (Lebedeva, 2005).

Thus, a model for the activation of hsp70 genes is as follows. First, the inactive gene contains many GTFs (such as TFIIB, TFIID, TFIIF, and TFIIH) as well as the downstream paused RNA Pol II. Upon heat induction, HSF binds to the promoter and recruits coactivators, such as Mediator and SAGA complexes, and these factors promote the release of the paused polymerase and the assembly of a new transcription preinitiation complex. After initiation, the transcription complex might partially disassemble, at which point factors such as TFIIB and TFIID (or many TFIID subunits) dissociate from the template DNA. (TFIIF may remain associated with the elongating polymerase and thus depart the promoter region.) Then, in subsequent rounds of initiation (i.e., reinitiation), the reassociation of TFIIB and TFIID with the template may be fleeting with a low residence time at the promoter (the third scenario described above). Alternatively, TFIIB and TFIID may be dispensable for reinitiation (the second scenario described above). TFIIH, in contrast, is needed to unwind the template DNA for every new round of transcription; thus, the average occupancy of TFIIH at the promoter increases along with the polymerase in proportion to the number of transcription reinitiation events. Thus, upon heat induction, an increase would be observed in HSF, Mediator, SAGA/TFTC, TFIIH, and RNA Pol II as well as a decrease in TFIIB, TFIID (or many TFIID subunits), and TFIIF at the promoter (Lebedeva, 2005).

The specific mechanism of transcriptional activation by HSF at heat shock genes is likely to be one of multiple mechanisms of regulation that are used in vivo. For example, in contrast to what is seen at the hsp70 promoters, the apparent occupancy of TBP, TFIIB, and several TAFs at ecdysone-responsive promoters does not decrease upon transcriptional induction, even if the cells are also subjected to heat shock (Lebedeva, 2005).

In conclusion, these results with the hsp70 promoters provide an example of a transcriptional mechanism wherein the apparent occupancy of TBP, several TAFs, TFIIB, and TFIIF decreases upon gene activation. Therefore, the extent of the apparent occupancy of these factors at a given promoter does not necessarily reflect the transcriptional activity of that promoter. The discovery and analysis of distinct transcriptional mechanisms is a key step toward the ultimate goal of understanding all of the many strategies that are used by the cell to control gene activity (Lebedeva, 2005).

Stepwise modifications of transcriptional hubs link pioneer factor activity to a burst of transcription

Binding of transcription factors (TFs) promotes the subsequent recruitment of coactivators and preinitiation complexes to initiate eukaryotic transcription, but this time course is usually not visualized. It is commonly assumed that recruited factors eventually co-reside in a higher-order structure, allowing distantly bound TFs to activate transcription at core promoters. This study used live imaging of endogenously tagged proteins, including the pioneer TF Zelda, the coactivator dBrd4 (Female sterile (1) homeotic), and RNA polymerase II (RNAPII), to define a cascade of events upstream of transcriptional initiation in early Drosophila embryos. These factors are sequentially and transiently recruited to discrete clusters during activation of non-histone genes. Zelda and the acetyltransferase dCBP (Nejire) nucleate dBrd4 clusters, which then trigger pre-transcriptional clustering of RNAPII. Subsequent transcriptional elongation disperses clusters of dBrd4 and RNAPII. These results suggest that activation of transcription by eukaryotic TFs involves a succession of distinct biomolecular condensates that culminates in a self-limiting burst of transcription (Cho, 2023).

In eukaryotes, the recruitment of RNA polymerase II (RNAPII) to transcription start sites on DNA depends on the assembly of the preinitiation complex (PIC) and is regulated by hundreds of trans-acting factors. In particular, transcription factors (TFs) recruit nucleosome remodelers, histone modifiers, and Mediator to promote the formation of PIC. How these numerous upstream inputs are integrated to give the extraordinary specificity and intricacy of transcriptional regulation remains incompletely understood. A common view suggested by biochemical studies is that these factors are progressively assembled into a single final complex through cooperative interactions. However, other sophisticated processes initiating DNA replication and promoting splicing of mRNAs are governed by a series of distinct and ephemeral complexes in which each complex promotes the next in energy-driven steps. This study examines the possibility that initiation of transcription similarly involves directional transformations of intermediate complexes that would provide additional opportunity for specificity and regulation (Cho, 2023).

Visualizing the composition of transcriptional machinery over time might detect intermediate complexes that integrate the multitude of regulatory inputs of transcriptional control. In recent years, advances in confocal and super-resolution imaging led to the discovery that a wide variety of transcriptional regulators are recruited to form clusters at active genes. These clusters are thought to function as 'transcriptional hubs' by locally enriching transcriptional machinery and enhancing their binding to target DNA sites. Transcriptional hubs are a type of membraneless compartment, whose formation typically involves multivalent interactions between intrinsically disordered regions (IDRs). Accordingly, IDRs are commonly found in the activation domains of TFs as well as the C-terminal domain (CTD) of Rpb1 in RNAPII. Similar to the idea that a single final complex is assembled on the DNA to initiate transcription, it has been proposed that the heterotypic interactions between IDRs can give rise to a compartment that simultaneously enriches TFs, coactivators, Mediator, and RNAPII at promoters. Nonetheless, how transcriptional hubs are regulated and whether they undergo compositional changes are still unclear (Cho, 2023).

Studying the dynamics of transcriptional hubs in living cells is complicated by the discontinuous and stochastic nature of eukaryotic transcription, a phenomenon also known as bursting. The Drosophila embryo provides a powerful context to study the timing of events upstream of transcriptional initiation. The early wave of transcription in Drosophila embryos is coupled to the rapid nuclear division cycles such that a few hundred genes initiate a burst of transcription about 3 min after each mitosis. The synchrony of early nuclear cycles and real-time localization of tagged proteins allow one to track activation events prior to the onset of transcription, and tools to knock down gene function are available to assess the contribution of events to gene activation. In a recent study, live imaging of endogenously tagged RNAPII revealed the abrupt appearance of RNAPII clusters 2–3 min after mitosis. Brief metabolic labelling revealed foci of nascent transcripts throughout the nuclei in fixed embryos; these foci broadly colocalized with RNAPII clusters, indicating that early-forming RNAPII clusters mark sites of active transcription. Importantly, as nascent transcript levels increased, RNAPII clusters declined and eventually dispersed. These findings are consistent with numerous previous observations and support a model in which a large excess of RNAPII is recruited prior to initiation, which is then inefficiently converted to elongating RNAPII. What produces this pre-transcriptional RNAPII clustering and how it is coordinated with a burst of transcription are not yet fully understood. In this study, events are followed during the ~2.5 min between mitotic exit and the formation of RNAPII clusters, as is the fate of these clusters as transcription ensues at about 3 min after mitosis (Cho, 2023).

Zelda (Zld) is a pioneer TF that widely promotes the early wave of zygotic gene expression. Maternally supplied Zld binds to thousands of enhancers and promoters, and its binding sites exhibit increased chromatin accessibility and histone acetylation. Depletion of maternally expressed Zld curtails early zygotic transcription, and the embryos become highly defective at the mid-blastula transition (MBT). The transactivation domain of Zld has been mapped to an intrinsically disordered region. Moreover, fluorescently tagged Zld forms highly dynamic clusters in the nucleus, and previous studies suggest that Zld clusters increase the local concentration of other TFs and facilitate their binding to target DNA. Knockdown of Zld reduces RNAPII 'speckles' in fixed embryos. While these previous studies support a model in which Zld promotes the recruitment of additional components to form transcriptional hubs and facilitates the onset of zygotic transcription, the exact mechanism has not been determined (Cho, 2023).

This study combined genetic perturbation and real-time imaging to delineate a pathway that nucleates and serially transforms transcriptional hubs to trigger initiation of transcription in early Drosophila embryos. Zld is shown to act through transcription coactivators, including the lysine acetyltransferase dCBP and the BET protein dBrd4, to initiate RNAPII clustering at non-histone genes. Importantly, real-time imaging reveals only limited colocalization of these factors at transcriptional hubs, suggesting dynamic and directional changes in the composition such that upstream activators do not stably persist in the hubs with downstream effectors and RNAPII. A model is proposed in which Zld forms numerous largely unstable clusters, some of which trigger a dCBP-dependent step to build more stable dBrd4 clusters; a subset of these dBrd4 clusters then promotes RNAPII clustering near active promoters, and this pool of RNAPII fuels a burst of transcription. Inhibition of transcriptional elongation stabilizes some Zld and dBrd4 clusters, indicating that transcription directly or indirectly promotes their dispersal. Finally, while early inhibition of transcription inhibits RNAPII clustering, abrupt inhibition of transcript elongation after hub formation stabilizes RNAPII clusters. These findings indicate that transcription destabilizes hubs, a feedback that could lead to cycles of RNAPII accumulation and depletion, thereby contributing to the bursting feature of transcription. It is suggested that the onset of transcription, like the onset of replication, involves upstream events that directionally modify the machinery to precisely control the process (Cho, 2023).

It has long been recognized that the compartmentalization of transcriptional machinery is a fundamental aspect of eukaryotic gene control. Early cytological studies revealed discrete clusters of RNAPII and nascent transcripts, which were speculated to be stable "transcription factories". Subsequent studies show that rather than genes being recruited to stable factories, numerous factors form hubs or liquid-like condensates transiently at active genes. This leaves open the questions of what governs the dynamics of transcriptional hubs/condensates and how their emergence and dispersal are linked to transcript synthesis. This study used real-time approaches to dissect upstream events in transcriptional initiation whose timing is constrained and synchronized in early Drosophila embryos by coupling to the rapid cell cycles. A cascade of dependencies is documented, paralleled by a temporal cascade of cluster formation. The findings indicate that transcriptional hubs directionally pass through a series of intermediate states with different composition, rather than simply enriching all the factors involved in initiating transcription. Specifically, the pioneer TF Zelda acts through coactivators dCBP and dBrd4 to indirectly concentrate pools of RNAPII near promoters. Inhibition of transcription by α-amanitin stabilizes dBrd4 and RNAPII clusters, indicating that transcription directly or indirectly promotes dispersal of transcriptional hub components, resulting in negative feedback. It is suggested that the progressive maturation of transcriptional hubs coupled with a negative feedback loop stimulates a rapid but self-limiting burst of transcription in the early rapid embryonic cycles. These findings have striking parallels to the proposal that non-equilibrium dynamics of transcriptional condensates make direct contributions to sequential transcriptional bursts in the longer cell cycles of more mature cells (Cho, 2023).

The dynamic nature of transcriptional hubs described in this study is distinct from the well characterized transcriptional condensates at nucleoli or histone locus bodies, which are stable compartments and incorporate multiple functionally related components. The dynamic process with its multiple transitions might serve to add precision and sophistication to transcriptional control. First, transitions between discrete steps could provide proofreading steps that test the stability of intermediate complexes to filter out stochastic noise and increase regulatory specificity. Second, additional regulators might promote or prevent passage through the different transitions, thereby allowing the transcriptional hubs to integrate multiple inputs to generate the intricate spatiotemporal expression of developmental genes. In line with these ideas, these data show that the transitions from Zld clusters to dBrd4 and then to RNAPII are each associated with a decline in the number of clusters, suggesting that the maturation of transcription hubs is selective at successive steps. It will be important to learn how this feature contributes to the extraordinary accuracy with which the graded and combinatorial inputs generate transcriptional outputs (Cho, 2023).

The molecular mechanisms that drive the sequential transformation of transcriptional hubs remain to be fully determined. During the first step, Zld and dCBP might directly interact with each other or undergo co-condensation. Alternatively, open chromatin established by Zld could facilitate binding of additional TFs that interact with dCBP. However, it should also be kept in mind that TFs might inhibit deacetylation to indirectly enhance local dCBP-dependent acetylation. In any case, it seems likely, but not yet demonstrated, that dCBP acts by increasing local acetylation to recruit the reader dBrd4. Although dBrd4 might simply bind to histone marks such as H3K27ac, the acetylation of transcriptional machinery could also be involved in recruiting dBrd4. Upon crossing a concentration threshold, dBrd4 clustering might be promoted by multivalent interactions mediated by its own IDR. While the initial clustering of RNAPII appears to spatially coincide with dBrd4 clusters, the subsequent behavior is not consistent with stable partnership, as dBrd4 is lost from temporarily persisting RNAPII clusters. Imaging the period of loss of dBrd4 revealed accompanying features that varied between clusters: abrupt physical rearrangement of foci, simple gradual loss of dBrd4 from complexes, and apparent de-mixing of previously colocalized signals to form largely separate dBrd4 and RNAPII clusters. These behaviors may represent different manifestations of progressive modifications of the biomolecular condensate that reduce the interactions that previously stabilized co-residency of dBrd4 and RNAPII. Finally, both positive and negative effects of transcriptional elongation on the dynamics of transcriptional hubs were observed. The initial requirement of transcription for RNAPII clustering might involve the upstream roles of enhancer RNAs in nucleating RNAPII clusters. In contrast, a later sustained period of transcription of the gene body appears to mediate negative feedback to disperse dBrd4 and RNAPII clusters. This could be explained by a suggested disruption of multivalent interactions between IDRs by the negative charge of nascent RNA, but numerous other less direct mechanisms might be responsible. While the results reveal the timing and coordination of upstream events required for transcription, much more work is needed to provide a mechanistic understanding of the observed processes (Cho, 2023).

Regardless of the molecular details, it is expected that similar regulatory principles are employed by evolutionarily diverse transcription factors to mediate transcriptional activation. For example, in the zebrafish embryo, the pioneer factors Nanog, Pou5f3, and Sox19b similarly recruit CBP/p300 and Brd4 to establish transcriptional competence during early zygotic gene expression. Activation by estrogen receptor α (ERα) also involves histone acetylation and subsequent recruitment of Brd4. Notably, elegant work has shown that dozens of factors are recruited to the ERα target promoter in a cyclical and sequential fashion. It is envisioned that many of these factors are dynamically recruited to the hubs, and that the enzymatic reactions they carry out contribute to the speed and irreversibility of the transformation of transcriptional hubs. Lastly, it is suggested that the formation of transcriptional hubs in early embryos ensures the rapid initiation of a transcriptional burst within a short interphase window; in other biological contexts, the hubs might serve additional functions such as bridging enhancers and promoters or coordinating expression of multiple loci. Drosophila embryos will provide a powerful system to dissect the relationship between transcriptional hubs, chromatin interactions, and transcription dynamics (Cho, 2023).

A bistable autoregulatory module in the developing embryo commits cells to binary expression fates

Bistable autoactivation has been proposed as a mechanism for cells to adopt binary fates during embryonic development. However, it is unclear whether the autoactivating modules found within developmental gene regulatory networks are bistable, unless their parameters are quantitatively determined. This study combined in vivo live imaging with mathematical modeling to dissect the binary cell fate dynamics of the fruit fly pair-rule gene fushi tarazu (ftz), which is regulated by two known enhancers: the early (non-autoregulating) element and the autoregulatory element. Live imaging of transcription and protein concentration in the blastoderm revealed that binary Ftz fates are achieved as Ftz expression rapidly transitions from being dictated by the early element to the autoregulatory element. Moreover, this study discovered that Ftz concentration alone is insufficient to activate the autoregulatory element, and that this element only becomes responsive to Ftz at a prescribed developmental time. Based on these observations, a dynamical systems model was developed, and its kinetic parameters were quantitated directly from experimental measurements. This model demonstrated that the ftz autoregulatory module is indeed bistable and that the early element transiently establishes the content of the binary cell fate decision to which the autoregulatory module then commits. Further in silico analysis revealed that the autoregulatory element locks the Ftz fate quickly, within 35 min of exposure to the transient signal of the early element. Overall, this work confirms the widely held hypothesis that autoregulation can establish developmental fates through bistability and, most importantly, provides a framework for the quantitative dissection of cellular decision-making (Zhao, 2023).

One of the central questions in developmental biology concerns how cells precisely and irreversibly adopt distinct cellular fates. It has been argued that cells assume their unique gene expression profiles through a sequence of decisions among branching paths, famously encapsulated by 'Waddington's landscape' of peaks and valleys delineating the possible trajectories that a cell can follow. Genetic networks that lock a cell into one of these trajectories may be thought of as 'memory modules' that guide cells through valleys in the landscape to their ultimate fates. In the simplest case, where a decision is made between two alternative developmental fates, the memory module is binary and referred to as a switch. The state of the switch is set by the action of transient upstream regulatory signals. Several genetic motifs, including autoactivation and mutual repression, are capable of maintaining binary cell fates. However, the mere presence of these motifs is insufficient to guarantee that a network can remember its expression state once upstream regulators have degraded. The ability to lock onto high or low expression levels results from bistability, a systems-level property that depends upon the quantitative details of the kinetics of the involved chemical reactions (Zhao, 2023).

Though bistability is widely invoked to explain cell fate decisions, relatively little quantitative data exist to confirm bistability in gene expression modules within developing embryos. Previous studies in cell culture and fixed embryos have provided evidence for bistability in hematopoietic differentiation, the Shh network, the vertebrate hindbrain, between the BMP and FGF morphogens, and within the Notch-Delta signaling system. Quantitative evidence for multistability in fruit fly embryos has also been derived from fitting the parameters of high-dimensional network models to fixed tissue measurements. While these models are capable of reproducing the observed phenomenology, there is no guarantee that the optimal set of inferred parameter values reflects actual biophysical quantities (Zhao, 2023).

Thus, it is important to verify that the conclusions drawn from computational modeling and in vitro experiments apply to developmental systems in vivo in the context of models that quantitatively capture the molecular interactions that underlie cellular decision-making. Evidence for the bistability of a genetic module based on these molecular interactions in an intact multicellular organism has not yet been demonstrated (Zhao, 2023).

The early development of the fruit fly Drosophila melanogaster is an ideal model system for studying binary cell fate determination, due to the presence of pair-rule genes such as fushi tarazu (ftz) that form discrete stripes at the cellular blastoderm stage prior to gastrulation (2.5–3.5 h after fertilization). The expression of ftz is regulated by two main enhancers: the early, or zebra, element and the autoregulatory element. The early element responds to upstream transcription factors such as the gap genes to establish the initial expression pattern of seven stripes. This element is functionally distinct from the autoregulatory element, which contains multiple Ftz binding sites that allow Ftz to activate its own expression. This autoactivation network motif is theoretically capable of exhibiting bistability and has therefore been hypothesized to act as a binary memory module (Zhao, 2023).

Whether a cell possesses a memory module determines whether observed states of gene expression are transient in the absence of continued external signaling, or whether these states can be locked into permanent cell fates that can be maintained without further intervention. Specifically, if the autoregulatory module is bistable, then it maintains high ftz expression driven by the transient presence of upstream factors, even once those factors degrade (or until further regulatory mechanisms intervene). If, instead, the autoregulatory element is monostable, then the observed separation of Ftz concentration into high and low levels persists only as long as upstream factors are present to regulate expression. In their absence, Ftz expression would revert to a single fate for all cells. It is important to note, however, that in this case, the transiently high or low trajectory of Ftz concentration could still be instructive for regulating downstream genes (Zhao, 2023).
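The distinction between a bistable and a monostable autoregulatory module can be illustrated with a minimal sketch. The code below is not the model fitted in the study; it assumes a generic single-variable autoactivation equation with a Hill term, and every parameter value (basal, vmax, K, decay, and the Hill coefficient n) is hypothetical. It locates the steady states by scanning for sign changes of the net production rate and labels each one as stable or unstable.

```python
# Toy autoactivation module: dx/dt = basal + vmax * x^n / (K^n + x^n) - decay * x
# All parameter values are hypothetical and chosen only for illustration.

def rate(x, n, basal=0.01, vmax=1.0, K=0.5, decay=1.0):
    """Net rate of change of the autoregulated protein at concentration x."""
    hill = x**n / (K**n + x**n)
    return basal + vmax * hill - decay * x


def fixed_points(n, x_max=2.0, steps=2000):
    """Return (concentration, is_stable) for each steady state in [0, x_max]."""
    points = []
    dx = x_max / steps
    for i in range(steps):
        a, b = i * dx, (i + 1) * dx
        fa, fb = rate(a, n), rate(b, n)
        if fa * fb < 0:
            # Rate switching from positive to negative means perturbations decay.
            stable = fa > 0
            for _ in range(60):  # refine the root by bisection
                mid = 0.5 * (a + b)
                if rate(a, n) * rate(mid, n) <= 0:
                    b = mid
                else:
                    a = mid
            points.append((0.5 * (a + b), stable))
    return points


bistable = fixed_points(n=4)    # cooperative autoactivation
monostable = fixed_points(n=1)  # non-cooperative autoactivation

for label, pts in (("n=4", bistable), ("n=1", monostable)):
    print(label, [(round(x, 3), "stable" if s else "unstable") for x, s in pts])
```

With these assumed values, the cooperative case has a low and a high stable state separated by an unstable threshold, whereas the non-cooperative case has a single stable state. The same motif can therefore behave as a memory module or not, depending on its quantitative parameters.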

This study characterized the ftz autoregulatory module in vivo through quantitative real-time measurements in living fruit fly embryos. Focusing on stripe 4 expression, Ftz expression was observed to separate into discrete high and low levels at the blastoderm stage during the 20 min prior to gastrulation, concurrent with a transition in regulatory control from the early to the autoregulatory element. It was discovered that autoregulation is triggered at a specific time point in development (presumably through the action of 'timer genes') rather than through a readout of Ftz concentration alone. Based on these observations, a dynamical systems model was developed and its parameters were quantitated from simultaneous real-time measurements of ftz transcription and Ftz protein dynamics in single cells of living embryos. This model predicts binary Ftz expression levels at gastrulation with high accuracy and demonstrates that, indeed, the ftz autoregulatory module is bistable. It is concluded that the ftz autoregulatory element acts as a memory module to commit cells to binary fates that are otherwise transiently defined by the early element, thereby validating a long-standing hypothesis in developmental and systems biology that bistability underlies cell fate determination. Simulations further make it possible to quantitatively define a developmental commitment window, which shows that the autoregulatory module requires about half an hour to establish a memory of the transient signal from the early module. Thus, this work provides a framework for the dissection of other regulatory modules in the gene regulatory networks that dictate development based on this interplay between dynamical systems models and real-time experiments (Zhao, 2023).

For decades, developmental biologists have used Waddington's landscape to conceptualize cellular decision-making. Under this framework, cells roll down valleys in a predetermined landscape to adopt their ultimate fates. This framework has been repeatedly mathematicized using dynamical systems theory. Many of these studies have hypothesized that autoactivation helps establish and maintain binary cell fates through bistability, which can be thought of as introducing forks in Waddington's landscape. Though experiments in cell culture and fixed tissue have provided evidence for the bistability of various autoregulatory modules found within gene regulatory networks, until now, these results have not been confirmed by direct examination of dynamics in intact, living embryos (Zhao, 2023).

This work utilized live imaging to quantitatively characterize the dynamics of the fruit fly ftz regulatory system in vivo. Tight temporal coordination was elucidated between the two enhancer elements that regulate ftz expression, and dynamical systems modeling was combined with biophysical measurements to show that the bistability of the autoregulatory module can maintain otherwise transient expression levels driven by upstream factors. Based on the prevalence of autoregulatory motifs in nature, it is speculated that the approach employed by the Ftz system to decide cell fate is not limited to fruit flies but might also be widely adopted during development in other organisms (Zhao, 2023).

One of the central discoveries of this study is that ftz autoregulation is triggered at a specific developmental time rather than being triggered when Ftz reaches a certain threshold concentration. Recent work has suggested candidates for 'timer genes' that are expressed at distinct developmental time points and appear to facilitate the expression of other genes. It is speculated that timer genes might also bind the ftz autoregulatory element to trigger its responsiveness to Ftz (Zhao, 2023).

This study relied on quantitative modeling with no free parameters to provide strong evidence that the Ftz autoregulatory module is bistable. Dynamical systems models such as the one employed in this study are advantageous for this approach since the parameters are biophysically interpretable, which is not true in, for example, more coarse-grained Boolean models of genetic networks. Although the model in the main text is deterministic, preliminary work also considered a stochastic chemical reaction network simulated by the Gillespie algorithm, as well as a stochastic differential equation model. Aside from introducing noise to Ftz expression levels, and in contrast to other patterning networks where noise appears crucial to drive fate determination, these models did not exhibit any unique behaviors that better accounted for the experimental observations compared to the deterministic model (Zhao, 2023).

The results presented in this work were derived from experiments paired with mathematical analysis of the theoretical model. The conclusions could be further supported—or challenged—by experiments where Ftz initial conditions are altered through, for example, heat shock or optogenetic approaches. Future research on Ftz autoregulation would also benefit from technological advancements that extend quantitative live imaging capabilities beyond gastrulation in order to monitor Ftz concentrations over extended timescales, from the onset of autoregulation to the final adoption of cellular fate (Zhao, 2023).

A basic assumption of the approach employed in this study is that the behavior of the whole network can be predicted from the behavior of the parts (modules) in isolation. In this model, the early and late populations of Ftz were divided into two separate modules. The early module produces the early protein, which acts as an input to the autoregulatory module responsible for producing the late protein. Positive feedback arises because Ftz also activates its own expression. This is not the only way to define the module; for example, if it were known which regulatory factors rendered the autoregulatory element responsive, then those could be included as inputs instead of describing their activity implicitly through the parameter. Similarly, if the dynamics of the upstream regulatory factors for the early element were known, β, the phenomenological temporal decay in the early transcription rate, could be substituted with a mathematical expression relating the concentrations of these regulatory factor inputs to the early Ftz transcription rate (Zhao, 2023).

The representation of the autoregulatory module can predict the fate of ftz expression from arbitrary trajectories of early Ftz. As a result, it was possible to predict the effect of modifications to upstream signaling on the resulting gene expression patterns. This leads to asking what forms of input are appropriate to achieve particular patterning outcomes. Such ability to reverse engineer the process of cellular decision-making could facilitate designing perturbations to manipulate the system, identifying constraints placed on upstream modules by the needs of downstream modules, and analyzing whether biologically evolved signals match those that are mathematically 'optimal' for such needs as patterning speed or information transmission. Different methods of generating predictions may be appropriate depending on the types of inputs under consideration. In this paper, the fact that increasing any one of the parameters that define the early Ftz input increases total Ftz concentration at all points in time (a property known as monotonicity) made it possible to analyze the model using a switching separatrix. However, this may not be true for other regulatory systems, as in the case where a gene within a module represses its own production (Zhao, 2023).

Throughout developmental biology, the concept of a commitment window has been repeatedly utilized to describe the amount of time cells need to be exposed to upstream signals in order to decide their developmental fates. This quantitative dynamical systems model made it possible to examine this commitment window in detail and to identify what fraction of cells adopt certain fates as developmental timing is varied. From an engineering perspective, a gene expression pattern may be considered as an objective that must be achieved with a prescribed level of precision (i.e., as a design specification), and one can work backward to see what inputs satisfy this requirement. This approach complements existing work on precision that emphasizes how tightly protein concentrations are controlled and how accurately cells can locate their position by reading out concentrations of upstream factors. In particular, the latter approaches indicate what level of precision is actually achieved by a patterning network, while the framing of this study focuses rather on what range of parameters allows a system to attain a predefined level of precision. A combination of the two perspectives could help elucidate what biophysical and evolutionary factors influence stochastic variation in phenotypes, including how precise expression patterns must actually be to produce functional, healthy organisms (Zhao, 2023).
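The idea of a commitment window can be sketched with a toy simulation. The code below is an illustration under stated assumptions, not the fitted Ftz model: a generic autoactivation equation with a Hill term receives a transient input of fixed amplitude for a chosen duration (standing in for the early element), and the final state is read out long after the input has ended. All parameter values and the time units are hypothetical. Because longer pulses can only raise the trajectory in this toy system, the shortest pulse that commits the cell to the high state can be found by bisection.

```python
# Toy commitment-window estimate for a bistable autoactivation module.
# Time and concentration units are arbitrary; all parameters are hypothetical.

def hill(x, K=0.5, n=4):
    """Cooperative autoactivation term."""
    return x**n / (K**n + x**n)


def final_level(pulse_duration, amplitude=0.3, basal=0.01, vmax=1.0,
                decay=1.0, dt=0.01, t_end=50.0):
    """Integrate with forward Euler; the transient input is on for t < pulse_duration."""
    x = 0.0
    for i in range(int(t_end / dt)):
        t = i * dt
        transient = amplitude if t < pulse_duration else 0.0
        x += dt * (basal + transient + vmax * hill(x) - decay * x)
    return x


def commitment_window(lo=0.0, hi=10.0, threshold=0.5, tol=0.01):
    """Shortest pulse (within tol) that ends in the high state.

    Assumes a pulse of length lo ends low and a pulse of length hi ends high.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if final_level(mid) > threshold:
            hi = mid
        else:
            lo = mid
    return hi


short_pulse = final_level(1.0)
long_pulse = final_level(5.0)
window = commitment_window()
print(f"short pulse ends at {short_pulse:.3f}, long pulse ends at {long_pulse:.3f}")
print(f"estimated commitment window: {window:.2f} time units")
```

In this sketch a brief pulse relaxes back to the low state once the input is removed, while a sufficiently long pulse pushes the system past the unstable threshold so that autoactivation alone maintains the high state. The numerical value of the window depends entirely on the assumed parameters and does not correspond to the value reported in the study.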

In summary, by turning widespread schematic models of autoactivation modules into precise mathematical statements and experimentally testing the resulting predictions, this study has provided support for a widely held hypothesis about how developmental fates are established in embryos. In the future, combining quantitative measurements with precise spatiotemporal perturbations and synthetic reconstitution methods promises to enable yet another iteration of the dialogue between theory and experiment that constitutes the basis of this work, ultimately leading to a predictive understanding of function in developmental networks and the myriad forms and fates to which they give rise (Zhao, 2023).

Widespread regulatory specificities between transcriptional co-repressors and enhancers in Drosophila

Gene expression is controlled by the precise activation and repression of transcription. Repression is mediated by specialized transcription factors (TFs) that recruit co-repressors (CoRs) to silence transcription, even in the presence of activating cues. However, whether CoRs can dominantly silence all enhancers or display distinct specificities is unclear. This work reports that most enhancers in Drosophila can be repressed by only a subset of CoRs, and enhancers classified by CoR sensitivity show distinct chromatin features, function, TF motifs, and binding. Distinct TF motifs render enhancers more resistant or sensitive to specific CoRs, as was demonstrated by motif mutagenesis and addition. These CoR-enhancer compatibilities constitute an additional layer of regulatory specificity that allows differential regulation at close genomic distances and is indicative of distinct mechanisms of transcriptional repression (Jacobs, 2023).

Animal development and homeostasis critically depend on differential gene expression, enabled by the precise regulation of transcriptional activation and repression. Although repression is often associated with heterochromatin, genes can also be silenced in transcriptionally permissive euchromatin by repressive transcription factors (TFs), also termed repressors, that bind to DNA and recruit corepressors (CoRs). As CoRs can suppress transcription even in the presence of activators, this mode of gene silencing is termed active transcriptional repression. Active repression is critical, and its failure can cause developmental defects and diseases like cancer; it is also conceptually intriguing as it requires the fast and efficient overriding of activating cues. However, the modes and mechanisms of this process are unclear, and whether a regulatory code coordinates repression and activation is unknown. Given that transcriptional activation can occur via distinct and mutually incompatible modes, it is intriguing to speculate whether distinct modes of active transcriptional repression exist. Examples of specificities between repressors and activators have indeed been observed. For example, the CoR Retinoblastoma protein (Rb), as part of the DREAM complex, can repress the TFs E2F, Mip120 and PU.1 but not others like SP-1, and repressors can have different activities in distinct transcriptional contexts. However, comprehensive studies that define different modes of active repression and uncover their regulatory rules are lacking. This study determined the mutual compatibilities between five known CoRs and a genome-wide library of active enhancers by measuring enhancer-activity changes upon CoR tethering in otherwise unperturbed cells, similar to activator-bypass experiments. It was reasoned that testing each enhancer with each CoR in all possible combinations should reveal CoR-enhancer combinations that lead to decreased enhancer activity (enhancers are sensitive) and those that do not (resistant), indicative of compatible and incompatible pairings (Jacobs, 2023).

First, this study tested whether distinct specificities between CoRs and enhancers exist, whereby certain enhancers are sensitive to repression by a given CoR, while other enhancers are resistant. For this, it was necessary to systematically measure the effects that selected CoRs have on the activity of a large number of enhancers. The comprehensive mapping of CoR-enhancer compatibilities, by examining all combinations of CoRs and enhancers, requires highly controllable quantitative high-throughput assays. Therefore the massively parallel enhancer-activity assay STARR-seq was modified to enable the function-based testing of genome-wide enhancer candidate libraries with different CoRs. Briefly, four upstream-activating-sequence (UAS) motifs were introduced immediately downstream of the enhancer library, an arrangement that leaves the enhancer sequence intact yet allows for the direct tethering of selected CoRs via the Gal4 DNA-binding domain. The tethering of CoRs next to active enhancers directly assesses whether CoRs can override existing activating cues, a process akin to active repression (Jacobs, 2023).

Drosophila S2 cells were chosen as a model system, and a panel of five CoRs was tested: CoRest, CtBP, Rbf, Rbf2 and Sin3A. These CoRs represent different protein complexes, repressive pathways, enzymatic functions, and distinct groups with context-specific functions. Testing diverse CoRs casts a wide net and should increase the ability to detect compatible and incompatible CoR-enhancer pairs. For each CoR, two independent UAS-STARR-seq screens were performed in which the UAS-STARR-seq library was co-transfected with a vector that expresses the Gal4-CoR (or Gal4-GFP as neutral control), and spike-in controls were used for normalization. As spike-in controls, a distinct STARR-seq library was used containing 18 Drosophila pseudoobscura enhancers, cloned without the UAS motifs and hence not targeted by the Gal4-CoRs. In all cases, the two independent replicates correlated well. In order to reliably assess repression (which requires a high baseline activity), it was decided to evaluate enhancer-activity changes for 3094 enhancers that were highly active in Gal4-GFP controls (Jacobs, 2023).

First, it was determined which of the 3094 enhancers could be silenced by the highly conserved CoR CtBP by assessing the enhancer-activity changes when tethering Gal4-CtBP versus Gal4-GFP. This revealed that some enhancers, like the enhancers near Orct, Orct2 and Pka-C3, were reproducibly repressed by Gal4-CtBP, whereas others, like the enhancer near CG10516, were unaffected. A differential analysis based on the two replicates per condition using edgeR showed that CtBP significantly repressed 759 enhancers but not the remaining 2335. Thus, CtBP could only repress ~25% of the enhancers, suggesting that CtBP displays preferences or specificities towards some enhancers but not others (Jacobs, 2023).

Next, the enhancer-activity changes upon recruiting Rbf2, a CoR from the retinoblastoma protein family, were determined. Interestingly, Rbf2 was also able to repress only a subset of the enhancers (1733, 56%), including the enhancers near Orct and Pka-C3, which were also repressed by CtBP, and CG10516, which was not repressed by CtBP. In contrast, Rbf2 was unable to repress the aforementioned Orct2 enhancer and others, indicating that the specificities of Rbf2 differ from those of CtBP. Indeed, while 502 enhancers were repressed by both CtBP and Rbf2, 1231 enhancers were repressed by Rbf2 but not by CtBP and, vice versa, 257 enhancers were repressed by CtBP but not by Rbf2. These enhancer-CoR specificities were validated in luciferase reporter assays, in which the CoRs were recruited upstream of the enhancer and promoter. This assay confirmed that an enhancer near serpent was specifically repressed by Gal4-CtBP but not Gal4-Rbf2, an enhancer near CG2116 was specifically repressed by Gal4-Rbf2 but not Gal4-CtBP, and an intronic enhancer of kay was repressed by both CoRs, each in agreement with the STARR-seq results. Taken together, the CoRs CtBP and Rbf2 are each able to repress a specific subset of enhancers but not others, indicating the existence of distinct CoR-enhancer specificities (Jacobs, 2023).
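The overlap counts reported above can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the count of enhancers repressed by neither CoR is not stated in the source and is derived here by subtraction.

```python
# Consistency check of the reported CtBP / Rbf2 repression counts.
total_enhancers = 3094   # highly active enhancers evaluated
both = 502               # repressed by CtBP and Rbf2
rbf2_only = 1231         # repressed by Rbf2 but not CtBP
ctbp_only = 257          # repressed by CtBP but not Rbf2

ctbp_repressed = both + ctbp_only
rbf2_repressed = both + rbf2_only
ctbp_resistant = total_enhancers - ctbp_repressed
# Derived value, not reported in the text:
neither = total_enhancers - (both + rbf2_only + ctbp_only)

ctbp_percent = 100 * ctbp_repressed / total_enhancers
rbf2_percent = 100 * rbf2_repressed / total_enhancers

print(f"CtBP represses {ctbp_repressed} enhancers ({ctbp_percent:.1f}%)")
print(f"Rbf2 represses {rbf2_repressed} enhancers ({rbf2_percent:.1f}%)")
print(f"Not repressed by CtBP: {ctbp_resistant}; repressed by neither: {neither}")
```

The three overlap categories sum to the 759 CtBP-repressed and 1733 Rbf2-repressed enhancers quoted in the text, and the percentages round to the reported ~25% and 56%.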

Screening three additional prominent CoRs (CoRest, Rbf and Sin3A) revealed that each of them was able to repress a specific subset of enhancers. CoRest was the strongest repressor, repressing 1452 enhancers and often reducing their activity to background levels. The overall repression profiles of CoRest and Sin3A were similar to that of CtBP and clearly different from that of Rbf2. Rbf's repression profile was not similar to that of any other tested CoR. In general, the five tested CoRs were each able to repress a subset of enhancers and displayed distinct specificities (Jacobs, 2023).

Given that enhancers were differentially sensitive to some of the tested CoRs and resistant to others, it was examined whether these differential sensitivities were related to other enhancer properties. For this, the enhancers were clustered into groups with similar repression patterns. A self-organizing tree algorithm (SOTA) was used with the Pearson correlation coefficient (PCC) as distance metric to cluster the enhancers into five groups based on their sensitivity to the CoRs. Cluster 1 contains CoRest- and CtBP-resistant enhancers, while enhancers in cluster 2 are resistant to CtBP and Sin3A. Cluster 3 contains enhancers that are resistant to Rbf2, enhancers in cluster 4 are sensitive to all CoRs, and enhancers in cluster 5 are overall very sensitive to repression but resistant to Rbf. Hence, the enhancers of S2 cells could be divided into five groups, defined by their differential response to the tested CoRs (Jacobs, 2023).
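To make the distance metric concrete, the following minimal sketch computes a PCC-based distance (1 minus the Pearson correlation) between per-enhancer repression profiles. It does not implement SOTA itself. The enhancer names and the log2 fold-change values are hypothetical, chosen only to illustrate that enhancers with similarly shaped profiles end up close together.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def pcc_distance(x, y):
    """Correlation distance: 0 for identical shape, 2 for opposite shape."""
    return 1.0 - pearson(x, y)


# Hypothetical log2 fold changes (CoR vs. GFP tethering), in the order
# CoRest, CtBP, Rbf, Rbf2, Sin3A. Values are invented for illustration.
profiles = {
    "enh_A": [-3.0, -2.0, -0.1, -0.2, -1.5],
    "enh_B": [-2.8, -2.1, 0.0, -0.3, -1.4],
    "enh_C": [-0.1, -0.2, -2.5, -2.0, -0.3],
}

d_ab = pcc_distance(profiles["enh_A"], profiles["enh_B"])
d_ac = pcc_distance(profiles["enh_A"], profiles["enh_C"])
print(f"A vs B: {d_ab:.3f}   A vs C: {d_ac:.3f}")
```

Because the distance depends on the shape of the profile rather than its magnitude, a strongly and a weakly repressed enhancer with the same CoR preferences would be grouped together under this metric.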

Enhancers clustered by corepressor sensitivity differ in chromatin marks, transcription factor motif content and binding

Next, it was tested whether these enhancer clusters differ in additional properties that correlate with their behaviour towards the CoRs. Initial enhancer activity, as measured by UAS-STARR-seq with Gal4-GFP, was similar for all five clusters. H3K27ac, a histone modification that marks active enhancers, was also similarly enriched at the endogenous enhancer loci of all clusters. It was inferred that the distinct specificities did not stem from differences in initial enhancer strengths. The histone modifications H3K4me1 and H3K4me3 did, however, show differential and complementary trends: H3K4me1 was most highly enriched at cluster 5, followed by clusters 4 and 3, whereas H3K4me3 was more highly enriched at enhancer clusters 1 and 2. High H3K4me1 levels combined with high H3K27ac levels have been associated with distal regulatory regions or cell type-specific enhancers, suggesting that enhancers with the highest H3K4me1 levels (cluster 5) might be most specific to S2 cells. Analysing chromatin accessibility and H3K27ac levels in other cell types and tissues indeed revealed that enhancers from cluster 5 were highly cell type-specific. Interestingly, these enhancers were also the most strongly repressed by CoRest, CtBP and Rbf2, suggesting that developmental or cell type-specific enhancers might intrinsically be more sensitive to repression by certain CoRs, while more globally active or housekeeping enhancers might be more resistant. As active enhancers are known to function through a variety of different TFs, it was hypothesised that their differential response to the CoRs might be linked to the specific TFs they bind (Jacobs, 2023).

Using published TF ChIP-seq data, it was observed that prominent TFs were bound to enhancers from specific clusters and absent from others. The TFs DREF and M1BP, for example, bound almost exclusively to enhancers from clusters 1 and 2, with DREF preferring cluster 1 and M1BP cluster 2. Trithorax-like (Trl, also called GAGA factor or GAF), on the other hand, was absent from these clusters and instead mainly bound to enhancers from cluster 3 and, to a lesser extent, cluster 4. The distribution of these three TFs suggests that differential CoR sensitivities might relate to distinct TFs. To identify associations between CoR sensitivities and TFs in a more comprehensive manner, a TF-motif enrichment analysis was performed for the enhancers of the five clusters using the 6502 TF motifs from the iRegulon database. Consistent with the ChIP-seq results, motifs for DREF, M1BP and Trl were specifically enriched in clusters 1, 2 and 3, respectively. Additional motifs were also differentially enriched between the clusters: Rsc30 and E-box motifs were enriched in cluster 3, Sim motifs were enriched in cluster 4, while ETS and GATA motifs were enriched in cluster 5. Other prominent TF motifs were enriched over several clusters, including AP-1, Ato and CrebB. Taken together, the different enhancer clusters, defined by their sensitivity towards CoRs, associate with distinct chromatin features and TF motif content (Jacobs, 2023).

To directly test the association between TF motifs and sensitivity towards each of the CoRs, the enrichment of TF motifs within enhancers that were sensitive versus those that were resistant to each CoR was evaluated. For CtBP, for example, it was found that AP-1, Trl and GATA motifs were strongly enriched in sensitive enhancers, whereas DREF, Ohler1 and Mip120 motifs were enriched in the resistant enhancers. Motif enrichment profiles across all five CoRs revealed an intricate relationship between enhancer-CoR sensitivity and TF motifs with highly distinctive enrichments. Consistent with the conserved role of Rb proteins as part of the DREAM complex, E2F, DP, and Mip120 motifs were enriched in Rbf-sensitive enhancers. Interestingly, these three motifs as well as DREF and M1BP motifs were enriched in enhancers that were resistant to CoRest, CtBP and Sin3A repression. On the other hand, motifs for the developmental TFs GATA, AP-1 and TEAD were enriched in enhancers sensitive to CoRest, CtBP and Sin3A and resistant to Rbf and Rbf2. Furthermore, Trl and ETS motifs were specifically enriched in enhancers that were resistant to Rbf2 and Rbf, respectively, while sensitive to all other CoRs. These results indicate that, for each CoR, certain TF motifs are specifically and strongly enriched in sensitive and resistant enhancers (Jacobs, 2023).
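A sensitive-versus-resistant motif comparison of this kind can be framed as a 2x2 contingency test. The sketch below uses a one-sided Fisher's exact test, which is an assumption made here for illustration: the text does not state which statistic the study used. The group sizes (759 sensitive, 2335 resistant) are the CtBP numbers from the text, while the motif counts are hypothetical.

```python
from math import comb


def fisher_enrichment(sens_with, sens_without, res_with, res_without):
    """One-sided Fisher's exact test for motif enrichment in sensitive enhancers.

    Returns P(X >= sens_with) under the hypergeometric null, where X is the
    number of motif-containing enhancers falling into the sensitive group.
    """
    n_sensitive = sens_with + sens_without
    n_motif = sens_with + res_with
    total = n_sensitive + res_with + res_without
    k_max = min(n_sensitive, n_motif)
    tail = sum(
        comb(n_motif, k) * comb(total - n_motif, n_sensitive - k)
        for k in range(sens_with, k_max + 1)
    )
    return tail / comb(total, n_sensitive)


# Group sizes from the text (CtBP); motif counts below are HYPOTHETICAL.
n_sensitive, n_resistant = 759, 2335
sens_with_motif, res_with_motif = 300, 400

p_value = fisher_enrichment(
    sens_with_motif,
    n_sensitive - sens_with_motif,
    res_with_motif,
    n_resistant - res_with_motif,
)
print(f"one-sided p-value (hypothetical counts): {p_value:.3g}")
```

Swapping the roles of the sensitive and resistant groups in the call gives the corresponding test for enrichment in resistant enhancers.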

Given the differential motif enrichments, it was tested whether TF motif content is predictive of an enhancer's CoR sensitivity. For this, a Generalized Linear Model (GLM) was trained using TF motif counts as features and enhancer sensitivity to a given CoR as response variable. For each CoR, using 10-fold cross-validation, the models based on motif counts performed well and were able to predict an enhancer's sensitivity and resistance to a given CoR. Overall, these results establish a strong association between CoR sensitivity and TF motif content that is predictive and might correspond to causal relationships (Jacobs, 2023).

It was noticed that DREF motifs were enriched in enhancers that were resistant to CoRest and CtBP. Indeed, enhancers bound by DREF, such as enhancers near the RYBP and jupiter genes, are specifically resistant to repression by CoRest and CtBP but sensitive to repression by Rbf. Ranking all enhancers based on their sensitivity to repression by CoRest or CtBP confirmed that resistant enhancers were significantly enriched for DREF motifs and bound by DREF according to ChIP-seq. Given the correlation between CoRest and CtBP resistance and the presence of DREF, it was considered that DREF might protect enhancers against CoRest- and CtBP-mediated repression. To test whether DREF motifs are indeed required for the resistance, the DREF motifs were mutated in 65 DREF-motif-containing enhancers, and the enhancers' sensitivity to repression was assessed. The mutated enhancers were significantly more sensitive to repression by CoRest and CtBP, while their sensitivity towards Rbf did not change. It is inferred that DREF motifs are required for resistance to CoRest and CtBP. Several other TF motifs were predicted to confer resistance to repression by specific CoRs. ETS-family motifs, for example, were specifically enriched in enhancers that were resistant to Rbf repression. Mutating the ETS motifs in 157 enhancers indeed increased sensitivity towards Rbf-mediated repression. Remarkably, the opposite trend was observed for the other CoRs, which were all able to repress the wildtype enhancers containing ETS motifs better than their mutated counterparts, suggesting that ETS TFs are in general sensitive to repression. Similarly, Trl motifs were enriched in enhancers that were specifically resistant to Rbf2 repression, and mutating these motifs in 127 enhancers led to a slight but specific increase in sensitivity towards Rbf2, but not towards the other CoRs (Jacobs, 2023).
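The motif-count model with 10-fold cross-validation described above can be sketched in plain Python. This is a minimal illustration, not the study's pipeline: it assumes a binary sensitive/resistant response (a binomial GLM, i.e. logistic regression) fitted by gradient descent, and it runs on synthetic data in which two hypothetical motifs push an enhancer towards sensitivity or resistance.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit a binomial GLM (logistic regression) by batch gradient descent."""
    n, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, xi):
    """Predict 1 (sensitive) when the linear predictor is non-negative."""
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0


def k_fold_indices(n, k, seed=0):
    """Shuffle indices once and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=10):
    """Return the held-out accuracy of each of the k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(w, b, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies


# SYNTHETIC data: counts of two hypothetical motifs per enhancer; an enhancer
# is labelled sensitive (1) when the first motif outnumbers the second.
rng = random.Random(1)
X = [[rng.randint(0, 4), rng.randint(0, 4)] for _ in range(200)]
y = [1 if motif_a > motif_b else 0 for motif_a, motif_b in X]

accuracies = cross_validate(X, y, k=10)
print(f"mean held-out accuracy: {sum(accuracies) / len(accuracies):.2f}")
```

Each enhancer is used for testing exactly once, so the mean held-out accuracy estimates how well motif counts alone would predict sensitivity for unseen enhancers.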

In general, specific TF motifs are required to protect resistant enhancers against a given CoR. As these motifs and enhancers are however still sensitive to other CoRs, an intricate pattern emerges, which suggests that the interplay between repressors and activators can be highly specific (Jacobs, 2023).

Given that specific TF motifs are required for CoR resistance, it was asked whether these motifs are also sufficient to cause CoR resistance. To test this, two 'resistant' TF motifs (10 bp each) were introduced into CoR-sensitive enhancers at positions deemed unimportant for enhancer activity, and the enhancers' sensitivity to repression was evaluated. As a control for the addition of motifs, two neutral control TF motifs (of the ACE2 and FOX TFs), which were predicted not to impact the sensitivity to repression, were introduced at the same positions. Since DREF motifs were required for resistance to CoRest- and CtBP-mediated repression, it was tested whether they were sufficient to render CoRest- and CtBP-sensitive enhancers resistant. Overall, the introduction of two DREF motifs was sufficient to protect the enhancers from CoRest- and CtBP-mediated silencing, and this protection was specific and due to the DREF motifs, as control motifs had no effect on sensitivity. Similarly, ETS motifs were necessary and sufficient to desensitize enhancers from repression by Rbf, and Trl motifs were necessary and sufficient to specifically generate Rbf2-resistant enhancers (Jacobs, 2023).

In each case, the gain in resistance due to the addition of selected TF motifs was highly CoR specific, as these new motifs had little effect on the repression by other CoRs or could even sensitize the enhancers to other CoRs. The addition of ETS or Trl motifs for example made enhancers more sensitive to all other CoRs except for Rbf or Rbf2 respectively, while DREF motifs increased sensitivity towards Rbf. Hence, introducing specific TF motifs changed the enhancer's sensitivity to repression by a given CoR in a specific and predictable manner. Taken together, it is concluded that specific TFs are necessary and sufficient to confer resistance to repression by a given CoR, while other CoRs are able to repress these TFs, indicating that certain TFs can directly counteract some CoRs and their repressive function or that different modes of transcriptional activation require different modes of repression (Jacobs, 2023).

This work has shown that transcriptional enhancers are differentially susceptible to active repression by prominent CoRs. The regulatory specificities between active enhancers and five CoRs were mapped on a genome-wide scale, and it was discovered that enhancers across the genome are repressed by just a subset of specific CoRs. This additional layer of regulatory specificities enables the differential repression of closely spaced enhancers and genes as they frequently occur in the Drosophila genome. At the fs(1)h/mys locus, for example, CoRest could repress the intronic mys enhancer without affecting the neighboring fs(1)h gene, whereas Rbf could do the opposite. Similarly, at the kay/fig locus, some CoRs like CtBP could repress both closely spaced enhancers, whereas Rbf2 could specifically repress the enhancer closest to the fos intronic gene (fig) but not the others (Jacobs, 2023).

The uncovered specificities between repressors and enhancers not only suggest that repression occurs via distinct mechanisms, but also reveal a previously unappreciated layer of 'resistance' against repression. Enhancers that are sensitive or resistant towards a given CoR display clear differences in their TF motif content, and motif mutagenesis experiments changed enhancer sensitivities in a predictable manner. Specific motif combinations of activating and repressive TFs, together with the regulatory proteins present in each cell type, will largely determine when and where a regulatory region is active or repressed. In addition, the ability of TFs to confer resistance also allows for enhancers and genes to be activated by de-repression via TF recruitment, as has been demonstrated by motif addition. This additional layer of regulation circumvents the requirement to remove the involved repressors, and the interplay between TFs and CoRs provides more flexibility to the regulatory system. Importantly, the fact that one mode of activation can be sensitive to one repressor but resistant to another implies that distinct activation and repression mechanisms must converge. Active repression might directly interfere with specific factors or defined steps of transcriptional activation. Certain CoRs might, for example, inactivate specific TFs but not others (e.g. via posttranslational modifications), or counteract the TFs' downstream activating mechanisms. Alternatively, certain TFs might directly counteract a repressor's function or bypass the rate-limiting step of initiation or elongation controlled by the CoR. Discerning the distinct mechanisms of activation and repression and how they intersect and coordinate will be of great future interest (Jacobs, 2023).

Dynamic interplay between non-coding enhancer transcription and gene activity in development

Non-coding transcription at the intergenic regulatory regions is a prevalent feature of metazoan genomes, but its biological function remains uncertain. This study devised a live-imaging system that permits simultaneous visualization of gene activity along with intergenic non-coding transcription at single-cell resolution in Drosophila. Quantitative image analysis reveals that elongation of RNA polymerase II across the internal core region of enhancers leads to suppression of transcriptional bursting from linked genes. Super-resolution imaging and genome-editing analysis further demonstrate that enhancer transcription antagonizes molecular crowding of transcription factors, thereby interrupting the formation of a transcription hub at the gene locus. This study also showed that a certain class of developmental enhancers is structurally optimized to co-activate gene transcription together with non-coding transcription effectively. It is suggested that enhancer function is flexibly tunable through the modulation of hub formation via surrounding non-coding transcription during development (Hamamoto, 2023).

This study has successfully established a live-imaging system that permits simultaneous visualization of non-coding enhancer transcription together with gene activity at single-cell resolution in Drosophila. By combining a series of genome-engineering and genetic approaches, this study has provided several lines of evidence that intergenic non-coding transcription can flexibly modulate enhancer function in an orientation- and activity-dependent manner. Super-resolution imaging and genome-editing analysis further demonstrated that enhancer self-transcription impacts molecular crowding of transcription factors and hub formation at the gene locus. It is speculated that elongating Pol II acts as a steric hindrance for the stable association of transcription factors with their cognate enhancers, which may further interrupt subsequent recruitment of co-activators and other key transcriptional apparatus to the gene locus. Consistent with these results, it has been previously reported that non-coding transcription at the intergenic regulatory region suppresses the expression of the SER3 gene by inducing the dissociation of transcription factors from their binding sites in yeast. Transcription of the yeast ADH1 gene has also been shown to be suppressed through a similar mechanism during zinc starvation. Similarly, in Drosophila, there are several examples where readthrough non-coding transcription toward the promoter region of nearby genes directly represses their expression. In these cases, readthrough non-coding transcription is thought to physically interfere with the assembly of the pre-initiation complex and subsequent recruitment of Pol II to the promoter region. This so-called 'transcription interference' mechanism is suggested to occur also in mammalian systems. Another well-known example of transcriptional attenuation mediated by non-coding transcription is the regulation of GAL10 and GAL4 genes in yeast. Under non-inducing conditions, non-coding RNA transcribed from the antisense GAL10 region is required for preventing transcriptional leakage of GAL10 and GAL4 genes, presumably through a transcriptional interference mechanism. In this regard, the mechanism described in this study is unique in that intergenic non-coding transcription at the distal enhancer region counteracts the assembly of transcription hubs, thereby remotely influencing the efficiency of burst induction from the linked gene in an orientation- and activity-dependent manner. The current data together with these preceding studies imply that attenuation of gene activity by non-coding enhancer transcription is an ancient mechanism conserved across species. It is speculated that the directionality and/or strength of intergenic non-coding transcription has been, in part, evolutionarily defined as a consequence of the natural selection of sequence variations at non-coding genomic regions based on their functional impacts on nearby gene expression and associated physiological consequences in the natural environment. Acquisition of novel intergenic TSSs may also have contributed to the diversification of enhancer function during animal evolution, as this study has produced a broad spectrum of regulatory activities from the same enhancer just by engineering the mode of intergenic non-coding transcription. This idea was supported by the analysis of the endogenous rho locus, as its gene activity was dramatically changed just by placing a minimal TSS at the intergenic region (Hamamoto, 2023).

As exemplified in the analysis of Ubx BRE, some enhancers appear to preferentially contain non-inhibitory TSSs facing in the opposite orientation relative to the transcription factor binding sites. Under its natural enhancer configuration, non-coding transcription from BRE was found to positively correlate with bursting activities of the linked gene, giving rise to the possibility that there is a class of enhancers that utilize non-coding transcription to facilitate gene expression. Super-resolution imaging analysis indicates that enhancer transcription exerts its negative regulatory function by interrupting the formation and the function of the transcription hub at the gene locus when Pol II elongation is engineered to occur toward the transcription factor binding sites in a synthetic locus. Taking this into consideration, it is also conceivable that enhancer transcription exerts its positive regulatory function by facilitating the release of enhancer-bound repressors and/or inhibitory nucleosomes in certain circumstances. Importantly, the data showed that the positive effects of BRE non-coding transcription can be easily lost by placing an additional intergenic TSS facing toward the transcription factor binding sites, suggesting that the regulatory outcome of enhancer transcription is highly dependent on its surrounding genomic context. Indeed, unlike the natural Ubx BRE, non-coding transcription originating from the natural hairy enhancer was shown to exert a negative regulatory function in early embryos. Thus, it is speculated that the mode of enhancer transcription has been evolutionarily defined by the balance between its negative and positive regulatory effects at each genomic locus. It should be noted that Drosophila enhancers are classified into two functionally distinct groups, developmental and housekeeping, and they largely differ in their binding partners at the molecular level. In this regard, when these two types of enhancers were pooled indiscriminately, a weak positive correlation (ρ = 0.24) between the level of non-coding transcription and enhancer strength was reported. Importantly, however, a subsequent study separately analyzed housekeeping enhancers and developmental enhancers and found that activities of developmental enhancers negatively correlate with the level of enhancer transcription (ρ = −0.20), while housekeeping enhancers exhibit a positive correlation. In agreement with this, the current analysis estimated that developmental enhancers in early embryos are more prone to contain an inward TSS. It might be possible that the positive regulatory function of non-coding transcription plays a more important role in the control of housekeeping gene expression in Drosophila (Hamamoto, 2023).

Another key observation in this study is the suppression of transcriptional bursting from the linked gene in the presence of a bidirectional TSS near the enhancer region, as enhancer transcription typically takes place bidirectionally in mammalian genomes. In this regard, a recent computational analysis suggested that a substantial fraction of mammalian enhancers are depleted of transcription factor binding sites at the origin of non-coding enhancer transcription, implying that their natural genomic configurations are organized to minimize potential inhibitory effects of non-coding transcription. Clearly, future studies are needed to fully elucidate the biological roles of enhancer transcription by visualizing its regulatory function in the context of the endogenous genome in various species. It is believed that this study will serve as a critical starting point toward a full understanding of the multifaceted functions of non-coding enhancer transcription in the control of gene expression during animal development (Hamamoto, 2023).

The continuum of Drosophila embryonic development at single-cell resolution

Drosophila melanogaster is a powerful, long-standing model for metazoan development and gene regulation. This study profiled chromatin accessibility in almost 1 million and gene expression in half a million nuclei from overlapping windows spanning the entirety of embryogenesis. Leveraging developmental asynchronicity within embryo collections, deep neural networks were applied to infer the age of each nucleus, resulting in continuous, multimodal views of molecular and cellular transitions in absolute time. Cell lineages were determined; their developmental relationships were inferred; and dynamic changes in enhancer usage, transcription factor (TF) expression, and the accessibility of TFs' cognate motifs were linked. With these data, the dynamics of enhancer usage and gene expression can be explored within and across lineages at the scale of minutes, including for precise transitions like zygotic genome activation (Calderon, 2022).

To illustrate the potential of these data to facilitate exploration of specific lineages at finer resolution, 59,012 cells annotated as neuroectoderm were reanalyzed using scRNA data from 6 to 18 hours. This revealed 20 subclusters, including a large group of early cells corresponding to the brain primordium and neural progenitors that express regulators of neurogenesis, such as Notch (N) and Delta (Dl), and neuroblast temporal TFs, such as miranda (mira) and castor (cas). Two additional neural progenitor clusters correspond to sensory progenitors, whereas immature neurons express low levels of both neural progenitor and pan-synaptic genes, including cacophony (cac) and synaptotagmin 1 (syt1). Mature neurons are marked by higher levels of pan- and subtype-specific synaptic genes coupled with low or no expression of earlier developmental genes. Finally, midline cells, consisting of both neurons and glia, cluster together and become evident at 6 to 8 hours; using the midline TF single minded (sim) and glial immunoglobulin family member wrapper as markers, it was possible to follow them forward in time as they mature (fig. S7B). It was also possible to follow the maturation of sensory neural progenitors, marked by shaven (sv), from 6 to 16 hours (Calderon, 2022).

To further explore neuronal diversity, 6703 mature neurons were reclustered, revealing 11 neuronal subtypes, which were manually curated. Among these, four clearly separable sensory cell clusters were identified. There are two types of Drosophila sensory neurons based on dendritic morphology: type I sensilla, which include both external sensory (ES) neurons and internal chordotonal (Ch) neurons, and type II multidendritic (MD) neurons. It was possible to clearly distinguish MD neurons on the basis of expression of genes, such as dendritic arbor reduction 1 (dar1), which promotes their characteristic branching dendrites, and the pseudouridine synthase RluA-1, which was recently identified as a marker of MD neurons. Consistent with their nociceptive role, this cluster also specifically expresses the mechanical nociception degenerin/epithelial sodium channel subunits pickpocket (ppk) and ppk26. Mechanosensory ES neurons are specified by the TF hamlet (ham), which is specifically expressed in the middle sensory cluster. The adjacent cluster, likely Ch sensory neurons, is identified by expression of the mechanosensitive nonselective cation channel subunit no mechanoreceptor potential C (nompC) as well as fate-determinant Rfx and a number of as-yet uncharacterized genes specific to this cluster. The final sensory cluster likely corresponds to Ch glial-like support cells based on the expression of glial markers, including moody, and Cbl-associated protein (CAP) and nompA, which promote the development and function of Ch support cells, respectively. On the basis of vesicular neurotransmitter transporter expression, two clusters of central cholinergic neurons were identified, a glutamatergic cluster that likely includes motor neurons, and monoaminergic neurons. Finally, peptidergic neurons cluster separately and were identified on the basis of the expression of neuropeptides [ion transport peptide (ITP)], enzymes involved in their synthesis [amontillado (amon)], and receptors [myosuppressin receptor 1 (MsR1)] (Calderon, 2022).

The enrichment of the uncharacterized long noncoding RNA (lncRNA) CR31451 in mature neurons was validated, as was that of two genes, complexin (cpx) and CG4328, identified as enriched in the monoaminergic cluster, which includes midline neurons. This neuronal subtype enrichment is unexpected for cpx, which encodes a presynaptic regulator of synaptic vesicle release, and may point to additional requirements for Cpx in midline monoaminergic neurons. In the course of exploring these fine neuronal subtypes, an unexpected finding was made regarding elav, a classic marker gene for neurons. Specifically, lower-level expression of elav was noticed in clusters annotated as visceral muscle. Double fluorescent in situ hybridization with a visceral muscle–specific marker gene (biniou) confirmed this unexpected finding and raises the possibility of a previously unknown role for this well-studied gene (Calderon, 2022).

This deeper exploration of the neuroectoderm, validating and extending years of research from many groups, illustrates the depth of information that can be obtained from these data. A more detailed annotation of nonmyogenic mesoderm . A full exploration of all lineages represented in these data will require a community-wide effort by tissue experts (as done in this study for neuronal diversity) (Calderon, 2022).

In addition to delineating developmental trajectories, these data can also capture spatial differences arising during developmental patterning. Previous bulk ATAC-seq on embryo halves has shown variability in the accessibility of enhancers along the anterior-posterior (A-P) axis of the blastoderm embryo. Using label transfer to map anterior or posterior identities from a previous blastoderm dataset onto the 2- to 4-hour data, a positional accessibility skew score was computed for validated enhancers with strict A-P activity. This indicates that accessibility of most A-P enhancers is skewed in the expected anterior or posterior cell group, recapitulating the bulk data. Notably, differences among enhancers of the same gene were identified. For example, in the eve locus, the stripe 1 enhancer has a much stronger skew for anterior accessibility compared with stripe 2, as has also been previously reported. The single-cell data thus capture the biological variability in enhancer accessibility along the A-P axis, extending previous observations. Similarly, labels could be transferred from sci-RNA-seq clusters to spatial coordinates from a spatial enhanced resolution omics sequencing (Stereo-seq)–based spatial study of Drosophila embryos at 14 to 16 hours and 16 to 18 hours of development. Using the assigned annotations of tissues from the spatial study, a correspondence was observed with cluster annotations, which again points to the spatially relevant variability present in these data (Calderon, 2022).

To further leverage continuous views of unfolding trajectories, the gene regulatory modules active in germ layer–specific development were next explored. Focus was placed on the mesoderm and its derivatives as a complex, well-characterized system that has been studied previously. For this, all cells corresponding to mesoderm-derived cell states were selected, collectively 51,338 (scRNA) and 200,907 (scATAC) profiles across 4 to 20 hours and 2 to 20 hours of inferred developmental age, respectively (Calderon, 2022).

Focusing first on RNA, this study selected the top 2000 most variable genes. After normalizing expression values to be comparable across time, dynamic time warp clustering was used to group genes into four clusters with distinct temporal regulation. These clusters define broad successive waves of gene expression during mesoderm development and notably exhibit similarly ordered waves of chromatin accessibility. Gene pathway enrichment suggests different functional roles for each cluster. Cluster 1 genes are highly expressed from the beginning of mesoderm development (directly after gastrulation; 4 to 9 hours); are enriched for TFs; and likely represent a mixture of genes involved in progenitor cells, mesoderm development, and transcriptional activation. Cluster 2 genes peak at ~9 to 11 hours, during the subdivision of the mesoderm into different muscle primordia and their subsequent specification. This cluster is enriched for genes involved in mesoderm development, including myoblast fusion and myotube differentiation, while losing enrichment for stem cell and self-renewal terms. By contrast, cluster 3 genes (n = 365) initiate expression at ~10 hours and steadily increase to the end of embryogenesis, whereas cluster 4 genes (n = 631) only switch on at ~15 hours, during muscle terminal differentiation. The last cluster lacks enrichment for TFs and rather includes genes involved in myofibril assembly and muscle assembly and maintenance as well as essential contractile proteins for differentiated muscle. The spatiotemporal expression of five poorly characterized genes was validated by in situ hybridization, confirming that they are expressed in the mesoderm or muscle at the inferred time window (Calderon, 2022).
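The clustering above relies on dynamic time warping (DTW), which compares the shape of two temporal profiles while tolerating shifts in timing. The sketch below is a textbook DTW distance with an absolute-difference cost, offered as an illustration of the idea rather than the study's implementation; the three expression profiles are hypothetical.

```python
def dtw_distance(s, t):
    """Classic dynamic time warping distance with |a - b| as local cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning s[:i] with t[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # s advances, t repeats
                cost[i][j - 1],      # t advances, s repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# HYPOTHETICAL normalized expression profiles over six time bins.
early_peak = [0, 1, 3, 1, 0, 0]
shifted_peak = [0, 0, 1, 3, 1, 0]   # same shape, one bin later
late_rise = [0, 0, 0, 1, 2, 3]      # different shape

print(dtw_distance(early_peak, shifted_peak))
print(dtw_distance(early_peak, late_rise))
```

A pairwise DTW distance matrix of this kind can then be passed to any distance-based clustering method to group genes with similar temporal behaviour.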

The temporal and cell type–specific nature of these expression signatures for both the downstream effector molecules and their upstream regulators should provide the resolution to order genes into putative regulatory hierarchies. For example, several genes with essential roles in muscle differentiation, such as myosin heavy chain (Mhc), are present in clusters 3 and 4. Mhc protein plays a critical role in providing muscle-contractile force. The scRNA data show increasing Mhc expression along the muscle lineages in cells with later embryonic ages, matching the expression pattern of Mhc. Concomitantly, there is a gradual increase in open chromatin at characterized Mhc enhancers at later stages along multiple muscle trajectories (Calderon, 2022).

Before the expression of Mhc and other muscle differentiation genes, transient expression of mesoderm-associated TFs was observed. One example is Kahuli (Kah), a Snail/Scratch family TF associated with muscle development, which has peak expression at 10 hours of embryogenesis (cluster 2). To investigate the relationship between open chromatin and gene expression, gene activity scores were computed, defined as the sum of sci-ATAC-seq reads in the gene body and the 2 kb flanking the transcription start site (TSS). The gene activity scores for both Mhc and Kah recapitulate their sequential temporal patterns of expression, with Kah's activity signature appearing earlier along the mesodermal trajectories compared with that of Mhc. To determine the extent to which it was possible to map the exact ordering of accessibility and expression changes, the scaled expression values and gene activity scores, averaged across bins with equal numbers of cells, were overlaid. Notably, for Kah, gene expression temporally follows the trajectory of the corresponding gene activity score based on open chromatin, suggesting an ordering in which the gene body first becomes accessible and the corresponding transcript then accumulates; however, this was not the case for Mhc, for which expression and accessibility increased in tandem. Kah binds to several characterized Mhc enhancers near the gene's promoter, as observed in bulk ChIP sequencing (ChIP-seq) data, which suggests a regulatory link between Kah and Mhc expression (Calderon, 2022).
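
The gene activity score described above can be sketched as a read count over an interval. This is a minimal illustration, assuming that "2 kb flanking the TSS" means a window of 2 kb on each side of the TSS combined with the gene body; that reading, the function names, and the toy coordinates are assumptions, not details taken from the study.

```python
from bisect import bisect_left, bisect_right

FLANK = 2000  # bp on each side of the TSS (assumed interpretation)


def activity_interval(start, end, strand):
    """Gene body extended so that it covers TSS +/- FLANK."""
    tss = start if strand == "+" else end
    return min(start, tss - FLANK), max(end, tss + FLANK)


def gene_activity(sorted_read_positions, start, end, strand):
    """Count reads (given as sorted positions) inside the activity interval."""
    lo, hi = activity_interval(start, end, strand)
    return (bisect_right(sorted_read_positions, hi)
            - bisect_left(sorted_read_positions, lo))


# Hypothetical read positions on one chromosome for one cell or cluster.
reads = sorted([500, 9000, 10500, 12000, 15000, 19999, 23000])

print(gene_activity(reads, 10000, 20000, "+"))  # interval 8000..20000
print(gene_activity(reads, 10000, 20000, "-"))  # interval 10000..22000
```

In practice such counts would be computed per gene and per cell (or per cluster) and then scaled before being compared with expression values.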

To extend this analysis more globally, TF motifs were sought that are enriched in putative enhancers (mesoderm-specific scATAC peaks 1 to 10 kb upstream of the TSS) of genes belonging to each of the four scRNA mesoderm expression clusters. This identified 458 TF motif–to-cluster enrichments corresponding to 152 unique TFs. Of these, 31 are TFs whose expression changes along mesoderm differentiation and are thus included in the expression-based clustering. These 31 include many TFs essential for mesoderm development, including a number of direct target genes of the master regulator Twist (the functional ortholog of MyoD) at the beginning of mesoderm development (e.g., hb, en, Ubx, and pb) that are concordantly expressed in the first temporal cluster. These factors have many functions, including setting up the segmentation of the mesoderm, regulating the expression of somatic muscle identity genes, establishing midgut constrictions in the visceral mesoderm, and heart patterning. Other examples from the second and third temporal clusters are genes required for cell fate specification of somatic muscle founder cells (e.g., Six4 and ap) and heart development (e.g., tup and Lim3) (Calderon, 2022).
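
The statistical test behind these motif-to-cluster enrichments is not stated here. One common choice for this kind of question is a one-sided hypergeometric test, sketched below under that assumption. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when n peaks are drawn from N background peaks,
    K of which contain the motif (one-sided enrichment test)."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 1000 mesoderm-specific peaks in the background,
# 100 of them with a motif match; 50 peaks lie upstream of genes from one
# expression cluster, and 15 of those 50 carry the motif (5 expected).
p_value = hypergeom_enrichment_p(k=15, n=50, K=100, N=1000)
print(p_value)
```

With one test per motif and per cluster, the resulting P values would need a multiple-testing correction before any enrichment is called significant.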

This approach may miss the contribution of important TFs that were not variably expressed in mesoderm. In particular, if a TF is variably expressed and has corresponding variability in motif activity, this TF is likely active. However, this does not imply that all expressed TFs are active (e.g., there may be coactivators or posttranslational modifications that are required). This caveat notwithstanding, these analyses highlight the potential for further discovery of coregulated gene modules related to distinct germ layers or cell types (Calderon, 2022).

Next, the diversity of cell states across embryogenesis was leveraged to ask which TFs drive specific programs of cell type differentiation. For this, all scATAC clusters at all time points (in contrast to the scRNA-focused cluster analysis above) were used, and differential enrichment of TF position weight matrices (PWMs) was sought within each cluster's open chromatin regions (Calderon, 2022).

Enrichments across clusters were characterized from the 10- to 12-hour time window based on predicted time. Encouragingly, hierarchical clustering of the enrichment profiles of all associated PWMs grouped each cluster roughly by germ layer (this was also observed in other time windows). The nonmyogenic mesoderm (fat body) and myogenic mesoderm (somatic muscle) cluster together. Open chromatin regions in the myogenic clusters are enriched in motifs for many TFs known to play a role in muscle development, including Mef2 and Fork head (Fkh) TFs. The myogenic clusters also appear close to two neuronal clusters, which is driven by shared motif enrichment with neuroectoderm and glial cells, particularly many C2H2 zinc finger TFs, including Btd, CG7368, Crol, Sr, and Dar1. Many of these factors have known roles in neuronal development (e.g., Dar1), whereas Stripe (Sr) is essential for muscle tendon cell fate and muscle attachment in the epidermis at late stages of embryogenesis (Calderon, 2022).

Because members of the same family of TFs typically recognize similar motif sequences (e.g., GATAe, GATAd, and pnr), it is often difficult from motif analysis alone to pinpoint the responsible TF. To address this, the scRNA data were leveraged to identify the most likely active TF on the basis of its expression within the clusters among all factors that share the same motif binding pattern. First, a regression-based framework was used to integrate the scATAC and scRNA datasets and identify links between the different cell clusters. Specifically, a nonnegative least squares (NNLS) matrix factorization approach was adopted that decomposes expression data as a mixture of components derived from proximal gene activity scores generated from the scATAC data. Despite possible temporal differences between accessibility and expression, NNLS identifies stronger links between clusters from the same 2-hour window compared with those from adjacent 2-hour windows. NNLS links in the opposite direction were also inferred by decomposing proximal gene activity scores by gene expression associated with scRNA clusters. For each cluster of a given data type, the result of NNLS factorization is a mixture proportion of clusters from the other data type, where a higher value represents a stronger association between the scRNA and scATAC cluster. This factor decomposition approach resulted in a strong linkage (NNLS mixture coefficient of >0.1) of 120 cell state clusters present in the same inferred time windows, with most of the strongly linked clusters being from 4 to 6 hours onward. Upon manual inspection, many linked scATAC and scRNA clusters, which had been independently annotated, are from matching tissues. For example, from the 10- to 12-hour window, the epidermis cluster (cluster 0) in scATAC data was matched to the epidermis in scRNA data. Altogether, of 21 ATAC clusters from the 10- to 12-hour window, 16 had a linked RNA annotation with an NNLS mixture coefficient >0.1, of which 14 were between comparable tissue annotations (Calderon, 2022).
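
The NNLS decomposition can be illustrated with a toy example in which the expression profile of one scRNA cluster is expressed as a nonnegative mixture of gene activity profiles from scATAC clusters. The solver below is a simple coordinate-descent sketch written for self-containment, not the implementation used in the study; a library routine such as `scipy.optimize.nnls` would normally be preferred. All matrices and values are hypothetical. Only the 0.1 cutoff is taken from the text.

```python
def nnls_coordinate_descent(A, b, n_iter=200):
    """Minimize ||A x - b||^2 subject to x >= 0.

    A: list of rows (genes x reference clusters); b: one value per gene.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = list(b)  # equals b - A x, with x starting at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(n_iter):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue
            grad = sum(A[i][j] * residual[i] for i in range(n_rows))
            new_xj = max(0.0, x[j] + grad / col_sq[j])  # clip at zero
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


# Columns: gene activity profiles of three scATAC clusters (four genes).
atac_activity = [[5.0, 0.0, 1.0],
                 [4.0, 1.0, 0.0],
                 [0.0, 6.0, 1.0],
                 [1.0, 0.0, 7.0]]
# Expression of one scRNA cluster, built as 0.8 * column 0 + 0.2 * column 2.
rna_expression = [4.2, 3.2, 0.2, 2.2]

coef = nnls_coordinate_descent(atac_activity, rna_expression)
total = sum(coef)
mixture = [c / total for c in coef]                    # mixture proportions
linked = [j for j, m in enumerate(mixture) if m > 0.1]  # cutoff from the text
print([round(m, 3) for m in mixture], linked)
```

Running the same decomposition with the roles of the two data types swapped gives the links in the opposite direction, as described above.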

These integrated scRNA and scATAC clusters, which span 0 to 18 hours of embryogenesis, enabled a more direct analysis of the role of specific TFs in different cell types' differentiation. It was reasoned that active TFs should be more highly expressed in cell types for which they have a functional role, and their associated PWM should be more enriched or depleted in accessible regions when the TF is activating or repressing expression. In line with this, correlation values between motif-associated accessibility and gene expression were shifted toward more positive values for TFs annotated [by gene ontology (GO)] as activators and toward more negative values for annotated repressors, a trend also observed in human fetal tissues. This approach of linking TFs' cluster-specific expression and motif enrichments allowed nomination of TFs as active at specific times in specific tissues. For example, this analysis predicts a specific role for Sage in salivary gland development, as the salivary gland is the only cell type exhibiting both high expression of the sage transcript and high accessibility of the Sage-associated PWM. This finding matches the essential role for sage in salivary gland development, as determined by genetic loss-of-function analysis. Similar predictions were made for GATAe in the midgut at 16 to 18 hours and Awh in the epidermis at 14 to 16 hours, matching the functional role for both TFs in midgut endoderm and epidermis development, respectively (Calderon, 2022).
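
The reasoning above, that expression of an activator should correlate positively with accessibility of its motif across clusters and that of a repressor negatively, can be sketched with a plain Pearson correlation. The choice of Pearson correlation, the function name, and all values below are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical values across five linked scRNA/scATAC cluster pairs.
tf_expression = [0.1, 0.4, 1.2, 2.5, 3.0]
motif_accessibility_activator = [-0.5, -0.2, 0.3, 1.1, 1.4]
motif_accessibility_repressor = [1.0, 0.7, 0.1, -0.8, -1.2]

r_act = pearson(tf_expression, motif_accessibility_activator)
r_rep = pearson(tf_expression, motif_accessibility_repressor)
print(round(r_act, 3), round(r_rep, 3))
```

A strongly positive value nominates the TF as a candidate activator in the clusters where it is expressed, and a strongly negative value as a candidate repressor. As noted in the text, expression alone does not prove that a TF is active.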

To expand this analysis and systematically nominate TFs that potentially drive germ layer–specific differentiation programs, a linear model was fit that predicts a TF's motif-associated chromatin changes from an estimated effect of an interaction term that includes the expression level of the TF in a specific germ layer and time window. The model's effect estimates can identify TFs with specific motif activity in particular germ layers and suggest time windows from which a TF initiates its activity. For example, the model refined the role of Sage as becoming active in the ectoderm germ layer specifically from 10 to 12 hours onward and the activity of GATAe initiating in the endoderm from 8 to 10 hours onward. Such a model encompassing germ layers across development time may also identify additional likely coactive TFs. For example, in addition to Sage, Fkh was found to be both coexpressed and coactive in the ectoderm; Fkh is a TF reported to act together with Sage to activate salivary gland–specific genes (Calderon, 2022).

This analysis also generated additional interesting findings for other time points and germ layers [e.g., Fruitless (Fru)]. Altogether, from eight high-level germ layer–associated tissue annotations and 316 TF motifs tested, 1258 significant TF-to-tissue relationships were identified that have both associated expression and chromatin activity at one or more of the nine time windows assessed. It is noted that in time windows with fewer clusters, the association effect estimates are susceptible to outliers and should be interpreted with caution. Notwithstanding this caveat, these putative assignments represent an extensive resource for future studies (Calderon, 2022).

To demonstrate the potential of this approach to discover previously unknown putative roles for TFs, four genes were selected, and their expression in the linked germ layer was validated by fluorescent in situ hybridization. Although these genes were inferred to have effects in multiple germ layers, their function in either mesoderm (CG5953 and CG11617) or neuroectodermal tissues (Ets65A and CG12605) was poorly characterized. These factors were confirmed to be expressed in the tissue and time window predicted by the data, suggesting potential roles for these TFs in mesoderm and neuronal development (Calderon, 2022).

To complement the NNLS analysis, FigR, a recently developed tool, was applied to further facilitate gene regulatory network (GRN) reconstruction. Because multi-omic ATAC-RNA data from the same cell are required for this task, the two independent assays for all cells from 10 to 12 hours were integrated using canonical correlation analysis (CCA), identifying the most likely ATAC-RNA cell pairs using geodesic distance–based pairing within the common CCA space. Using these pairs as input for GRN inference with FigR, ATAC peaks were linked to their target genes based on peak-to-TSS accessibility correlation and then TF motif enrichments were computed for the linked regions, which, together with the TF expression-accessibility correlation, allowed definition of hundreds of putative activators and repressors at this embryonic stage. Ranking the TFs by their regulation score nominated many activators and repressors that were also identified in the NNLS analysis above, including l(3)neo38, Lim3, lola, fkh, and fru. Focusing on the targets of the regulatory networks across all cells at 10 to 12 hours, a large set of genes was found that appear to be extensively regulated (209 genes with >10 linked regulatory regions). The inferred TF activities were used to explore the factors acting on these genes and their mode of regulation. For example, tup, a TF gene required for heart development, undergoes extensive self-regulation (highest motif-RNA correlation) besides being positively regulated by the pan-muscle TF Mef2 and repressed by Run and Opa. Another top-ranking gene, chinmo, an essential neuronal TF, is activated by other nervous system TFs, such as Lim1 and Onecut, and is negatively regulated by Fru, which was also identified as a neuroectoderm-specific repressor in the NNLS-based analysis (Calderon, 2022).

Finally, attempts were made to exploit the fine-grained resolution of inferred nuclear ages to explore the dynamics of an early pioneer TF, Zelda, in regulating chromatin opening followed by transcription during ZGA. The expression of a set of genes was uncovered that are Zelda dependent during ZGA and, for each gene, aggregated accessibility at the linked Zelda-bound regions in intervals of 1 min across 0 to 3 hours of embryogenesis. Clustering of gene expression identified two broad temporal clusters—a first group of early genes and a second group whose expression increases later, after ~1.5 hours of embryogenesis. Notably, although accessibility at the Zelda-bound regions linked to the early cluster seems to mirror the temporal expression, regions linked to the late expression gene cluster gain accessibility much earlier, almost as early as the first cluster, which suggests that Zelda is opening these regions for future activation. To verify whether accessibility is reflective of Zelda binding, Zelda occupancy was retrieved by nuclear cycle, which confirmed that >70% of regions in both temporal clusters are already occupied by Zelda at nuclear cycle 8 to 9, regardless of the associated gene expression. Moreover, a partial Clamp TF motif match was found within the second temporal cluster (and no match for the first cluster of a TF that is also expressed), which corroborates its Zelda-paired role during ZGA. These results suggest that Zelda establishes chromatin accessibility at a large set of regulatory regions in the early embryo, independently of future gene expression, in agreement with its well-known role as a pioneer factor. In some cases, Zelda possibly also functions as the activator of gene expression (cluster 1), whereas in others it retains a pioneering role, and the gene's expression is induced by later TFs (cluster 2) (Calderon, 2022).
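
Aggregating accessibility in 1-minute intervals of inferred nuclear age over 0 to 3 hours amounts to a simple binning step, sketched below. The function name, the use of a per-bin mean, and the toy values are assumptions for illustration only.

```python
def mean_by_minute(ages_hours, values, max_hours=3.0):
    """Average `values` in 1-minute bins of inferred age over [0, max_hours).

    Returns one entry per minute; bins without any nuclei are None.
    """
    n_bins = int(max_hours * 60)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for age, value in zip(ages_hours, values):
        if 0.0 <= age < max_hours:
            # Guard against floating-point rounding at the upper edge.
            b = min(int(age * 60), n_bins - 1)
            sums[b] += value
            counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical nuclei: inferred age (hours) and accessibility at the
# Zelda-bound regions linked to one gene.
ages = [0.0, 0.005, 0.02, 1.5]
accessibility = [1.0, 3.0, 5.0, 7.0]

bins = mean_by_minute(ages, accessibility)
print(len(bins), bins[0], bins[1], bins[90])
```

The resulting per-minute profile for each gene can be compared directly with an expression profile binned in the same way.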

The continuum of Drosophila embryogenesis (see Single-cell profiling of chromatin accessibility and gene expression throughout Drosophila embryogenesis.) builds on previous work generating sci-ATAC-seq from three nonoverlapping time windows of embryogenesis and complements other studies performed on specific tissues as well as scRNA from entire embryos at one specific stage or on dissected tissues from adults. Despite the growing use of single-cell assays to generate large-scale atlases, characterizing fine-scale dynamics of chromatin accessibility and gene expression across developmental time remains a challenge. The large number of cell types and even greater number of cell states and branch points during embryogenesis requires extensive cell sampling at continuous stages to capture regulatory transitions, especially for rare cell types. This is very difficult if not essentially impossible to obtain in most model organisms (Calderon, 2022).

In this work, sampling embryo collections from overlapping 2- to 4-hour time windows, coupled with NN-based inference of more precise nuclear ages, enabled continuous representation of Drosophila embryonic development. Other studies have attempted a similar ordering of embryos by developmental time over a 2-day window of mouse development. However, because only dozens rather than thousands of mouse embryos can practically be sampled, reliable inference at the scale of hours or minutes is challenging. Similarly, cell age was inferred in Caenorhabditis elegans using an independent time series of bulk RNA-seq from whole embryos. However, relying on such whole-embryo bulk data to predict developmental age in single cells risks inaccurate aging of rare or transient cell types, especially for more complex organisms (Calderon, 2022).

Computationally, the current neural network-based inference of developmental age bears some similarity to the concept of pseudotime. As originally proposed, pseudotime aims to serve as 'a quantitative measure of progress through a biological process'. Analogously, inferred developmental age tracks the progression of nuclei through development. However, the advantage of pairing an experimental design including overlapping yet tightly defined time windows with temporal ordering is that it is possible to anchor inferred ages to fixed time points, which can potentially lead to a more accurate representation of developmental age for complex cellular trajectories. Put another way, inferred ages are interpretable as units of absolute time that are synchronized across all tissue trajectories. With such a continuum of cellular states, it is possible to begin to infer cell type trajectories that more closely capture the continuous processes of cellular differentiation unfolding within a complex, developing multicellular organism (Calderon, 2022).

There remain further possible improvements to the experimental framework. The alignment or anchoring to real time could be refined with sampling of more tightly staged windows. Multi-omic methods for characterizing multiple data types from the same nuclei may facilitate a joint model that can link paired gene expression and chromatin accessibility (and other modalities) to developmental age inference. There are cases where technical features of the data can lead to increased uncertainty of model predictions. For example, it was found that cells annotated as germ cells, from the first collection time window, or with low read count were associated with greater prediction error. Moving forward, caution is suggested for interpreting findings solely on the basis of inferred nuclear ages from clusters with these features (Calderon, 2022).

The extensive scATAC data, with deep coverage across almost a million cells, likely captured most regulatory elements active during embryonic development and provides a comprehensive resource of potential enhancers for almost any cell type in the embryo. By contrast, the scRNA data had relatively low unique reads per cell and will likely miss some differentially expressed genes in specific cell types. As a result, some delicate analyses remain challenging. For example, transcriptional velocity estimates were found to be unstable with sparse scRNA data, although this issue was mitigated by constructing metacells before velocity analysis, which may be useful for pursuing targeted questions. In scATAC data, it was possible to distinguish XX versus XY nuclei from the proportion of chrX-mapped reads; however, this was challenging for the scRNA data, again as a result of data sparsity. These shortcomings are to some degree compensated by the large number of cells profiled, as shown by the ability to recapitulate aspects of previously documented heterogeneity even for highly dynamic or restricted phenomena (e.g., ZGA) (Calderon, 2022).
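
Distinguishing XX from XY nuclei by the proportion of chrX-mapped reads can be sketched as a one-dimensional two-group split. The procedure below (a small 2-means iteration to find a threshold), the assumption that the group with the higher chrX fraction is XX, and all counts are hypothetical; the text does not describe how the study made this call.

```python
def chrx_fraction(counts_by_chrom):
    """Fraction of a nucleus's reads that map to chrX."""
    total = sum(counts_by_chrom.values())
    return counts_by_chrom.get("chrX", 0) / total


def two_means_threshold(values, n_iter=50):
    """Split 1-D values into two groups; return the midpoint of the centers."""
    lo, hi = min(values), max(values)
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


# Hypothetical per-nucleus read counts (autosomes pooled for brevity).
nuclei = [{"chrX": 100, "autosomes": 900},
          {"chrX": 110, "autosomes": 890},
          {"chrX": 90, "autosomes": 910},
          {"chrX": 190, "autosomes": 810},
          {"chrX": 200, "autosomes": 800},
          {"chrX": 210, "autosomes": 790}]

fractions = [chrx_fraction(n) for n in nuclei]
threshold = two_means_threshold(fractions)
# Assumption: nuclei with the higher chrX fraction carry two X chromosomes.
labels = ["XX" if f > threshold else "XY" for f in fractions]
print(round(threshold, 3), labels)
```

With sparse data, as for the scRNA nuclei, the per-nucleus fractions become noisy and the two groups overlap, which is consistent with the difficulty reported above.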

Overall, this Drosophila embryonic atlas provides broad insights into the orchestration of cellular states during the most dynamic stages in the life cycle of the organism. These results represent a rich resource for understanding precise time points at which genes become active in distinct tissues as well as how chromatin is remodeled across time. The annotation of cell types within these data is an ongoing process and one that is much more challenging at early and mid-stages of embryogenesis as compared with late time points or in adults with differentiated tissues. A comprehensive annotation of embryonic cell states will require a collective effort from the Drosophila community. To support these ongoing efforts, information on expression and peaks is provided for all clusters, in addition to all intermediate and raw data for further exploration. Although larval stages remain insufficiently profiled, it is hoped that these data and methods, together with the recently released large-scale adult atlas, bring closer the community-wide goal of a multimodal Drosophila atlas spanning a continuum from zygote to adulthood (Calderon, 2022).

New twists of a TAIL: novel insights into the histone binding properties of a highly conserved PHD finger cluster within the MLR family of H3K4 mono-methyltransferases

Enhancer activation by the MLR family of H3K4 mono-methyltransferases requires proper recognition of histones for the deposition of the mono-methyl mark. MLR proteins contain two clusters of PHD zinc finger domains implicated in chromatin regulation. The second cluster is the most highly conserved, preserved as an ancient three finger functional unit throughout evolution. Studies of the isolated 3rd PHD finger within this cluster suggested specificity for the H4 [aa16-20] tail region. The histone binding properties of the full three PHD finger cluster b module (PHDb) from the Drosophila Cmi protein were determined, which revealed unexpected recognition of an extended region of H3. Importantly, the zinc finger spacer separating the first two PHDb fingers from the third is critical for proper alignment and coordination among fingers for maximal histone engagement. Human homologs, MLL3 and MLL4, also show conservation of H3 binding, expanding current views of histone recognition for this class of proteins. Chromatin remodeling by the SWI/SNF complex was further implicated as a possible mechanism for the accessibility of PHDb to globular regions of histone H3 beyond the tail region. These results suggest a two-tail histone recognition mechanism by the conserved PHDb domain involving a flexible hinge to promote interdomain coordination (Zraly, 2023).

How enhancers regulate wavelike gene expression patterns

A key problem in development is to understand how genes turn on or off at the right place and right time during embryogenesis. Such decisions are made by non-coding sequences called 'enhancers.' Many of our models of how enhancers work rely on the assumption that genes are activated de novo as stable domains across embryonic tissues. Such a view has been strengthened by the intensive landmark studies of the early patterning of the anterior-posterior (AP) axis of the Drosophila embryo, where indeed gene expression domains seem to arise more or less stably. However, careful analysis of gene expression patterns in other model systems (including the AP patterning in vertebrates and short-germ insects like the beetle Tribolium castaneum) painted a different, very dynamic view of gene regulation, where genes are oftentimes expressed in a wavelike fashion. How such gene expression waves are mediated at the enhancer level is so far unclear. This study established the AP patterning of the short-germ beetle Tribolium as a model system to study dynamic and temporal pattern formation at the enhancer level. To that end, an enhancer prediction system was established in Tribolium based on time- and tissue-specific ATAC-seq, together with an enhancer live reporter system based on MS2 tagging. Using this experimental framework, several Tribolium enhancers of gap and pair-rule genes were discovered, and the spatiotemporal activities of some of them in live embryos were assessed. The data were found to be consistent with a model in which the timing of gene expression during embryonic pattern formation is mediated by a balancing act between enhancers that induce rapid changes in gene expression patterns (called 'dynamic enhancers') and enhancers that stabilize gene expression patterns (called 'static enhancers'). However, more data are needed to strongly support this or any alternative model (Mau, 2023).

REFERENCES

Boija, A., Klein, I. A., Sabari, B. R., Dall'Agnese, A., Coffey, E. L., Zamudio, A. V., Li, C. H., Shrinivas, K., Manteiga, J. C., Hannett, N. M., Abraham, B. J., Afeyan, L. K., Guo, Y. E., Rimel, J. K., Fant, C. B., Schuijers, J., Lee, T. I., Taatjes, D. J. and Young, R. A. (2018). Transcription factors activate genes through the phase-separation capacity of their activation domains. Cell 175(7): 1842-1855. PubMed ID: 30449618

Bravo González-Blas, C., Quan, X. J., Duran-Romana, R., Taskiran, II, Koldere, D., Davie, K., Christiaens, V., Makhzami, S., Hulselmans, G., de Waegeneer, M., Mauduit, D., Poovathingal, S., Aibar, S. and Aerts, S. (2020). Identification of genomic enhancers through spatial integration of single-cell transcriptomics and epigenomics. Mol Syst Biol 16(5): e9438. PubMed ID: 32431014

Brody, T., Yavatkar, A., Kuzin, A. and Odenwald, W. F. (2020). Ultraconserved non-coding DNA within Diptera and Hymenoptera. G3 (Bethesda). PubMed ID: 32601058

Calderon, D., Blecher-Gonen, R., Huang, X., Secchia, S., Kentro, J., Daza, R. M., Martin, B., Dulja, A., Schaub, C., Trapnell, C., Larschan, E., O'Connor-Giles, K. M., Furlong, E. E. M. and Shendure, J. (2022). The continuum of Drosophila embryonic development at single-cell resolution. Science 377(6606): eabn5800. PubMed ID: 35926038

Cho, C. Y. and O'Farrell, P. H. (2023). Stepwise modifications of transcriptional hubs link pioneer factor activity to a burst of transcription. Nat Commun 14(1): 4848. PubMed ID: 37563108

Cubenas-Potts, C., Rowley, M. J., Lyu, X., Li, G., Lei, E. P. and Corces, V. G. (2017). Different enhancer classes in Drosophila bind distinct architectural proteins and mediate unique chromatin interactions and 3D architecture. Nucleic Acids Res 45(4): 1714-1730. PubMed ID: 27899590

de Almeida, B. P., Reiter, F., Pagani, M. and Stark, A. (2022). DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 54(5): 613-624. PubMed ID: 35551305

de Almeida, B. P., Schaub, C., Pagani, M., Secchia, S., Furlong, E. E. M. and Stark, A. (2023). Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature. PubMed ID: 38086418

Galupa, R., Alvarez-Canales, G., Borst, N. O., Fuqua, T., Gandara, L., Misunou, N., Richter, K., Alves, M. R. P., Karumbi, E., Perkins, M. L., Kocijan, T., Rushlow, C. A. and Crocker, J. (2023). Enhancer architecture and chromatin accessibility constrain phenotypic space during Drosophila development. Dev Cell 58(1): 51-62. PubMed ID: 36626871

Gisselbrecht, S. S., Barrera, L. A., Porsch, M., Aboukhalil, A., Estep, P. W., Vedenko, A., Palagi, A., Kim, Y., Zhu, X., Busser, B. W., Gamble, C. E., Iagovitina, A., Singhania, A., Michelson, A. M. and Bulyk, M. L. (2013). Highly parallel assays of tissue-specific enhancers in whole Drosophila embryos. Nat Methods 10(8): 774-780. PubMed ID: 23852450

Hafer, T. L., Patra, S., Tagami, D. and Kohwi, M. (2022). Enhancer of trithorax/polycomb, Corto, regulates timing of hunchback gene relocation and competence in Drosophila neuroblasts. Neural Dev 17(1): 3. PubMed ID: 36805453

Hsu, J.-Y., et al. (2008). TBP, Mot1, and NC2 establish a regulatory circuit that controls DPE-dependent versus TATA-dependent transcription. Genes Dev 22: 2353-2358. PubMed ID: 18703680

Jacobs, J., Pagani, M., Wenzl, C. and Stark, A. (2023). Widespread regulatory specificities between transcriptional co-repressors and enhancers in Drosophila. Science 381(6654): 198-204. PubMed ID: 37440660

Kundu, M., Kuzin, A., Lin, T. Y., Lee, C. H., Brody, T., et al. (2013). Cis-regulatory complexity within a large non-coding region in the Drosophila genome. PLoS One 8: e60137. PubMed ID: 23613719

Lebedeva, L. A., et al. (2005). Occupancy of the Drosophila hsp70 promoter by a subset of basal transcription factors diminishes upon transcriptional activation. Proc Natl Acad Sci 102(50): 18087-18092. PubMed ID: 16330756

Levo, M., Raimundo, J., Bing, X. Y., Sisco, Z., Batut, P. J., Ryabichko, S., Gregor, T. and Levine, M. S. (2022). Transcriptional coupling of distant regulatory genes in living embryos. Nature 605(7911): 754-760. PubMed ID: 35508662

Li, Y., Kimura, T., Huyck, R. W., Laity, J. H. and Andrews, G. K. (2008). Zinc-induced formation of a coactivator complex containing the zinc-sensing transcription factor MTF-1, p300/CBP, and Sp1. Mol Cell Biol 28(13): 4275-4284. PubMed ID: 34266978

Marr, M. T., Isogai, Y., Wright, K. J. and Tjian, R. (2006). Coactivator cross-talk specifies transcriptional output. Genes Dev 20(11): 1458-1469. PubMed ID: 16751183

Mau, C., Rudolf, H., Strobl, F., Schmid, B., Regensburger, T., Palmisano, R., Stelzer, E. H. K., Taher, L. and El-Sherif, E. (2023). How enhancers regulate wavelike gene expression patterns. Elife 12. PubMed ID: 37432987

Mikhaylichenko, O., Bondarenko, V., Harnett, D., Schor, I. E., Males, M., Viales, R. R. and Furlong, E. E. M. (2018). The degree of enhancer or promoter activity is reflected by the levels and directionality of eRNA transcription. Genes Dev 32(1):42-57. PubMed ID: 29378788

Moudgil, A., Sobti, R. C. and Kaur, T. (2023). In-silico identification and comparison of transcription factor binding sites cluster in anterior-posterior patterning genes in Drosophila melanogaster and Tribolium castaneum. PLoS One 18(8): e0290035. PubMed ID: 37590227

Nowling, R. J., Njoya, K., Peters, J. G. and Riehle, M. M. (2023). Prediction accuracy of regulatory elements from sequence varies by functional sequencing technique. Frontiers in Cellular and Infection Microbiology 13: 1182567. PubMed ID: 37600946

Pagani, M. and Stark, A. (2022). DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 54(5): 613-624. PubMed ID: 35551305

Postika, N., Metzler, M., Affolter, M., Muller, M., Schedl, P., Georgiev, P. and Kyrchanova, O. (2018). Boundaries mediate long-distance interactions between enhancers and promoters in the Drosophila Bithorax complex. PLoS Genet 14(12): e1007702. PubMed ID: 30540750

Punzi, G., Ursini, G., Chen, Q., Radulescu, E., Tao, R., Huu Qi, Z., Jung, C., Bandilla, P., Ludwig, C., Heron, M., Sophie Kiesel, A., Museridze, M., Philippou-Massier, J., Nikolov, M., Renna Max Schnepf, A., Unnerstall, U., Ceolin, S., Muhlig, B., Gompel, N., Soeding, J. and Gaul, U. (2022). Large-scale analysis of Drosophila core promoter function using synthetic promoters. Mol Syst Biol 18(2): e9816. PubMed ID: 35156763

Rao, S., Ahmad, K. and Ramachandran, S. (2021). Cooperative binding between distant transcription factors is a hallmark of active enhancers. Mol Cell. PubMed ID: 33705711

Reddington, J. P., Garfield, D. A., Sigalova, O. M., Karabacak Calviello, A., Marco-Ferreres, R., Girardot, C., Viales, R. R., Degner, J. F., Ohler, U. and Furlong, E. E. M. (2020). Lineage-Resolved Enhancer and Promoter Usage during a Time Course of Embryogenesis. Dev Cell. PubMed ID: 33171098

Rajpurkar, A. R., Mateo, L. J., Murphy, S. E. and Boettiger, A. N. (2021). Deep learning connects DNA traces to transcription to reveal predictive features beyond enhancer-promoter contact. Nat Commun 12(1): 3423. PubMed ID: 34103507

Reiter, F., de Almeida, B. P. and Stark, A. (2023). Enhancers display constrained sequence flexibility and context-specific modulation of motif function. Genome Res 33(3): 346-358. PubMed ID: 36941077

Ross, J., Kuzin, A., Brody, T. and Odenwald, W. F. (2015). cis-regulatory analysis of the Drosophila pdm locus reveals a diversity of neural enhancers. BMC Genomics 16: 700. PubMed ID: 26377945

Ross, J., Kuzin, A., Brody, T. and Odenwald, W. F. (2018). Mutational analysis of a Drosophila neuroblast enhancer governing nubbin expression during CNS development. Genesis 56(8):e23237. PubMed ID: 30005136

Sabari, B. R., Dall'Agnese, A., Boija, A., Klein, I. A., Coffey, E. L., Shrinivas, K., Abraham, B. J., Hannett, N. M., Zamudio, A. V., Manteiga, J. C., Li, C. H., Guo, Y. E., Day, D. S., Schuijers, J., Vasile, E., Malik, S., Hnisz, D., Lee, T. I., Cisse, II, Roeder, R. G., Sharp, P. A., Chakraborty, A. K. and Young, R. A. (2018). Coactivator condensation at super-enhancers links phase separation and gene control. Science 361(6400). PubMed ID: 29930091

Sloutskin, A., Itzhak, D., Vogler, G., Ideses, D., Alter, H., Shachar, H., Doniger, T., Frasch, M., Bodmer, R., Duttke, S. H. and Juven-Gershon, T. (2023). A single DPE core promoter motif contributes to in vivo transcriptional regulation and affects cardiac function. bioRxiv. PubMed ID: 37398300

Zhao, J., Perkins, M. L., Norstad, M. and Garcia, H. G. (2023). A bistable autoregulatory module in the developing embryo commits cells to binary expression fates. Curr Biol 33(14): 2851-2864. PubMed ID: 37453424

Zraly, C. B., Schultz, R., Diaz, M. O. and Dingwall, A. K. (2023). New twists of a TAIL: novel insights into the histone binding properties of a highly conserved PHD finger cluster within the MLR family of H3K4 mono-methyltransferases. Nucleic Acids Res 51(18): 9672-9689. PubMed ID: 37638761

The Interactive Fly resides on the
Society for Developmental Biology's Web server.