Drosophila genes listed by biochemical function
A genomewide survey of basic helix-loop-helix factors
A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors
A genomewide survey of basic helix-loop-helix factors
The information contained in the recently published genomic sequence of Drosophila melanogaster was used to identify
12 additional bHLH proteins. By sequence analysis these proteins have been assigned to families defined by Atonal, Hairy-Enhancer of
Split, Hand, p48, Mesp, MYC/USF, and the bHLH-Per, Arnt, Sim (PAS) domains. In addition, one single protein represents a unique
family of bHLH proteins. mRNA in situ analysis demonstrates that the genes encoding these proteins are expressed in several tissue
types but are particularly concentrated in the developing nervous system and mesoderm (Moore, 2000).
Two newly identified genes, CG8667 (Mistr) and CG5545 (Doli), both members of the Ato-related family, are expressed in the developing nervous system. CG5545 is closely related to
the vertebrate repressor Beta 3 protein (96% sequence identity between fly and vertebrate proteins in the bHLH domain). It is suggested that this protein should be named Doli (Drosophila Olig family); the Olig proteins are involved in oligodendrocyte precursor
formation. CG8667 has closest sequence identity to the vertebrate
Mist1 protein, a negative regulatory factor of MyoD activity (78% identical over the entire bHLH domain and 92% identical in the basic domain alone). It is
proposed that this protein should be named Mistr (Mist 1-related protein).
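The identity figures quoted throughout this survey (for example, 78% over the bHLH domain) are fractions of matching positions in a sequence alignment. The following is a minimal sketch of that arithmetic, assuming two sequences that have already been aligned to equal length with "-" marking gaps. The function name `percent_identity` and the toy peptides are hypothetical; they are not the Mistr or Mist1 sequences, and the survey may have used a different convention for the denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap character.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy, made-up peptides (NOT real bHLH sequences): 8 of 10 columns match.
toy_a = "RRNERERKRM"
toy_b = "RRNARERQRM"
print(percent_identity(toy_a, toy_b))  # 80.0
```

Restricting the input to a sub-region (for example, only the basic domain) changes the denominator, which is why a single protein pair can have different identity values for the basic domain and for the entire bHLH domain.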
Sequence homology between species does not always imply
functional homology. For example, CG8667/Mistr is a
Drosophila sequence ortholog of the mammalian Mist1 protein.
It is expressed solely in the developing nervous system, whereas
Mist1 is expressed not in the nervous system but in gut,
pancreas, submandibular gland, lung, and skeletal muscle. In this
case, differences in the expression pattern of the genes encoding these
proteins argue against any conservation of developmental role (Moore, 2000).
As with the other proteins of the Ato-related family, the genes encoding these proteins are expressed in the developing Drosophila nervous system. CG5545/doli is
expressed first in a subset of cells in both the ventral nerve cord (VNC) and the procephalic region at stage 9. The number of cells in these regions expressing the
gene increases to a peak at stage 11. By stage 14, levels of expression have fallen such that CG5545/doli is expressed only in a few cells per
hemisegment on the ventral surface of the VNC (Moore, 2000).
There is a strong maternal contribution of CG8667/mistr mRNA. Zygotic transcription is initiated at stage 14. It is expressed in bilateral domains in the cephalic
region, which, as development proceeds, fuse into a U shape forming part of the ring gland. Concomitant expression of CG8667/mistr also begins in the
CNS. By stage 17, CG8667/mistr is in clusters of cells at the anterior and posterior of the VNC and bilaterally in two lateral cells per hemisegment in the VNC (Moore, 2000).
CG10066 (Fer1), CG5952 (Fer2), and CG6913 (Fer3) are related to mammalian p48. These three new bHLH proteins are most closely related to the bHLH
domain of the p48 subunit of PTF1, a pancreatic, exocrine cell-specific transcription factor in the mouse, and represent a new bHLH family in Drosophila. These proteins have been named Fer (for "forty-eight related"). CG10066/Fer1 is 88%, CG5952/Fer2 is 76%, and CG6913/Fer3 is 62% identical to p48 in the bHLH region (Moore, 2000).
CG10066/Fer1 is expressed in the epidermis at the stage when the epidermis begins to secrete cuticle and, therefore, may share a common function with p48 in active exocrine
cells. It is first transcribed in the epidermal pads adjacent to the posterior spiracles at stage 15. The expression of this gene quickly spreads over the entire epidermal
surface of the embryo and is strongest in epidermis underlying the forming denticle belts (Moore, 2000).
CG5952/Fer2 shows a strong maternal contribution of mRNA in the early embryo. Zygotic expression of this gene begins at stage 10 in an anterior-to-posterior
wave in the VNC and the brain. As development proceeds, the number of CG5952/Fer2-positive cells increases, so that by stage 12, the expression domain forms a
bilateral, dorsal-posterior, crescent-shaped structure (Moore, 2000).
CG6913/Fer3 is expressed at stage 11 in part of the posterior midgut primordia and stage 12 in part of the anterior midgut primordia. At later stages, expression has been detected in several unidentified cells scattered throughout the embryo (Moore, 2000).
CG10446 (Side) and CG5927 (Her) are in the HES family. CG10446 is most closely related to Deadpan (76% identity in the basic domain and 62% in the entire bHLH
domain). This protein has been named Side (similar to Deadpan). CG5927 is most closely related to the proteins of the Enhancer of split [E(spl)] complex, such as HLHmgamma (76% identity in the basic domain and 51% identity in the entire bHLH domain). CG5927 has been named Her (HES-related). Hairy, Dpn, and the
proteins of the E(spl) complex have WRPW at the very C terminus to mediate interaction with Groucho. CG5927/Her and CG10446/Side also end in this motif.
All HES family
proteins mediate transcriptional repression via their interaction
with Groucho. CG10446/Side and CG5927/Her have the WRPW motif
required for this interaction, implying that they are highly likely to
act via the same mechanism. CG10446/side is expressed
solely in the CNS at a stage at which cell differentiation is
occurring. It is hypothesized that it may play a role in antagonizing the
function of transcription factors involved in the later stages of CNS differentiation (Moore, 2000).
There is a strong maternal contribution of CG10446/side mRNA. Zygotic transcription of the gene begins at stage 12 in a subset of cells in the CNS.
CG5927/her has a low level of maternal mRNA contribution and then is expressed ubiquitously throughout embryogenesis (Moore, 2000).
CG12952 (Sage) is distantly related to the Mesp family and is expressed in the salivary gland. CG12952 represents a protein with little sequence
similarity to other known proteins. In the neighbor-joining tree, it is placed in the same family as the vertebrate Mesp proteins, which are necessary for mesoderm
segmentation initiation (53% identity in the bHLH domains). CG12952 has a strong maternal mRNA contribution in early embryogenesis. Its zygotic
expression begins in the salivary gland anlage at stage 10 and persists until stage 15. CG12952 has been named Sage (salivary gland-expressed bHLH) (Moore, 2000).
CG17592 (Dm Usf) is the ortholog of the mammalian USF proteins. CG17592 is the single Drosophila sequence homolog of the USF
proteins that are involved in cell proliferation control (92% identical in the basic domain). This protein has been named Dm Usf. Both vertebrate and
Drosophila USF are bHLH-zip proteins. Dm Usf has a loop and a second helix region, high in serines, which is greatly diverged from that of mouse and human and,
hence, may have lost its ability to dimerize. There is a weak maternal contribution of Dm usf mRNA. At stage 7, Dm usf is expressed in bilateral domains in
the ventral cephalic furrow. In later stages (15 onward) of development, Dm usf expression is confined to the proventriculus and a subset of cells in the CNS. This specific expression pattern differs from the ubiquitous USF expression pattern reported in vertebrates (Moore, 2000).
CG6211 (Gce) is closely related to the bHLH-PAS Rst(1)JH protein (78%
identity in the bHLH, 68% in the PAS-A, and 86% in the PAS-B domains). Rst(1)JH was originally isolated in a screen for Drosophila mutants resistant to the
juvenile hormone analog insecticide methoprene. The CG6211 transcript is expressed strongly as a maternally supplied message and then later in a subset of the germ
cells of the developing embryo. It is suggested that this protein should be named Gce (germ cell-expressed bHLH-PAS) (Moore, 2000).
CG11450 (shout) is expressed during mesoderm formation and in myoblasts. CG11450 represents a member of a new bHLH family. It
is expressed first in the dorsal and ventral cellular blastoderm. In the ventral region of the embryo, the gene is expressed continually in the presumptive mesoderm
throughout gastrulation and then in a segmented pattern in the ventral mesoderm layer at the extended germ-band stage. It is expressed in the myoblast cells that then
migrate dorsally from this layer. The expression pattern of CG11450 overlaps with that of the bHLH transcription factor Twist, suggesting that it may be
playing a role in the same mesoderm specification and myogenic pathways; therefore, this gene has been termed shout after the song "Twist and Shout," popularized by the Beatles (Moore, 2000).
The expression domain of CG11450/shout overlaps with that of twist. twist and
CG11450/shout continue to be expressed in the presumptive
mesoderm during gastrulation. At the extended germ-band stage, both
twist and CG11450/shout are expressed in
alternating high and low levels along the length of the mesoderm. These
alternating expression levels of twist are required for the
specification of muscle derived from this tissue. The pattern of
CG11450/shout expression in the ventral mesoderm implies
that it could have a similar role to twist in specification
of mesoderm derivatives. In Drosophila, Twist activates
Snail and other downstream, mesoderm-specific regulators such as
Tinman, Bagpipe, and Mef2; all of these proteins have vertebrate orthologs
implicated in mesoderm development. Hence, CG11450/Shout
represents a good candidate for both sequence and function conservation
across species (Moore, 2000).
CG18144 (Dm Hand) is the Drosophila ortholog of the vertebrate Hand proteins. CG18144 is 76% identical to dHand and, in the bHLH domain, 69% identical to eHand; both vertebrate proteins are involved in heart formation. Dm hand expression begins at
stage 10 of embryonic development in bilateral stripes in the ventral mesoderm. It continues to be expressed in two tissues derived from this mesoderm, the dorsal
vessel (heart) and the circular visceral musculature. In addition, at stage 13 Dm hand mRNA appears in a small subset of cells in the CNS (Moore, 2000).
A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors
Differences in expression, protein interactions, and DNA binding of paralogous transcription factors ('TF parameters') are thought to be important determinants of regulatory and biological specificity. However, both the extent of TF divergence and the relative contribution of individual TF parameters remain undetermined. This study comprehensively identified dimerization partners, spatiotemporal expression patterns, and DNA-binding specificities for the C. elegans bHLH family of TFs and modeled these data into an integrated network. This network displays both specificity and promiscuity, as some bHLH proteins, DNA sequences, and tissues are highly connected, whereas others are not. Comparison of all bHLH TFs revealed extensive divergence, with all three parameters contributing equally to bHLH divergence. This approach provides a framework for examining divergence for other protein families in C. elegans and in other complex multicellular organisms, including humans. Cross-species comparisons of integrated networks may provide further insights into molecular features underlying protein family evolution. A video summary of this article is available online (Grove, 2009).
Sequence-specific DNA binding was assessed with protein-binding microarrays (PBMs), whose 8-mer data span the full affinity range of DNA binding preferences. Enrichment scores (ESs) were calculated from the PBM signal intensities for all possible 8-mers for each bHLH dimer that yielded sequence-specific DNA binding, and position weight matrices (PWMs) were derived for each dimer. A conservative threshold was imposed to identify significantly bound 8-mers. Both the dimers and the 8-mers were hierarchically clustered, and it was found that the bHLH proteins can be grouped into two clusters corresponding to different bHLH classes: cluster I contains HLH-2 (similar to Drosophila Daughterless) and its partners, HLH-1 and HLH-11, and cluster II contains class III, IV, and VI bHLH proteins (Grove, 2009).
As expected, HLH-2-containing dimers (cluster I) exhibit a strong preference for E-box sequences (CANNTG). Surprisingly, however, cluster II dimers, in addition to binding a few E-boxes, also bind multiple non-E-box sequences. These resemble E-boxes, but contain a C or A in the fifth position and a G or T in the sixth position of the binding site (CAYRMK). These 'E-box-like sequences' include the reported CACGCG binding site of Drosophila Hairy, and N-boxes (CACNAG), which are bound by Drosophila Enhancer of Split (Grove, 2009).
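The three motif classes named above can be checked mechanically by expanding their degenerate (IUPAC) codes into regular expressions. The sketch below is illustrative only: it takes the consensus strings CANNTG, CAYRMK, and CACNAG literally as written in the text, and the names `MOTIFS` and `classify_hexamer` are hypothetical, not part of the study's analysis pipeline.

```python
import re

# IUPAC nucleotide codes that appear in the motifs quoted in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any base
    "Y": "[CT]",    # pyrimidine
    "R": "[AG]",    # purine
    "M": "[AC]",
    "K": "[GT]",
}

# Consensus strings taken from the text.
MOTIFS = {"E-box": "CANNTG", "E-box-like": "CAYRMK", "N-box": "CACNAG"}

PATTERNS = {
    name: re.compile("".join(IUPAC[code] for code in consensus))
    for name, consensus in MOTIFS.items()
}


def classify_hexamer(site: str) -> list[str]:
    """Return the name of every motif class that a 6-bp site matches."""
    site = site.upper()
    if len(site) != 6 or set(site) - set("ACGT"):
        raise ValueError("expected a 6-bp sequence of A, C, G, T")
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(site)]


for hexamer in ("CACGTG", "CACGCG", "CACGAG", "CACCAG"):
    print(hexamer, classify_hexamer(hexamer))
```

Under this literal reading, the Hairy site CACGCG matches only CAYRMK, and an N-box falls within CAYRMK when its fourth base is a purine (for example, CACGAG) but not otherwise (for example, CACCAG). E-boxes and CAYRMK sites cannot overlap, because an E-box requires T at the fifth position.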
The statistical significance of each bHLH dimer's preference for E-box and E-box-like sequences, as compared to all other 8-mers, was determined. Neither HLH-2 nor HLH-10 alone can bind significantly to any E-box or E-box-like sequence. However, when combined, they can bind five different sequences. The bHLH DNA binding network also displays degrees of specificity and promiscuity. For instance, only HLH-1 homodimers can bind CAA-containing E-boxes. Some E-boxes and E-box-like sequences are preferred by relatively few dimers, whereas others are bound by many dimers. For example, CACATG is bound by only four dimers, but CACCTG is bound by ten distinct dimers. Conversely, some bHLH dimers bind few E-boxes or E-box-like sequences whereas others bind many: HLH-30 binds only CACGTG, but HLH-2/HLH-10 binds five different E-boxes. Thus, there is specificity and promiscuity in the bHLH DNA binding network, both at the level of the proteins and at the level of their DNA binding sequences (Grove, 2009).
The PBM ES of a particular DNA sequence bound by a dimer reflects relative DNA binding affinity. The ES distribution for 8-mers corresponding to a particular dimer/sequence combination was found to vary greatly. For instance, both HLH-26 and MDL-1/MXL-1 bind CACGTG E-boxes, but HLH-26 does so with a broad ES range and MDL-1/MXL-1 with a very narrow ES range. This suggests that, in contrast to MDL-1/MXL-1, HLH-26 does not bind all CACGTG E-boxes equally well. These differences might be due to effects of nucleotides flanking the core CACGTG E-box. Indeed, flanking nucleotides have been reported previously to contribute to bHLH dimer DNA binding. However, the effects of nucleotides flanking the E-box and E-box-like sequences had not been analyzed systematically for most bHLH TFs. Since each bHLH monomer may directly contact the flanking nucleotide immediately 5' of the E-box, the influence of this position on relative DNA binding preferences was examined. For the MDL-1/MXL-1 dimer, each of the four possible nucleotides flanking the CACGTG core sequence is recognized approximately equally well; the enrichment score for each relevant 8-mer is between 0.49 and 0.50. However, HLH-26 exhibits a strong preference for a 5' A or G (median 8-mer ES > 0.40), and disfavors a 5' T (median 8-mer ES < 0.10) and, to a lesser extent, a 5' C (Grove, 2009).
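The flanking-nucleotide comparison described above amounts to grouping 8-mers by the base immediately 5' of the core and summarizing the enrichment scores within each group. The following is a rough sketch of that grouping, not the study's actual procedure. It assumes a plain mapping from 8-mer to ES, considers only 8-mers in which the 6-bp core is centered, and ignores reverse-complement bookkeeping. The function name and the scores are hypothetical placeholders, not values from Grove (2009).

```python
from collections import defaultdict
from statistics import median


def median_es_by_5prime_flank(es_by_8mer: dict[str, float],
                              core: str = "CACGTG") -> dict[str, float]:
    """Median ES of 8-mers of the form X + core + Y, grouped by the 5' base X."""
    groups: dict[str, list[float]] = defaultdict(list)
    for kmer, es in es_by_8mer.items():
        kmer = kmer.upper()
        # Use only 8-mers with the core at positions 2-7 (one flank per side).
        if len(kmer) == 8 and kmer[1:7] == core:
            groups[kmer[0]].append(es)
    return {base: median(scores) for base, scores in sorted(groups.items())}


# Synthetic placeholder scores for illustration only (NOT measured data).
toy_scores = {
    "ACACGTGA": 0.45, "ACACGTGC": 0.41, "ACACGTGG": 0.48,
    "TCACGTGA": 0.05, "TCACGTGC": 0.09, "TCACGTGG": 0.02,
    "GGCACGTG": 0.30,  # core is not centered, so this 8-mer is skipped
}
print(median_es_by_5prime_flank(toy_scores))  # {'A': 0.45, 'T': 0.05}
```

A dimer with a flat profile across the four 5' bases would correspond to the narrow ES range described for MDL-1/MXL-1, whereas a profile with large differences between bases would correspond to the broad range described for HLH-26.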
Most bHLH proteins exhibit preferences at the 5' flanking nucleotide position, and most dimers disfavor a 5' T; this observation is similar to what has been reported for the yeast bHLH homodimer Pho4p. However, there are exceptions: HLH-11 and the MDL-1/MXL-1 heterodimer both tolerate a 5' T, and HLH-30 actually favors a 5' T (Grove, 2009).
In summary, both prominent and subtle differences in E-box or E-box-like sequence recognition and flanking site preferences were detected between different bHLH dimers, which likely contribute to target site selection and gene regulation in vivo (Grove, 2009).
Grove, C. A., et al. (2009). A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138(2): 314-27. PubMed ID: 19632181
Moore, A. W., et al. (2000). A genomewide survey of basic helix-loop-helix factors in Drosophila. Proc. Natl. Acad. Sci. 97: 10436-10441. PubMed ID: 10973473
Home page: The Interactive Fly © 1995, 1996 Thomas B. Brody, Ph.D.
The Interactive Fly resides on the
Society for Developmental Biology's Web server.