Drosophila genes listed by biochemical function
Hox (Homeobox) transcription factors

Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences

Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites




abdominal A
homeodomain - Antennapedia class

Abdominal B
homeodomain - bithorax complex

achintya
homeodomain transcription factor (TGIF subclass) - required, along with homeodomain protein Vismay, for spermatogenesis

Antennapedia
homeodomain - Antennapedia class

apterous
homeodomain - lim domain

araucan
homeodomain - Pbx class

aristaless
homeodomain - paired-like

Arrowhead
LIM domains and LIM homeodomain

bagpipe
homeodomain - NK-2 class

BarH1 & BarH2
homeodomain

bicoid
homeodomain

brain-specific homeobox
homeodomain transcription factor - confers neural identity in specific neurons of medulla and lamina of the optic lobe

buttonless
homeodomain

caudal
homeodomain

caupolican
homeodomain - Pbx class

C15 (common alternative name Clawless)
member of the 93E cluster of homeodomain proteins - regulates spatial patterning of the tarsus, a distal portion of the leg -
homolog of vertebrate oncogene Hox11

Chx1 and Chx2 (preferred names: Visual system homeobox 1 ortholog and Visual system homeobox 2 ortholog)
homeodomain transcription factors - markers for the brain central neuroendocrine system termed the pars intercerebralis
that expresses the hormones Drosophila insulin-like peptide (Dilp), FMRF, and myomodulin

cut
homeodomain - cut domain

Dbx
homeobox gene - contributes to the development of subsets of interneurons via cross-repressive, lineage-specific interactions
with the motoneuron-promoting factors eve and exex

defective proventriculus
homeodomain

Deformed
homeodomain - Antennapedia class

Distal-less
homeodomain

drifter (preferred name: ventral veinless)
homeodomain - pou domain

empty spiracles
homeodomain

engrailed
homeodomain - engrailed class - segment polarity gene

even-skipped
homeodomain - pair rule gene

extradenticle
homeodomain - Pbx class

extra-extra
a homeodomain transcription factor - regulates motorneuron cell fate by restricting expression of Even-skipped and Lim2

eyegone
homeodomain & paired domain (paired box)

eyeless
homeodomain & paired domain (paired box)

fushi tarazu
homeodomain - Antennapedia class - pair rule gene

gooseberry-proximal (common alternative name: gooseberry-neuro)
homeodomain - paired domain (paired box)

gooseberry-distal (common alternative name: gooseberry)
homeodomain - paired domain (paired box)

Goosecoid
homeodomain - paired-like

HGTX
homeobox, NK decapeptide domain transcription factor - acts within a subclass of early born neurons to link
neuronal subtype identity to neuronal morphology and connectivity

homeobrain
Paired-like homeobox transcription factor - mutants are embryonic lethal and characterized by a reduction of the anterior protocerebrum,
including the mushroom bodies, a loss of the supraesophageal brain commissure and of mushroom body progenitors, and a dislocation of the
optic lobes - in larvae, expressed in all type II lineages and in the optic lobes, including the medulla and lobula plug - defines the
middle-aged and late intermediate neural progenitor temporal windows and plays a role in cellular longevity - has a conserved function
as a temporal factor in the developing visual system

homothorax
homeodomain - HM domain

intermediate neuroblasts defective
homeodomain protein

invected
homeodomain - engrailed class

Ipou (preferred name: Abnormal chemosensory jump 6)
homeodomain and POU domain

islet (preferred name: tailup)
homeodomain and LIM domain

labial
homeodomain - Antennapedia class

ladybird early and ladybird late
transcription factors - homeodomain proteins

lateral muscles scarcer
homeodomain transcription factor - identity factor for lateral transverse muscles

Lim1
Lim domain and lim homeodomain

mirror
homeodomain - Pbx class

muscle segment homeobox-1
homeodomain

muscle segment homeobox 2 (preferred name: tinman)
homeodomain - NK-2 class

NK1 (preferred name: Slouch)
homeodomain - NK-1 class

NK2 (preferred name: ventral nervous system defective)
homeodomain - NK-2 class

Nkx6 (preferred name: HGTX)
homeobox, NK decapeptide domain transcription factor - acts within a subclass of early born neurons to link
neuronal subtype identity to neuronal morphology and connectivity

onecut
homeodomain and cut domain

Optix
homeodomain and Six domain

orthodenticle
homeodomain - paired-like

orthopedia
homeodomain transcription factor - involved in hindgut development - downstream factor of brachyenteron

paired
homeodomain - paired domain (paired box)

POU domain protein 1 (common alternative name: pdm-1)
homeodomain - pou domain

POU domain protein 2 (common alternative name: pdm-2)
homeodomain - pou domain

pou domain motif 3
Pou domain transcription factor required for odor response in a class of olfactory receptor neurons

proboscipedia
homeodomain - Antennapedia class

prospero
novel homeodomain

Ptx1
paired-like homeobox transcription factor - defines enteroendocrine cells (EEs) in the intestinal epithelium - functions in the midgut in global and
regional interstitial stem cell regulation - regulates development of early mesoderm - differentiates posterior from anterior lateral mesoderm

PvuII-PstI homology 13
homeodomain transcription factor expressed in the developing eye - required for rhabdomere morphogenesis and proper detection of light

reversed polarity
homeodomain

rough
homeodomain

Rx
homeodomain transcription factor - required for regulation of genes involved in brain morphogenesis

s59 (preferred name: Slouch)
homeodomain - NK-1 class

scarecrow
homeodomain transcription factor - optic lobe - pharyngeal primordia - central nervous system - brain - regulates Pdf neuropeptide expression controlling circadian rhythms

Sex combs reduced
homeodomain - Antennapedia class

Six4
homeodomain transcription factor - confers ventral mesodermal cell fate - regulates somatic cell function during gonadogenesis

shaven (common alternative name: sparkling)
paired domain and homeodomain (partial) - Pax2, 5 and 8 homolog

sine oculis
homeodomain

slouch (common alternative names: S59 and NK-1)
transcription factor - homeodomain - NK-1 class - maintenance of slouch is directly involved in the control of late aspects of muscle development,
such as muscle differentiation and morphogenesis, and possibly also innervation

sparkling (preferred name: shaven)
paired domain and homeodomain (partial) - Pax2, 5 and 8 homolog

tailup (common alternative name: islet)
homeodomain and LIM domain

tinman (common alternative names: NK-4 and msh-2)
homeodomain - NK-2 class

Ultrabithorax
homeodomain - Antennapedia class

unc-4
homeodomain transcription factor - functions during post-embryonic development of the adult CNS to promote cholinergic neurotransmitter identity and
suppress the GABA fate in one larval neuroblast lineage - promotes proper neuronal projections to the leg neuropil and a specific flight-related
take-off behavior in a second larval lineage - acts peripherally to promote proprioceptive sensory organ development and the execution of specific leg-related behaviors

unplugged
homeodomain protein

ventral nervous system defective (common alternative names: vnd and NK2)
homeodomain - NK-2 class

ventral veinless (common alternative name: drifter)
homeodomain - pou domain

vismay
homeodomain transcription factor (TGIF subclass) - required, along with homeodomain protein Achintya, for spermatogenesis

Visual system homeobox 1 ortholog and Visual system homeobox 2 ortholog (common alternative names: Chx1 and Chx2)
homeodomain transcription factors - markers for the brain central neuroendocrine system termed the pars intercerebralis
that expresses the hormones Drosophila insulin-like peptide (Dilp), FMRF, and myomodulin

zerknüllt
homeodomain - Antennapedia class - DV polarity

Zn finger homeodomain 1
zinc finger domain and homeodomain protein - mutation results in various degrees of local errors in mesodermal cell fate or positioning

Zn finger homeodomain 2
transcription factor - zinc finger domain and homeodomain - required for correct proximal wing development


Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences

Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. This study determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. A computational system was developed that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and full 8-mer binding profiles were inferred for the majority of known animal homeodomains. The results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success (Berger, 2008).

It was asked whether the homeodomain monomer binding preferences identified in vitro reflect sequences preferred in vivo. Anecdotally, the highest predicted binding sequences do correspond to known in vivo binding sites. For example, in the predicted 8-mer profile for sea urchin Otx, a previously identified in vivo binding sequence (TAATCC, from the Spec2a RSR enhancer) is contained in the top predicted 8-mer sequence, and, more strikingly, it is embedded in the fifth-highest predicted 8-mer sequence (TTAATCCT). At greater evolutionary distance, three of the four Drosophila Tinman binding sites in the minimal Hand cardiac and hematopoietic (HCH) enhancer are contained within the second (TCAAGTGG), fifth (ACCACTTA), and ninth (GCACTTAA) ranked 8-mers (the fourth overlaps the 428th ranked 8-mer [CAATTGAG], but also overlaps with a GATA binding site and may have constraints on its sequence in addition to binding Tinman) (Berger, 2008).
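
The containment check described above can be sketched in a few lines of Python. This is only an illustration, not the method used by Berger (2008): the helper names `reverse_complement` and `site_in_kmer` are hypothetical, and searching both strands is an assumption made here because a site can be reported on either strand of the DNA.

```python
# Illustrative sketch only; helper names are hypothetical, not from Berger (2008).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_in_kmer(site, kmer):
    """True if `site` occurs within `kmer` on either strand (assumption)."""
    return site in kmer or reverse_complement(site) in kmer


# The Otx site and the fifth-ranked predicted 8-mer quoted in the text.
print(site_in_kmer("TAATCC", "TTAATCCT"))  # True
```

Only the Otx sequences quoted in the text are used here; no ranked 8-mer list is reproduced.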

To ask more generally whether occupied sites in vivo contain sequences preferred in vitro, six ChIP-chip or ChIP-seq data sets in the literature were examined that involved immunoprecipitation of homeodomain proteins that were analyzed, or homologs of proteins analyzed that shared at least 14 of the 15 DNA-contacting amino acids. In all cases, enrichment was observed for monomer binding sites in the neighborhood of the bound fragments, with a peak at the center. Two examples are Drosophila Caudal and human Tcf1/Hnf1. For Caudal, the size of this ratio peak increased dramatically with E score cutoff, indicating that the most preferred in vitro monomer binding sequences correspond to the most enriched in vivo binding sites (51% of bound fragments have such an 8-mer, versus 17% in randomly selected fragments). For Tcf1/Hnf1, however, the majority of sequences bound in vivo do not contain the best in vitro binding sequences, although most do contain at least one 8-mer with E > 0.45 (53%, versus 27% in random fragments), suggesting utilization of weaker binding sites. Similar results were obtained with PWMs. Thus, the requirement for highest-affinity binding sequences may vary among homeodomain proteins, species, or under different physiological contexts. Nonetheless, a large proportion of the in vivo binding events apparently involve the monomeric homeodomain sequence preferences, which can be derived in vitro (Berger, 2008).
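
As a rough worked example, the percentages quoted above can be turned into fold enrichments over random fragments. The function name `fold_enrichment` is hypothetical, and a simple ratio of percentages is an assumption made here for illustration; it is not necessarily the statistic plotted by Berger (2008). The two proteins were also scored at different E score cutoffs, so the two ratios are not directly comparable.

```python
# Illustrative arithmetic only; a plain ratio of percentages is an assumption.
def fold_enrichment(bound_pct, random_pct):
    """Ratio of the percentage of bound fragments containing a qualifying
    8-mer to the percentage of random fragments containing one."""
    return bound_pct / random_pct


caudal = fold_enrichment(51, 17)     # most preferred 8-mers
tcf1_hnf1 = fold_enrichment(53, 27)  # any 8-mer with E > 0.45

print(f"Caudal: {caudal:.2f}-fold")
print(f"Tcf1/Hnf1: {tcf1_hnf1:.2f}-fold")
```

On these figures, Caudal-bound fragments are about three times as likely as random fragments to contain a top-scoring 8-mer, while Tcf1/Hnf1-bound fragments are about twice as likely to contain an 8-mer above the weaker cutoff.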

Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites

The comprehensive characterization of homeodomain DNA-binding specificities is described for Drosophila melanogaster. The analysis of all 84 independent homeodomains from Drosophila reveals the breadth of DNA sequences that can be specified by the homeodomain. The majority of these factors can be organized into 11 different specificity groups, where the preferred recognition sequence between these groups can differ at up to four of the six core recognition positions. Analysis of the recognition motifs within these groups led to a catalog of common specificity determinants that may cooperate or compete to define the binding site preference. With these recognition principles, a homeodomain can be reengineered to create factors where its specificity is altered at the majority of recognition positions. This resource also allows prediction of homeodomain specificities from other organisms, which is demonstrated by the prediction and analysis of human homeodomain specificities (Noyes, 2008).

A bacterial one-hybrid (B1H) system was used that allows the specificities of a DNA-binding domain to be determined. Using this system, the DNA-binding specificities were characterized for all 84 homeodomains in Drosophila that are not associated with an additional DNA-binding domain, as well as for 16 mutant homeodomains with changes in residues that contribute to DNA recognition. This analysis reveals a diverse array of DNA-binding specificities, with a minimum of seventeen unique specificities in Drosophila, and the majority of homeodomains can be clustered into 11 specificity groups (see Clustering of the 84 Drosophila homeodomains). Members of a given specificity group typically share common recognition residues. Combining these data with previous structural and biochemical work on the homeodomain family, a detailed set of recognition determinants is proposed and evaluated for homeodomains, and this information was used to broadly and accurately predict the specificities of homeodomains in the human genome (see Comparison of the predicted and determined recognition motifs for 6 human homeodomains; Noyes, 2008).

Remarkable diversity exists in the B1H-determined DNA-binding specificities for the entire set of homeodomains. The conservation of Asn51, which specifies Ade at binding site position 3, in combination with the ability to infer the orientation of each homeodomain on its binding site provides a basis for aligning all of these recognition sequences. Using this master alignment, hierarchical clustering of the Drosophila homeodomains was performed based on the similarity of their DNA-binding specificities. The majority of these factors can be organized into eleven different specificity groups and the average specificity of these groups was determined for the purposes of comparison. In this analysis, only the core 6 base pair element recognized by these factors was used. Consistent with the idea that many homeodomain proteins prefer similar TAAT-related motifs, slightly more than half (43) of the homeodomains fall into the Antp or En specificity groups. There are also a number of specificity groups, such as the Abd-B and NK-1 groups, which differ in sequence preference from the Antp or En groups at only one or two positions. However, other groups, such as the TGIF-Exd group, differ at four positions relative to the Antp or En groups. Outside of these specificity groupings are six factors that exhibit unique specificities. The observed diversity of specificities reveals the adaptability of the homeodomain architecture for the recognition of a variety of DNA sequences (Noyes, 2008).

The contribution of specific residues toward binding site preference for one or more group members has been demonstrated in previous studies. This study used correlations between the average group recognition motifs and the amino acid distributions at key DNA recognition positions to systematically describe the characteristics of each group that lead to differences in binding specificity (Noyes, 2008).

  • Antp and En groups: The largest groups of homeodomains provide a reference point to describe how differences in amino acid sequence correlate with DNA-binding specificity. The Antp and En groups share similar recognition motifs and amino acid distributions at the key recognition positions. However, at binding site position 5, the En group prefers Thy, whereas the Antp group tolerates either Gua or Thy. There is a corresponding difference at amino acid position 54: Ala for the En group and Met for the Antp group. In the Antp-DNA structure, the side chain of Met54 is neighboring this base pair (Noyes, 2008).
  • Bcd group: Typical homeodomains utilize Lys50 to specify Cyt at binding site positions 5 and 6 through the interaction of Lys50 with the complementary Gua at these positions. This results in a consensus sequence of TAATCC (Noyes, 2008).
  • NK-1, Bar and Ladybird groups: Many of these homeodomains are members of the NK or DL homeodomain classes and generally have Thr at position 47 or 54. Compared to the Antp and En groups, the homeodomains with Thr47 have reduced specificity at binding site positions 4 and/or 5 (Noyes, 2008).
  • NK-2 group: The members of this group prefer Gua at position 4, due to an interaction between Tyr54 and the complementary Cyt. Their specificities vary at binding site position 1, which correlates with differences at residues 6 and 7 of the N-terminal arm (Noyes, 2008).
  • Abd-B group: These factors prefer Thy over Ade at position 2. In Abd-B, this preference has been mapped to amino acid positions 3, 6 and 7 of the N-terminal arm; however, the variability within the N-terminal arm precludes a simple correlation of binding preference and amino acid sequence (Noyes, 2008).
  • Atypical homeodomains: The atypical groups generally prefer Gua at binding site position 2, and Cyt and Ade at positions 4 and 5. In CG11617, the Iroquois group and the TGIF group, the preference for Cyt and Ade at positions 4 and 5 correlates with the presence of Arg54, consistent with the structure of MATα2. The single exception to this correlation, Onecut, contains a unique residue (Met50), which may contribute to its distinct binding preference. Likewise, with the exception of the Iroquois group, homeodomains that contain Arg55 prefer Gua at position 2, consistent with the Exd and Pbx structures (Noyes, 2008).
  • TGIF-Exd group: The data are consistent with previously described specificities for individual members of the TGIF-Exd group (TGA(C/t)A) (Noyes, 2008).
  • Six group: All members of this group (So, Six4 and Optix) display a specificity that overlaps with the recognition motif TGATAC and share identical residues at the key DNA-recognition positions. The data are consistent with a known So motif [(T/C)GATAC]. A discrepancy between these data and a motif (TAAT) reported for an Optix homolog, Six3, was investigated in the analysis of human homeodomains (Noyes, 2008).
  • Iroquois group: The monomeric motif (ACA) reflects part of the palindromic, homodimer binding site (ACANNTGT) for a full-length Mirr protein. Homeodomains in this group have weak preferences at binding site positions 1 and 2, despite containing notable specificity determinants (Arg5 and Arg55). One striking feature of the Iroquois group is Ala at position 8. In other homeodomains, a large hydrophobic residue at this position binds in a cleft formed by the homeodomain helices and appears to position the N-terminal arm over the 5' end of the binding site. To examine the effect of residue 8 on Iroquois specificity, an Ala8Phe mutation was introduced into Caup. This mutation restores, albeit incompletely, the anticipated specificity at positions 1 and 2. The incomplete transformation suggests that additional determinants also contribute to specificity at the 5' end of the binding site (Noyes, 2008).
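
    The motif notation used in the list above, such as (T/C)GATAC, TGA(C/t)A and ACANNTGT, can be made concrete with a short Python sketch. The helper names `motif_to_regex` and `reverse_complement` are hypothetical and written for this illustration only. Two assumptions are made: a lower-case letter inside parentheses is treated as just another allowed base (its weaker preference is ignored), and N stands for any base.

```python
import re

# Illustrative sketch; helper names are hypothetical, not from Noyes (2008).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA motif; N is left as N."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Convert notation such as '(T/C)GATAC' into a regular expression.

    Assumptions: '(X/y)' means either base is allowed (case is ignored),
    and 'N' means any of A, C, G or T.
    """
    pattern = re.sub(
        r"\(([^)]*)\)",
        lambda m: "[" + m.group(1).replace("/", "").upper() + "]",
        motif,
    )
    return pattern.replace("N", "[ACGT]")


so_motif = motif_to_regex("(T/C)GATAC")  # known So motif from the text
print(so_motif)                                      # [TC]GATAC
print(bool(re.fullmatch(so_motif, "TGATAC")))        # True
print(reverse_complement("ACANNTGT") == "ACANNTGT")  # True: palindromic site
```

    The last line shows why the monomeric ACA motif is described as part of the homodimer site: ACANNTGT reads the same on both strands, so each half-site is ACA on its own strand.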

    This assessment of the typical and atypical superclasses suggests two overlapping, but distinct sets of protein-DNA interactions. Both classes generally share Arg5 and Asn51, which typically specify Thy and Ade at binding site positions 1 and 3, as well as a common set of phosphate contacting residues, which should result in a similar docking arrangement of all of these homeodomains with the DNA. Thus, specificity differences between these homeodomains primarily arise from distinct combinations of residues that directly interact with DNA or that influence these contact residues, rather than changes in the overall conformation of the homeodomain-DNA complex (Noyes, 2008).
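
    A position-by-position comparison of two of the consensus motifs quoted above (TAATCC for the Bcd group and TGATAC for the Six group) illustrates the shared Thy at position 1 and Ade at position 3. This is a minimal sketch: the function name `differing_positions` is hypothetical, and it assumes that the first letter of each motif as written corresponds to binding site position 1.

```python
# Illustrative sketch; assumes each motif is written from binding site
# position 1 to position 6. The function name is hypothetical.
def differing_positions(motif_a, motif_b):
    """Return the 1-based positions at which two equal-length motifs differ."""
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must have the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(motif_a, motif_b), start=1)
        if a != b
    ]


BCD_GROUP = "TAATCC"  # consensus quoted in the text for the Bcd group
SIX_GROUP = "TGATAC"  # recognition motif quoted in the text for the Six group

print(differing_positions(BCD_GROUP, SIX_GROUP))  # [2, 5]
print(BCD_GROUP[0] == SIX_GROUP[0] == "T")        # True: Thy at position 1
print(BCD_GROUP[2] == SIX_GROUP[2] == "A")        # True: Ade at position 3
```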

    This study provides a complete analysis of homeodomain specificities in a metazoan and it dramatically increases the number of characterized homeodomains in Drosophila, as only 18 of 84 had any binding site information in the FlyREG database. This study has found that the homeodomain family displays an extensive range of specificities in which a wide variety of bases can be preferred at most positions within the core 6 bp binding site. Overall, the majority of homeodomains (93%) in this dataset can be clustered into 11 different specificity groups with an additional 6 homeodomains that display unique specificities. This clustering strategy allowed description of how common variations in residues at a given position in the homeodomain contribute to differences in specificity. However, even within these groups there are homeodomains that display differences in binding site preference. For example, members of the NK-2 group differ in their base preference at the 5'-most position and Exd specificity clearly differs from other members of the TGIF group. In addition, differences outside the core 6 base pair binding site motifs lead to further diversity among homeodomain specificities. Thus, the 17 specificities described by the 11 groups and 6 unique homeodomains represent the minimum number of different specificities recognized by Drosophila homeodomains (Noyes, 2008).
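
    The counts quoted in this summary can be checked against one another with simple arithmetic. The sketch below assumes that every homeodomain not among the 6 unique ones belongs to one of the 11 groups; the variable names are chosen here for illustration.

```python
# Consistency check on counts quoted from Noyes (2008); assumes every
# homeodomain outside the 6 unique ones falls into one of the 11 groups.
TOTAL_HOMEODOMAINS = 84
UNIQUE_SPECIFICITIES = 6
SPECIFICITY_GROUPS = 11
ANTP_OR_EN_MEMBERS = 43
PREVIOUSLY_IN_FLYREG = 18

grouped = TOTAL_HOMEODOMAINS - UNIQUE_SPECIFICITIES
grouped_pct = round(100 * grouped / TOTAL_HOMEODOMAINS)
minimum_specificities = SPECIFICITY_GROUPS + UNIQUE_SPECIFICITIES
antp_en_pct = 100 * ANTP_OR_EN_MEMBERS / TOTAL_HOMEODOMAINS
flyreg_pct = 100 * PREVIOUSLY_IN_FLYREG / TOTAL_HOMEODOMAINS

print(grouped, grouped_pct)    # 78 93
print(minimum_specificities)   # 17
print(f"{antp_en_pct:.1f}% in the Antp or En groups")
print(f"{flyreg_pct:.1f}% previously in FlyREG")
```

    Under that assumption, 78 of 84 homeodomains (93%) fall into groups, the 11 groups plus 6 unique factors give the stated minimum of 17 specificities, and 43 of 84 is indeed slightly more than half.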

    This analysis demonstrates that the overall sequence similarity between two homeodomains is a useful, but sometimes misleading indicator of the degree of similarity in their DNA-binding specificities. Once factors are clustered into specificity groups, it is possible to compare binding specificity with their degree of sequence homology. As expected, a substantial correlation between sequence similarity and preferred recognition motif is observed. However, multiple examples were found where pairs of closely related homeodomains cluster into different specificity groups. In both naturally-occurring and engineered homeodomains, single amino acid changes at putative DNA recognition positions are sufficient to alter specificity. These observations illustrate the importance of defining the amino acid positions that contribute to variations in binding site specificity in order to make accurate specificity predictions (Noyes, 2008).

    References

    Berger, M. F., et al. (2008). Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 133(7): 1266-76. PubMed ID: 18585359

    Noyes, M. B., et al. (2008). Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 133(7): 1277-89. PubMed ID: 18585360





    Home page: The Interactive Fly © 1995, 1996 Thomas B. Brody, Ph.D.

    The Interactive Fly resides on the
    Society for Developmental Biology's Web server.