Drosophila genes listed by biochemical function
Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences
Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. This study determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. A computational system was developed that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and full 8-mer binding profiles were inferred for the majority of known animal homeodomains. The results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success (Berger, 2008).
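To give a sense of the size of "all possible 8-base sequences", the following minimal sketch enumerates every 8-mer. It also counts how many remain when a sequence and its reverse complement are treated as the same site. That collapsing step is an assumption made here because DNA is double-stranded; it is not stated in the summary above. The function names are illustrative.

```python
from itertools import product

# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(k):
    """Set of all k-mers, collapsed by reverse complement (an assumption)."""
    return {canonical("".join(bases)) for bases in product("ACGT", repeat=k)}


total_8mers = 4 ** 8
distinct_8mers = canonical_kmers(8)
print(total_8mers, len(distinct_8mers))  # prints: 65536 32896
```

The collapsed count is slightly more than half of 65,536 because the 256 palindromic 8-mers are their own reverse complements.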
It was asked whether the homeodomain monomer binding preferences identified in vitro reflect sequences preferred in vivo. Anecdotally, the highest predicted binding sequences do correspond to known in vivo binding sites. For example, in the predicted 8-mer profile for sea urchin Otx, a previously identified in vivo binding sequence (TAATCC, from the Spec2a RSR enhancer) is contained in the top predicted 8-mer sequence and, more strikingly, is embedded in the fifth-highest predicted 8-mer sequence (TTAATCCT). At greater evolutionary distance, three of the four Drosophila Tinman binding sites in the minimal Hand cardiac and hematopoietic (HCH) enhancer are contained within the second (TCAAGTGG), fifth (ACCACTTA), and ninth (GCACTTAA) ranked 8-mers. The fourth site overlaps the 428th ranked 8-mer (CAATTGAG), but it also overlaps a GATA binding site and may therefore have constraints on its sequence in addition to binding Tinman (Berger, 2008).
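The containment checks described above can be sketched in a few lines. This is an illustration only, not the authors' procedure. It assumes a site may match on either strand, and the helper name `site_in_kmer` is hypothetical. The five-base string CACTT used for the Tinman 8-mers was chosen here by inspecting the three quoted sequences; it is not taken from the source.

```python
# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_in_kmer(site, kmer):
    """True if the site occurs in the k-mer on either strand (assumption)."""
    return site in kmer or reverse_complement(site) in kmer


# Otx: the in vivo site TAATCC against the fifth-ranked predicted 8-mer.
otx_match = site_in_kmer("TAATCC", "TTAATCCT")

# Tinman: the three ranked 8-mers quoted in the text. CACTT is an
# illustrative shared substring picked by inspection, not a sourced motif.
tinman_8mers = ["TCAAGTGG", "ACCACTTA", "GCACTTAA"]
tinman_matches = [site_in_kmer("CACTT", kmer) for kmer in tinman_8mers]

print(otx_match)       # prints: True
print(tinman_matches)  # prints: [True, True, True]
```

The first Tinman 8-mer matches only through its reverse complement (CCACTTGA), which is why both strands are checked.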
To ask more generally whether occupied sites in vivo contain sequences preferred in vitro, six ChIP-chip or ChIP-seq data sets in the literature were examined that involved immunoprecipitation of homeodomain proteins that were analyzed, or of homologs of analyzed proteins that shared at least 14 of the 15 DNA-contacting amino acids. In all cases, enrichment was observed for monomer binding sites in the neighborhood of the bound fragments, with a peak at the center. Two examples, Drosophila Caudal and human Tcf1/Hnf1, are shown. For Caudal, the size of this ratio peak increased dramatically with E score cutoff, indicating that the most preferred in vitro monomer binding sequences correspond to the most enriched in vivo binding sites (51% of bound fragments have such an 8-mer, versus 17% in randomly selected fragments). For Tcf1/Hnf1, however, the majority of sequences bound in vivo do not contain the best in vitro binding sequences, although most do contain at least one 8-mer with E > 0.45 (53%, versus 27% in random fragments), suggesting utilization of weaker binding sites. Similar results were obtained with PWMs. Thus, the requirement for highest-affinity binding sequences may vary among homeodomain proteins, among species, or under different physiological contexts. Nonetheless, a large proportion of the in vivo binding events apparently involve the monomeric homeodomain sequence preferences, which can be derived in vitro (Berger, 2008).
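The comparison of bound and random fragments can be illustrated with a minimal sketch. It assumes an E score lookup table keyed by 8-mer that already lists both orientations of each site. The sequences and scores below are invented toy data, and `fraction_with_hit` is a hypothetical helper, not the analysis pipeline used in the study. Only the percentages in the final two lines come from the text above.

```python
def fraction_with_hit(fragments, escores, cutoff, k=8):
    """Fraction of fragments containing at least one k-mer with E > cutoff."""
    def has_hit(seq):
        windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
        # k-mers missing from the table are treated as scoring 0.0.
        return any(escores.get(window, 0.0) > cutoff for window in windows)

    return sum(has_hit(seq) for seq in fragments) / len(fragments)


# Toy data (hypothetical): two scored 8-mers and a few short fragments.
escores = {"TTAATCCT": 0.49, "GGATTAAC": 0.46}
bound = ["ACGTTAATCCTGA", "CCGGATTAACGT", "ACGTACGTACGT"]
background = ["ACGTACGTACGT", "TTTTGGGGCCCC", "ACGTTAATCCTGA"]

bound_fraction = fraction_with_hit(bound, escores, cutoff=0.45)
background_fraction = fraction_with_hit(background, escores, cutoff=0.45)

# Fold enrichment implied by the percentages reported in the text.
caudal_fold = 51 / 17
tcf1_fold = round(53 / 27, 2)
print(caudal_fold, tcf1_fold)  # prints: 3.0 1.96
```

Read this way, the reported figures correspond to roughly a threefold enrichment for Caudal and roughly a twofold enrichment for Tcf1/Hnf1 at the stated cutoffs.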
The comprehensive characterization of homeodomain DNA-binding specificities is described for Drosophila melanogaster. The analysis of all 84 independent homeodomains from Drosophila reveals the breadth of DNA sequences that can be specified by the homeodomain. The majority of these factors can be organized into 11 different specificity groups, where the preferred recognition sequence between these groups can differ at up to four of the six core recognition positions. Analysis of the recognition motifs within these groups led to a catalog of common specificity determinants that may cooperate or compete to define the binding site preference. With these recognition principles, a homeodomain can be reengineered to create factors whose specificity is altered at the majority of recognition positions. This resource also allows prediction of homeodomain specificities from other organisms, which is demonstrated by the prediction and analysis of human homeodomain specificities (Noyes, 2008).
A bacterial one-hybrid (B1H) system was used that allows the specificities of a DNA-binding domain to be determined. Using this system, the DNA-binding specificities were characterized for all 84 homeodomains in Drosophila that are not associated with an additional DNA-binding domain, as well as for 16 mutant homeodomains with changes in residues that contribute to DNA recognition. This analysis reveals a diverse array of DNA-binding specificities, with a minimum of 17 unique specificities in Drosophila; the majority of homeodomains can be clustered into 11 specificity groups (see Clustering of the 84 Drosophila homeodomains). Members of a given specificity group typically share common recognition residues. By combining these data with previous structural and biochemical work on the homeodomain family, a detailed set of recognition determinants was proposed and evaluated for homeodomains, and this information was used to broadly and accurately predict the specificities of homeodomains in the human genome (see Comparison of the predicted and determined recognition motifs for 6 human homeodomains; Noyes, 2008).
Remarkable diversity exists in the B1H-determined DNA-binding specificities for the entire set of homeodomains. The conservation of Asn51, which specifies Ade at binding site position 3, in combination with the ability to infer the orientation of each homeodomain on its binding site, provides a basis for aligning all of these recognition sequences. Using this master alignment, hierarchical clustering of the Drosophila homeodomains was performed based on the similarity of their DNA-binding specificities. The majority of these factors can be organized into 11 different specificity groups, and the average specificity of these groups was determined for the purposes of comparison. In this analysis, only the core 6 base pair element recognized by these factors was used. Consistent with the idea that many homeodomain proteins prefer similar TAAT-related motifs, slightly more than half (43) of the homeodomains fall into the Antp or En specificity groups. There are also a number of specificity groups, such as the Abd-B and NK-1 groups, which differ in sequence preference from the Antp or En groups at only one or two positions. However, other groups, such as the TGIF-Exd group, differ at four positions relative to the Antp or En groups. Outside of these specificity groupings are six factors that exhibit unique specificities. The observed diversity of specificities reveals the adaptability of the homeodomain architecture for the recognition of a variety of DNA sequences (Noyes, 2008).
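As a rough illustration of grouping factors by the similarity of their aligned 6 bp core sites, the sketch below runs a small agglomerative clustering. It rests on several assumptions that are not from the study: each factor is reduced to a single consensus string, dissimilarity is the number of mismatched positions, clusters are merged by average linkage, and merging stops at an arbitrary threshold. The factor names and sequences are invented placeholders, not measured specificities.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sites differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(c1, c2, motifs):
    """Mean pairwise mismatch count between members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(hamming(motifs[a], motifs[b]) for a, b in pairs) / len(pairs)


def cluster(motifs, max_distance):
    """Merge the closest clusters until none are within max_distance."""
    clusters = [[name] for name in sorted(motifs)]
    while len(clusters) > 1:
        dist, i, j = min(
            (average_linkage(clusters[i], clusters[j], motifs), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical placeholder sites; these are NOT data from the study.
motifs = {
    "hd1": "TAATTA",
    "hd2": "TAATTG",
    "hd3": "TAATGA",
    "hd4": "TGACAG",
    "hd5": "TGACAT",
}

groups = cluster(motifs, max_distance=2)
print(groups)  # prints: [['hd1', 'hd2', 'hd3'], ['hd4', 'hd5']]
```

In this toy input, the two resulting groups differ from each other at three or four of the six positions, which mirrors the kind of between-group difference described in the text.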
The contribution of specific residues toward binding site preference for one or more group members has been demonstrated in previous studies. This study used correlations between the average group recognition motifs and the amino acid distributions at key DNA recognition positions to systematically describe the characteristics of each group that lead to differences in binding specificity (Noyes, 2008).
This assessment of the typical and atypical superclasses suggests two overlapping but distinct sets of protein-DNA interactions. Both classes generally share Arg5 and Asn51, which typically specify Thy and Ade at binding site positions 1 and 3, as well as a common set of phosphate-contacting residues, which should result in a similar docking arrangement of all of these homeodomains with the DNA. Thus, specificity differences between these homeodomains primarily arise from distinct combinations of residues that directly interact with DNA or that influence these contact residues, rather than from changes in the overall conformation of the homeodomain-DNA complex (Noyes, 2008).
This study provides a complete analysis of homeodomain specificities in a metazoan, and it dramatically increases the number of characterized homeodomains in Drosophila, as only 18 of 84 had any binding site information in the FlyREG database. This study has found that the homeodomain family displays an extensive range of specificities in which a wide variety of bases can be preferred at most positions within the core 6 bp binding site. Overall, the majority of homeodomains (93%) in this dataset can be clustered into 11 different specificity groups, with an additional 6 homeodomains that display unique specificities. This clustering strategy allowed description of how common variations in residues at a given position in the homeodomain contribute to differences in specificity. However, even within these groups there are homeodomains that display differences in binding site preference. For example, members of the NK-2 group differ in their base preference at the 5'-most position, and Exd specificity clearly differs from that of other members of the TGIF group. In addition, differences outside the core 6 base pair binding site motifs lead to further diversity among homeodomain specificities. Thus, the 17 specificities described by the 11 groups and 6 unique homeodomains represent the minimum number of different specificities recognized by Drosophila homeodomains (Noyes, 2008).
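The counts quoted in this summary can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text: 84 homeodomains, 11 groups, 6 unique factors, 43 in the Antp or En groups, and 18 with prior FlyREG data. It assumes the six unique factors are exactly the ones left outside the groups. Variable names are illustrative.

```python
total_homeodomains = 84
specificity_groups = 11
unique_factors = 6
antp_or_en = 43
previously_characterized = 18

# Factors that fall into one of the specificity groups (assumes the six
# unique factors are the only ones outside the groups).
clustered = total_homeodomains - unique_factors
clustered_pct = round(100 * clustered / total_homeodomains, 1)

# Minimum number of distinct specificities: one per group plus the uniques.
minimum_specificities = specificity_groups + unique_factors

antp_or_en_pct = round(100 * antp_or_en / total_homeodomains, 1)
prior_pct = round(100 * previously_characterized / total_homeodomains, 1)

print(clustered, clustered_pct)  # prints: 78 92.9
print(minimum_specificities)     # prints: 17
print(antp_or_en_pct, prior_pct) # prints: 51.2 21.4
```

These values agree with the rounded figures in the text: 93% clustered, a minimum of 17 specificities, and slightly more than half of the homeodomains in the Antp or En groups.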
This analysis demonstrates that the overall sequence similarity between two homeodomains is a useful, but sometimes misleading indicator of the degree of similarity in their DNA-binding specificities. Once factors are clustered into specificity groups, it is possible to compare binding specificity with their degree of sequence homology. As expected, a substantial correlation between sequence similarity and preferred recognition motif is observed. However, multiple examples were found where pairs of closely related homeodomains cluster into different specificity groups. In both naturally-occurring and engineered homeodomains, single amino acid changes at putative DNA recognition positions are sufficient to alter specificity. These observations illustrate the importance of defining the amino acid positions that contribute to variations in binding site specificity in order to make accurate specificity predictions (Noyes, 2008).
Berger, M. F., et al. (2008). Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 133(7): 1266-76. PubMed ID: 18585359
Noyes, M. B., et al. (2008). Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 133(7): 1277-89. PubMed ID: 18585360
Home page: The Interactive Fly © 1995, 1996 Thomas B. Brody, Ph.D.
The Interactive Fly resides on the Society for Developmental Biology's Web server.