Abstract:
Protein-DNA interaction is a fundamental step in cellular regulation. Proteins known as transcription factors (TFs) are produced to either enhance or repress the expression of various gene products according to the particular needs of the cell at a given moment. They often do so by binding to the genome at specific locations determined by the amino acid residues in their DNA-binding domain (DBD), the structural unit of the protein that is often used to classify TFs into groups. The largest class of TFs, comprising roughly half of the annotated TFs in eukaryotes, is defined by a DBD known as a Cys2His2 zinc finger (C2H2-ZF). While the majority of DBDs can bind only a limited repertoire of DNA sequences, members of the C2H2-ZF class span an extremely wide variety of DNA-binding preferences. Moreover, these preferences appear to be determined largely by the identity of the amino acids occupying only a few key positions of a structurally well-defined and semi-modular protein-DNA contact interface. However, the rules that govern DNA-binding preferences remain unclear, thwarting our ability to predict binding sites of endogenous C2H2-ZFs. Because of the large number of possible amino acid combinations and the variety of DNA sequences they can potentially bind, high-throughput interaction screens and rigorous computational methods are necessary.
In my talk, I’ll discuss the analysis of a high-throughput, randomized, synthetic screen of C2H2-ZF-DNA functional interactions obtained via the bacterial one-hybrid selection system. After extensive filtering, the data reveal hundreds to thousands of ‘canonical’ helices binding with varying levels of affinity to each of the 64 possible 3bp DNA targets, providing the most comprehensive view of C2H2-ZF-DNA interaction to date. I will explain an information-theoretic analysis that validates an expanded view of the physical model for the C2H2-ZF-DNA binding interface and show methods for inferring DNA-binding profiles by integrating data from independent protein selections. Additionally, I extend the predictive scope of the data by adapting a classical nearest-neighbors approach to leverage information about the physical binding model and frequently observed amino acid substitutions. I will demonstrate strong concordance between predicted binding profiles and their experimentally determined counterparts within the synthetic context, as well as generalizability of the data via prediction of binding profiles for naturally occurring C2H2-ZFs. Overall, the analysis reveals a complex binding landscape for C2H2-ZFs, which shows both agreement and conflict with previously proposed ‘codes’ of DNA-binding specificity.
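As a rough illustration of the kind of information-theoretic analysis mentioned above, the sketch below computes the mutual information between each helix position and each base position from paired (helix, target) observations. High values suggest that a helix position helps specify a base. This is a minimal sketch and not necessarily the analysis used in the talk: the function names and the toy observations are hypothetical, and the toy data are made up purely to exercise the code.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information (in bits) between two discrete variables,
    estimated from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def contact_map(observations, helix_len, target_len=3):
    """Matrix of mutual information values, one row per helix position
    and one column per base position of the DNA target.

    observations: list of (helix, target) string pairs of fixed lengths.
    """
    return [
        [
            mutual_information([(h[i], t[j]) for h, t in observations])
            for j in range(target_len)
        ]
        for i in range(helix_len)
    ]


# Hypothetical toy data (not real selections): helix position 0 fully
# determines base 0, and nothing else is informative.
toy = [("RA", "GA"), ("RA", "GC"), ("KA", "TA"), ("KA", "TC")]
mi = contact_map(toy, helix_len=2, target_len=2)
```

On real selection data, estimates like these would need a correction for finite sample size before comparing positions; that step is omitted here.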
Reading List:
1. Jones, N. C. & Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press (2004). (Textbook)
2. Stormo, G. D. & Zhao, Y. Determining the specificity of protein-DNA interactions. Nat. Rev. Genet. 11, 751–60 (2010).
3. Persikov, A. V. & Singh, M. An expanded binding model for Cys2His2 zinc-finger protein-DNA interfaces. Phys. Biol. 8, 035010 (2011).
4. Noyes, M. et al. A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res. 36, 2547–60 (2008).
5. Klug, A. The discovery of zinc fingers and their applications in gene regulation and genome manipulation. Annu. Rev. Biochem. 79, 213–31 (2010).
6. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–39 (2013).
7. Benos, P. V., Lapedes, A. S. & Stormo, G. D. Probabilistic Code for DNA Recognition by Proteins of the EGR Family. J. Mol. Biol. 323, 701–727 (2002).
8. Enuameh, M. S. et al. Global analysis of Drosophila Cys₂-His₂ zinc finger proteins reveals a multitude of novel recognition motifs and binding determinants. Genome Res. 23, 928–40 (2013).
9. Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 10, 252–63 (2009).
10. Persikov, A. V. & Singh, M. De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res. 1–12 (2013).