Joshua Wetzel will present his Pre FPO "Structure-aware approaches for deciphering sequence-specific protein-DNA interactions" on August 2, 2018 at 2:30pm in CS 302.
The members of his committee are Mona Singh (adviser), Barbara Engelhardt, Olga Troyanskaya, Ben Raphael, and Stanislav Shvartsman (CBE). All are welcome to attend.
Title: Structure-aware approaches for deciphering sequence-specific protein-DNA interactions
Abstract: Interactions between proteins and specific genomic loci are critical to the proper functioning of all cells. The ability of DNA-binding proteins to distinguish between their target binding sites and other regions of the genome has been demonstrated to be responsible for directing processes as fundamental as development, meiotic recombination, chromatin regulation, and even organization of the genome into a condensed three-dimensional nuclear structure. Highly precise X-ray or NMR co-complex structural studies have provided great insight into the biochemical and stereochemical principles guiding such interactions; however, such studies are inherently low-throughput and provide us with the binding behavior of a protein with respect to only a single DNA target. Meanwhile, high-throughput experimental methodologies continue to scale, allowing us to examine a protein’s relative binding preferences across enormous libraries of DNA sequences, albeit at lower resolution and with decreased interpretability of the interactions. Indeed, while we continue to characterize the individual DNA-binding preferences of large numbers of naturally occurring or synthetic proteins, the fundamental relationship between the amino acid sequence of a protein and its DNA-binding preferences remains largely elusive.
In this talk, I will describe three projects that further our understanding of the relationship between amino acid sequence and DNA-binding preferences (specificities).
First, I will discuss a combined experimental-computational effort to systematically determine the DNA-binding landscape across an extremely large library of randomized protein variants of the Cys2His2 zinc finger (C2H2-ZF) family, the most abundant group of DNA-binding proteins in eukaryotes. From an integrative analysis of the resulting data, I inferred the largest set of specificities to date for C2H2-ZFs, developed a state-of-the-art structurally inspired method for predicting specificities for C2H2-ZFs based on amino acid sequence alone, and confirmed or discovered various fundamental and complex characteristics of DNA recognition for this protein family.
Second, although the specificities of proteins within a single family are related to one another via a common underlying structural DNA-binding interface, this fact is routinely ignored when inferring specificities. Accounting for such relationships should allow considerably more robust estimation in the presence of noise or under-sampling, both common properties of high-throughput data. To address this, I have developed a platform-independent method for joint inference of specificities for related proteins via enforcement of global within-dataset consistency, according to a structural interface model. Among other results, I will demonstrate that enforcing global consistency within a dataset increases agreement of specificities with an external gold standard beyond the level of agreement achieved via either individual inference of specificities or simple enforcement of local consistency.
Finally, individual protein-DNA interaction screens do not provide any direct information that maps amino acid residue positions within the protein to particular DNA base positions in the derived specificities. Such mappings are valuable for learning protein-DNA recognition rules from high-throughput data and would provide insight into which binding site positions may be affected by point mutations in the protein. To help close this gap, I have created an unsupervised statistical framework for mapping groups of proteins from a given family, along with their corresponding specificities, onto structural interface models derived from aggregated family-level co-complex data. The framework outputs a multiple-motif alignment across proteins with potentially diverse DNA-binding specificities, along with a specificity prediction model consistent with the alignment. I will demonstrate proof of principle for this approach by showing that the multiple-motif alignment produced for a group of mouse homeodomain proteins with diverse binding preferences is consistent with a known (experimentally determined) alignment for related homeodomain proteins in fruit fly.
Nicki Gotsis