Joshua Wetzel will present his FPO, "Structure-aware Approaches for Deciphering Sequence-specific Protein-DNA Interactions," on Tuesday, 1/29/2019 at 1 PM in CS 402.

The members of his committee are as follows. Examiners: Mona Singh (adviser), Shvartsman (CBE), and Barbara Engelhardt. Readers: Ben Raphael and Olga Troyanskaya.

Everyone is invited to attend his talk. The abstract follows.

Abstract
Interactions between proteins and specific genomic loci are critical to the proper functioning
of all cells. The ability of a DNA-binding protein (DBP) to distinguish between its
target binding sites and other genomic regions is required for a myriad of crucial functions,
including transcriptional regulation, meiotic recombination, chromatin remodeling,
and genome organization. However, the fundamental relationship between the amino acid
sequence of a DBP and its DNA-binding preferences remains largely elusive.

High-throughput experimental technologies for detecting protein-DNA interactions
have advanced substantially in the past decade and have enabled measurements for thousands
of natural and synthetic DBP variants. However, these technologies typically require
sophisticated analyses to uncover intrinsic DNA-binding specificities from the measured
signals, often have poorly understood noise and sampling profiles, and provide little insight
into the underlying mechanism of interaction. Meanwhile, precise co-complex structural
data provide great insight into the mechanistic principles guiding interactions between
DBPs and their DNA ligands, albeit at substantially lower throughput. These two types of
data are complementary but rarely considered in concert.

In this dissertation, I describe novel computational approaches that improve the accuracy
and interpretability of inferences derived from high-throughput protein-DNA interaction
data, via direct consideration of the underlying protein-DNA structural interaction
interface shared across proteins within the same DNA-binding family. First, I describe
a systematic exploration of the DNA-binding landscape for Cys2His2 zinc finger (C2H2-ZF)
proteins, the most abundant DNA-binding family in eukaryotes. Here we inferred the
largest set of C2H2-ZF specificities to date and developed a state-of-the-art, structurally inspired
method for predicting specificities for novel C2H2-ZFs. Second, I demonstrate
how to leverage the large amounts of specificity data available for DNA-binding domains (DBDs) to develop a general
framework that improves accuracy of high-throughput DNA-binding specificity inferences
by jointly considering interaction preferences for groups of proteins from the same
DNA-binding family, rewarding global consistency according to an expected similarity
measure reflecting family-level structural considerations. Finally, I provide a probabilistic
framework for improving interpretability of high-throughput data by mapping inferred
specificities of DBPs from the same DNA-binding family onto a common “reference” structural
interface model derived from aggregated family-level co-complex data.