Shilpa Nadimpalli Kobren will present her FPO on Thursday,
May 31, 2018 at 10:00am in CS402. All are welcome to attend.
Committee members: Mona Singh (adviser), Barbara Engelhardt
(examiner), Stanislav Shvartsman (examiner), Olga Troyanskaya
(reader), and Benjamin J. Raphael (reader)
Title: Detecting and Analyzing Variation in Protein Interactions
Abstract:
Proteins carry out a dazzling multitude of functions by
interacting with DNA, RNA, other proteins and various other
molecules within our cells. Together, these interactions form
complex networks that differ naturally across cells within an
organism, across individuals in a population,
and across species. Although such variation is critical for
normal organismal functioning, mutations affecting protein
interactions are also known to underlie a wide
range of human diseases. In this dissertation, I introduce novel
computational approaches that explore the extent to which
specific protein interactions vary across species, across
healthy individuals, and across individuals with cancer.
To start, I focus on interaction
variation across species. It is well established that changes in
protein-DNA interactions underlie a wide
range of observable differences across species. These
differences are primarily thought to stem from changes in the
DNA sites that transcription factor (TF) proteins bind to,
although changes in the binding properties of TFs themselves
have also been observed. Determining the prevalence of such TF
changes, however, remains infeasible using current experimental
approaches. Here, I develop and apply a comparative
genomics framework to systematically quantify changes in the
DNA-binding properties of orthologous TFs across species
spanning ~45 million years of evolutionary divergence. I
demonstrate that, contrary to expectation, cross-species
regulatory network divergence resulting from changes in
non-duplicated DNA-binding proteins is pervasive. These findings
reveal a widespread yet largely
unstudied source of divergence across transcriptional regulatory
programs in animals.
Next, I turn my attention to
interaction variation across individuals. To quantify this
variation comprehensively, I first combine large-scale
sequence, domain and structure information to pinpoint sites
within protein domains---the fundamental structural units in
proteins---that are involved in binding DNA, RNA, peptides,
ions, metabolites, or other small molecules. This domain-based
approach enables us to identify putative interaction sites in
over 60% of human genes, representing a 2.4-fold
improvement over comparable state-of-the-art approaches for this
task. I next demonstrate that whereas domain-inferred
interaction sites are significantly depleted of natural variants
across ~60,000 healthy individuals, these same sites are
significantly enriched for cancer mutations across ~11,000 tumor
samples. My analysis demonstrates that the cellular network
variation that occurs across healthy individuals is unlikely to
be due to changes within proteins; in contrast, mutations
acquired in cancers appear to preferentially alter cellular
networks by perturbing the proteins themselves.
Finally, I show how we can leverage an
interaction-based viewpoint to uncover mutated genes that play
causal roles in human cancers. In particular, I aim to uncover
genes whose interaction interfaces are significantly altered in
tumors. To this end, I develop a robust
computational framework that integrates my per-domain-position
binding propensities with additional sources of biological data
regarding protein functionality. I demonstrate that by
analytically computing the significance of patterns of
mutations, my approach achieves a dramatic
improvement in runtime over a typical
empirical permutation test for this task. Moreover, my
interaction-based method not only recapitulates known cancer
driver genes faster and with greater precision than previous
methods, but it also uncovers relatively rarely mutated genes
with likely roles in cancer. By focusing on the somatic
alteration of protein interaction interfaces in tumors, my
method can shed light on the molecular mechanisms perturbed
across known and putative cancer genes, thereby enabling valuable
insights that may help guide personalized cancer treatments.