Exploring the interplay between topology and function in protein interaction networks
The emergence in recent years of numerous high-throughput
experimental techniques in biology has led to a new, genome-scale approach
to biological research. This high-throughput biology faces two
complementary tasks: obtaining data on a genomic scale and making sense of
these data. It is in the second task that computer scientists working in
computational biology can make a significant contribution.
One type of data
obtained by high-throughput experiments is information about interactions among
proteins, such as physical protein-protein interactions. This
information can bring scientists closer to solving one of the
most important problems in biology: understanding the roles that different
proteins play in the cell and the interplay among them.
In my work, I
look at the relationship between protein function and the protein's context in
the interaction network from two angles: using interaction networks and
information about other proteins to predict a protein's cellular role, and
finding schemas, or recurring patterns of interaction among different types of
proteins.
In the first part of the talk, I explore the use of physical
protein interaction networks for predicting the function of proteins.
First, using some existing approaches to this problem as illustrations, I
discuss which topological properties of interaction networks should be taken
into account by algorithms for predicting protein function based on physical
interaction networks. Using these desiderata as guidelines, I introduce an
original network-flow based algorithm called FunctionalFlow that exploits the
underlying structure of protein interaction maps in order to predict protein
function. In cross-validation testing on the yeast proteome, I show that
FunctionalFlow has improved performance over previous methods in predicting the
function of proteins with few (or no) annotated protein neighbors. I demonstrate
that FunctionalFlow performs well because it takes advantage of both network
topology and a measure of locality. Finally, I show that performance improves
substantially when multiple data sources are combined to create weighted
interaction networks.
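The abstract does not spell out FunctionalFlow's update rule, so what follows is only a minimal Python sketch of flow-based function propagation under stated assumptions, not the algorithm as presented in the talk. It assumes that proteins already annotated with a function act as unlimited reservoirs, that "fluid" moves only downhill to neighbors holding less of it, that each edge carries at most its weight per round, and that propagation runs for a fixed number of rounds (the hypothetical `steps` parameter), which is what gives the scores their locality. The function name `functional_flow` and its input format are illustrative.

```python
import math
from collections import defaultdict


def functional_flow(edges, annotated, steps=6):
    """Hypothetical flow-propagation scores for a single function.

    edges: {(u, v): weight} for an undirected graph with positive weights
    annotated: proteins already known to have the function
    steps: number of synchronous rounds (assumed; bounds how far flow travels)
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w

    # Annotated proteins act as unlimited sources; all others start empty.
    reservoir = {n: math.inf if n in annotated else 0.0 for n in adj}
    score = {n: 0.0 for n in adj}

    for _ in range(steps):
        inflow = defaultdict(float)
        outflow = defaultdict(float)
        for u in adj:
            total_w = sum(adj[u].values())
            for v, w in adj[u].items():
                # Fluid only moves "downhill", to a neighbor holding less.
                if reservoir[u] > reservoir[v]:
                    # Share proportional to edge weight, capped at capacity w.
                    f = min(w, w / total_w * reservoir[u])
                    outflow[u] += f
                    inflow[v] += f
        # Update all reservoirs at once, from flows computed on the old state.
        for n in adj:
            reservoir[n] += inflow[n] - outflow[n]
            score[n] += inflow[n]

    # A protein's score is the total flow it received over all rounds.
    return {n: s for n, s in score.items() if n not in annotated}
```

For example, on a path a, b, c, d with unit weights and only a annotated, two rounds give b a score of 2.0, c a score of 0.5, and d a score of 0.0, because flow has not yet traveled that far. In this sketch, a weighted network built from several data sources would enter only through the weights passed in `edges`.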
In the second part of the talk, I
take a different view of the topology-function relationship and use known
information about protein molecular function and the physical interaction
network to attempt to uncover organizational principles of the network. In
this bottom-up view, I examine the networks from the perspective of "pathway
schemas," or recurring patterns of interaction among different types of
proteins. Proteins in these schemas tend to act as functional units
within diverse biological processes. I discuss computational methods for
automatically uncovering statistically over-represented pathway schemas in
protein-protein interaction maps, and touch upon the comparative-interactomics
aspects of this problem. Coming back to the task of improving our
understanding of protein function, I conclude by demonstrating how
over-represented schemas can be used to gain new insights about the biological
function of proteins.
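Schema discovery as described covers richer patterns than single interactions, but the statistical notion of over-representation can be illustrated on the smallest case. The hypothetical sketch below assumes each protein carries exactly one molecular-function label, counts interactions between each unordered pair of labels, and estimates an empirical p-value by shuffling labels among proteins, which keeps the network topology and the label frequencies fixed. This label-permutation null model is an assumption chosen for simplicity, not necessarily the one used in the work described; the helper names are illustrative.

```python
import random
from collections import Counter


def label_pair_counts(edges, labels):
    """Count interactions between each unordered pair of function labels."""
    counts = Counter()
    for u, v in edges:
        counts[tuple(sorted((labels[u], labels[v])))] += 1
    return counts


def overrepresented_pairs(edges, labels, n_shuffles=1000, seed=0):
    """Empirical p-values for two-protein schemas (hypothetical helper).

    edges: list of (protein, protein) interactions
    labels: {protein: label}, one label per protein (simplifying assumption)
    Null model (assumption): labels are permuted among proteins, so the
    network topology and the number of proteins per label stay fixed.
    """
    rng = random.Random(seed)
    observed = label_pair_counts(edges, labels)
    proteins = list(labels)
    exceed = Counter()
    for _ in range(n_shuffles):
        shuffled = [labels[p] for p in proteins]
        rng.shuffle(shuffled)
        null_counts = label_pair_counts(edges, dict(zip(proteins, shuffled)))
        for pair, obs in observed.items():
            # Count shuffles at least as extreme as the real network.
            if null_counts[pair] >= obs:
                exceed[pair] += 1
    # Add-one correction so an estimated p-value is never exactly zero.
    return {pair: (1 + exceed[pair]) / (1 + n_shuffles) for pair in observed}
```

A small p-value for a label pair marks a two-protein pattern that occurs more often than random labeling would predict; larger schemas would require counting labeled paths or subgraphs instead of single edges.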