[talks] FW: generals stuff

Melissa Lawson mml at CS.Princeton.EDU
Wed Apr 22 12:00:37 EDT 2009

Peng Jiang will present his research seminar/general exam on Tuesday April 28 at 10AM in 
Room 402.  The members of his committee are: Mona Singh (advisor), Olga Troyanskaya, 
and Tom Funkhouser.  Everyone is invited to attend his talk and those faculty wishing to 
remain for the oral exam following are welcome to do so.  His abstract and reading list 
follow below.

Large-scale biological networks are being determined at an increasing pace.
Computational analyses of the resulting data provide new opportunities for
revealing cellular organization and uncovering protein function and
pathways. Clustering is one of the most widely used computational approaches
for analyzing these types of biological data, and can reveal important
modular structures. Previous network clustering algorithms in
bio-informatics community work well on midsized networks such as, for
example, the yeast protein interaction network. However, by recent
computational techniques for data integration,  we are starting to get large
biological networks for numerous organisms. In these networks, there might
be tens thousands of nodes and millions or even hundreds millions of
interactions in a single graph. In such cases, previous biological
clustering algorithms will be impractically slow or entirely unapplicable.

In our work, we present a graph clustering algorithm with time complexity
O(Vlog(V)+E) and space complexity O(E), where V and E are the number of
vertices edges in the graph.  By testing on several large genomic networks,
our algorithm successfully clustered all of them, whereas most previous
approaches failed.  In the cases where previous approaches were able to
cluster these networks, our algorithm was hundreds of times faster. We
validated our approach with respect to its ability to uncover biologically
relevant clusters and show we get state-of-the-art performance. Our fast
algorithm enables us to perform new types of analysis. For example, we
clustered a recent human  context specific network data set over 230
different conditions, resulting in 230 networks which are nearly 300GB in
total.  We were able to uncover clusters specific to individual networks,
putatively corresponding to context-specific modules, a type of analysis
which was not possible prior to our work.

[1] Thomas HC, Charles EL, Ronald LR, Clifford S. (2001)  Introduction to
algorithm, 2nd edition.  MIT press

[1] Broh´ee S., van Helden J. (2006) Evaluation of clustering algorithms for
protein-protein interaction networks, BMC Bioinformatics, 7, 488.

[2] Enright A.J., Van Dongen S., Ouzounis C.A. (2002) An efficient algorithm
for largescale detection of protein families, Nucleic Acids Research, 30(7),

[3] Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S. (2006)
Development and implementation of an algorithm for detection of protein
complexes in large interaction networks, BMC Bioinformatics, 7, 207.

[4] Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T,
Julien P, Roth A, Simonovic M, Bork P, von Mering C. (2009) STRING 8–a
global view on proteins and their functional interactions in 630 organisms,
Nucleic Acids Res, 37, D412-416.

[5] Bobby-Joe Breitkreutz, Chris Stark, Teresa Reguly, Lorrie Boucher,
Ashton Breitkreutz, Michael Livstone, Rose Oughtred, Daniel Lackner, Jurg
Bhler, Valerie Wood, Kara Dolinski and Mike Tyers. (2008) The BioGRID
Interaction Database: 2008 Update, Nucleic Acids Res, 36, D637-640.

[6] Curtis Huttenhower, Erin M. Haley3, Matthew A. Hibbs, Vanessa Dumeaux,
Daniel R. Barrett, Hilary A. Coller, Olga G. Troyanskaya. (2009) Exploring
the human genome with functional maps, Genome Research, 4(9), e1000165.

[7] Fredman ML, Tarjan RE. (1987) Fibonacci heaps and their uses in improved
network optimization algorithms, Journal of the ACM, 34(3), 596-615.

[8] Rhee SY,Wood V, Dolinski K, Draghici S. (2008) Use and misuse of the
gene ontology annotations, Nat Rev Genet, 9(7), 509-515.

[9] Lord PW, Stevens RD, Brass A, Goble CA. (2003) Investigating semantic
similarity measures across the Gene Ontology: the relationship between
sequence and annotation, Bioinformatics, 19(10), 1275-1283.

More information about the talks mailing list