
Peng Jiang will present his research seminar/general exam on Tuesday April 28 at 10AM in Room 402. The members of his committee are: Mona Singh (advisor), Olga Troyanskaya, and Tom Funkhouser. Everyone is invited to attend his talk and those faculty wishing to remain for the oral exam following are welcome to do so. His abstract and reading list follow below. ------------------------------------ Abstract: Large-scale biological networks are being determined at an increasing pace. Computational analyses of the resulting data provide new opportunities for revealing cellular organization and uncovering protein function and pathways. Clustering is one of the most widely used computational approaches for analyzing these types of biological data, and can reveal important modular structures. Previous network clustering algorithms in bio-informatics community work well on midsized networks such as, for example, the yeast protein interaction network. However, by recent computational techniques for data integration, we are starting to get large biological networks for numerous organisms. In these networks, there might be tens thousands of nodes and millions or even hundreds millions of interactions in a single graph. In such cases, previous biological clustering algorithms will be impractically slow or entirely unapplicable. In our work, we present a graph clustering algorithm with time complexity O(Vlog(V)+E) and space complexity O(E), where V and E are the number of vertices edges in the graph. By testing on several large genomic networks, our algorithm successfully clustered all of them, whereas most previous approaches failed. In the cases where previous approaches were able to cluster these networks, our algorithm was hundreds of times faster. We validated our approach with respect to its ability to uncover biologically relevant clusters and show we get state-of-the-art performance. Our fast algorithm enables us to perform new types of analysis. For example, we clustered a recent human context specific network data set over 230 different conditions, resulting in 230 networks which are nearly 300GB in total. We were able to uncover clusters specific to individual networks, putatively corresponding to context-specific modules, a type of analysis which was not possible prior to our work. Readlist: Book: [1] Thomas HC, Charles EL, Ronald LR, Clifford S. (2001) Introduction to algorithm, 2nd edition. MIT press Papers: [1] Broh´ee S., van Helden J. (2006) Evaluation of clustering algorithms for protein-protein interaction networks, BMC Bioinformatics, 7, 488. [2] Enright A.J., Van Dongen S., Ouzounis C.A. (2002) An efficient algorithm for largescale detection of protein families, Nucleic Acids Research, 30(7), 1575-1584. [3] Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S. (2006) Development and implementation of an algorithm for detection of protein complexes in large interaction networks, BMC Bioinformatics, 7, 207. [4] Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C. (2009) STRING 8a global view on proteins and their functional interactions in 630 organisms, Nucleic Acids Res, 37, D412-416. [5] Bobby-Joe Breitkreutz, Chris Stark, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, Michael Livstone, Rose Oughtred, Daniel Lackner, Jurg Bhler, Valerie Wood, Kara Dolinski and Mike Tyers. (2008) The BioGRID Interaction Database: 2008 Update, Nucleic Acids Res, 36, D637-640. [6] Curtis Huttenhower, Erin M. Haley3, Matthew A. Hibbs, Vanessa Dumeaux, Daniel R. Barrett, Hilary A. Coller, Olga G. Troyanskaya. (2009) Exploring the human genome with functional maps, Genome Research, 4(9), e1000165. [7] Fredman ML, Tarjan RE. (1987) Fibonacci heaps and their uses in improved network optimization algorithms, Journal of the ACM, 34(3), 596-615. [8] Rhee SY,Wood V, Dolinski K, Draghici S. (2008) Use and misuse of the gene ontology annotations, Nat Rev Genet, 9(7), 509-515. [9] Lord PW, Stevens RD, Brass A, Goble CA. (2003) Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation, Bioinformatics, 19(10), 1275-1283.
participants (1)
-
Melissa Lawson