Tim O'Connor will present his research seminar/general exam on Thursday, May 10 at 2 PM in Room 302 (note room!). The members of his committee are: Manuel Llinas (Molecular Biology, advisor), Olga Troyanskaya, and David Blei. Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so. His abstract and reading list follow below.

---------------------------------------

Abstract

Plasmodium falciparum is the causative agent of the most virulent form of malaria, a disease that causes over one million deaths per year. Recently, this organism's genome was sequenced, but only a minority of the genes have known function, with an even smaller proportion having any detailed functional description. While manual examination of individual genes continually expands this functional knowledge base, it is both a slow and an expensive way to tackle the roughly 3000 uncharacterized genes. By identifying functionally described genes related to an unknown query gene, the biological processes in which that query gene is involved can be elucidated.

Several computational methods that integrate multiple genome-wide data sets have been developed to identify functional relationships between pairs of genes. Identifying functional relationships then becomes a supervised machine learning problem with two class labels. Several methods can address this problem, including logistic regression, kernel methods and support vector machines, neural networks, and Bayesian networks. Our method of data integration uses statistical inference with Bayesian networks, permitting multiple datasets of different types to be integrated in a way that can incorporate prior knowledge. To train the network, a benchmark gold standard of known or high-confidence interactions was developed for P. falciparum.
The class labels in the gold standard were obtained from two sources. First, labels were derived from the Gene Ontology (GO) consortium using GRIFn's expert-derived approach. Second, labels were derived from the Malaria Parasite Metabolic Pathways (MPMP) database, which is a manually curated version of the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways database specifically for P. falciparum. Forty-three diverse datasets, including features based on DNA sequence, comparative genomics, RNA expression profiles, and growth perturbation experiments, were used as input, with the predictor's parameters trained using the expectation-maximization algorithm.

Prediction accuracy for functional relationships was estimated using cross-validation, yielding an ROC curve comparable in accuracy to that of a similar approach for yeast, which is a much better characterized organism. The resulting predictions have been analyzed to assess the scope of the uncharacterized genome fraction that can be shown to be related to known biological processes. These predictions have also been compared to those made in yeast, showing only two highly similar processes: protein synthesis and protein degradation. The predictions will be made accessible through a graphical web interface.

Prior Work
Date SV, Stoeckert CJ Jr. Computational modeling of the Plasmodium falciparum interactome reveals protein function on a genome-wide scale. Genome Res. 2006 Apr;16(4):542-9. Epub 2006 Mar 6.
Wuchty S, Ipsaro JJ. A Draft of Protein Interactions in the Malaria Parasite P. falciparum. J Proteome Res. 2007 Feb 15.

Data Integration
Troyanskaya OG, Dolinski K, Owen AB, Altman RB, Botstein D. A Bayesian framework for combining heterogeneous data sources for gene function prediction (in S. cerevisiae). Proc Natl Acad Sci USA 100(14):8348-53, 2003.
Myers CL, Robson D, Wible A, Hibbs MA, Chiriac C, Theesfeld CL, Dolinski K, Troyanskaya OG. Discovery of biological networks from diverse functional genomic data. Genome Biology 6(13):R114, 2005.
Russell S, Norvig P (1995). Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence. Englewood Cliffs, New Jersey. Chapter 20: Statistical Learning Methods.
Hastie T, Tibshirani R, Friedman J (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Chapter 8: Model Inference and Averaging.

Gold Standard & Data Processing
Myers CL, Barrett D, Hibbs MA, Huttenhower C, Troyanskaya OG. Finding function: evaluation methods for functional genomic data. BMC Genomics 2006, 7:187.
Ginsburg H. Progress in in silico functional genomics: the malaria Metabolic Pathways database. Trends Parasitol. 2006 Jun;22(6):238-40.
Storey JD, Xiao W, Leek JT, Tompkins RG, Davis RW (2005). Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences, 102:12837-12842.

Plasmodium
Gardner MJ, Hall N, Fung E, White O, et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002 Oct 3;419(6906):498-511.
Bozdech Z, Llinas M, Pulliam BL, Wong ED, Zhu J, DeRisi JL. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 2003 Oct;1(1):E5.