[talks] T O'Connor general exam

Melissa M Lawson mml at CS.Princeton.EDU
Thu May 3 10:22:21 EDT 2007

Tim O'Connor will present his research seminar/general exam on Thursday May 10 
at 2PM in Room 302 (note room!).  The members of his committee are:  Manuel 
Llinas (Molecular Biology, advisor), Olga Troyanskaya, and David Blei.  Everyone 
is invited to attend his talk, and those faculty wishing to remain for the oral exam 
following are welcome to do so.  His abstract and reading list follow below.


	Plasmodium falciparum is the causative agent of the most virulent strain of
malaria, a disease that causes over one million deaths per year. Recently, this organism's
genome was sequenced, but only a minority of the genes have known function, with an even
smaller proportion having any detailed functional description. While manual examination of
individual genes continually expands this functional knowledge base, it is both a slow and
expensive way to tackle the roughly 3000 uncharacterized genes. By identifying
functionally described genes related to an unknown query gene, the biological processes in
which that query gene is involved can be elucidated. Several computational methods that
integrate multiple, genome-wide data sets have been developed to identify functional
relationships between pairs of genes.

	The problem of identifying functional relationships turns into a supervised
machine learning problem with two class labels. Several methods can address the problem,
including logistic regression, kernel methods and support vector machines, neural
networks, and Bayesian networks. Our method of data integration is through the use of
statistical inference with Bayesian networks, permitting multiple datasets of different
types to be integrated in a way that can incorporate prior knowledge. To train the
network, a benchmark gold standard of known or high confidence interactions was developed
for P. falciparum. The class labels in the gold standard were obtained first from the Gene
Ontology (GO) consortium using GRIFn's expert-derived approach. Secondly, labels were
derived from the Malaria Parasite Metabolic Pathways (MPMP) database, which is a manually
curated version of the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways database
specifically for P. falciparum. Forty-three diverse datasets including features based on
DNA sequence, comparative genomics, RNA expression profiles, and growth perturbation
experiments were used as input with the predictor's parameters trained using the
expectation-maximization algorithm. 
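
As a rough illustration of the integration step described above, here is a minimal
sketch, not the method presented in the talk. It assumes the simplest possible Bayesian
network (naive Bayes: one hidden "related" node with one child per dataset), evidence
already discretized into bins, and fully labeled gold-standard pairs, so the parameters
can be estimated by smoothed counting rather than expectation-maximization. The function
names, the binning, and the smoothing constant alpha are all hypothetical.

```python
import math


def train_naive_bayes(pairs, labels, n_bins, alpha=1.0):
    """Estimate naive Bayes parameters from labeled gene pairs.

    pairs  : list of tuples, one discretized evidence value per dataset
             (an int in range(n_bins), or None if the dataset has no
             measurement for that gene pair).
    labels : 1 = functionally related, 0 = unrelated (gold standard).
    alpha  : Laplace smoothing constant (an assumption of this sketch).
    """
    n_datasets = len(pairs[0])
    class_counts = [0, 0]
    # counts[c][d][v] = number of class-c pairs with value v in dataset d
    counts = [[[0] * n_bins for _ in range(n_datasets)] for _ in range(2)]
    for evidence, y in zip(pairs, labels):
        class_counts[y] += 1
        for d, v in enumerate(evidence):
            if v is not None:
                counts[y][d][v] += 1

    # Smoothed prior P(class) and conditionals P(value | class, dataset).
    prior = [(class_counts[c] + alpha) / (len(labels) + 2 * alpha)
             for c in (0, 1)]
    cond = [[[(counts[c][d][v] + alpha) / (sum(counts[c][d]) + alpha * n_bins)
              for v in range(n_bins)]
             for d in range(n_datasets)]
            for c in (0, 1)]
    return prior, cond


def posterior_related(evidence, prior, cond):
    """Return P(related | evidence), skipping datasets with missing values."""
    log_p = [math.log(prior[0]), math.log(prior[1])]
    for d, v in enumerate(evidence):
        if v is None:
            continue  # missing evidence contributes nothing to either class
        for c in (0, 1):
            log_p[c] += math.log(cond[c][d][v])
    # Normalize in log space for numerical stability.
    m = max(log_p)
    w = [math.exp(lp - m) for lp in log_p]
    return w[1] / (w[0] + w[1])
```

Working in log space keeps the product over dozens of datasets from underflowing, and
skipping missing values lets datasets with partial genome coverage be combined.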

	Functional relationship prediction accuracy was estimated using cross validation,
with an ROC curve showing accuracy comparable to that of a similar approach for yeast, a
much better characterized organism. The resulting predictions have been analyzed to
assess the scope of the uncharacterized genome fraction which can be shown to be related
to known biological processes. These predictions have also been compared to those made in
yeast, showing only two highly similar processes: protein synthesis and protein
degradation. The predictions will be made accessible through a graphical web interface.
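
The evaluation described above can be sketched as follows. This is only an illustrative
outline under stated assumptions, not the procedure used in the work: folds are assigned
round-robin, the area under the ROC curve is computed as the fraction of
positive/negative pairs ranked correctly, and `fit` and `score` are hypothetical
callables standing in for whatever predictor is being evaluated.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count 1/2).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices); fold f holds items f, f+k, ..."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def cross_validated_auc(examples, labels, fit, score, k=5):
    """Score every example with a model that never saw it, then pool.

    fit(train_examples, train_labels) -> model      (hypothetical callable)
    score(model, example) -> float                  (hypothetical callable)
    """
    held_out = [None] * len(examples)
    for train, test in k_fold_splits(len(examples), k):
        model = fit([examples[i] for i in train], [labels[i] for i in train])
        for i in test:
            held_out[i] = score(model, examples[i])
    return roc_auc(held_out, labels)
```

One caveat for gene-pair data: two pairs that share a gene are not independent, so a
split over pairs, as assumed here, may give a more optimistic estimate than a split over
genes.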

Prior work

Date, S.V. & Stoeckert, C.J., Jr. Computational modeling of the Plasmodium falciparum
interactome reveals protein function on a genome-wide scale. Genome Res. 2006
Apr;16(4):542-9. Epub 2006 Mar 6. 

Wuchty S, Ipsaro JJ. A Draft of Protein Interactions in the Malaria Parasite P.
falciparum. J Proteome Res. 2007 Feb 15

Data Integration

Troyanskaya OG, Dolinski K, Owen AB, Altman RB, and Botstein D. A Bayesian framework for
combining heterogeneous data sources for gene function prediction (in S. cerevisiae). Proc
Natl Acad Sci USA 100(14): 8348-53, 2003. 

Myers CL, Robson D, Wible A, Hibbs MA, Chiriac C, Theesfeld CL, Dolinski K, Troyanskaya
OG. Discovery of biological networks from diverse functional genomic data. Genome
Biology 6(13):R114, 2005. 

Russell S, Norvig P (1995) Artificial Intelligence: A Modern Approach, Prentice Hall
Series in Artificial Intelligence. Englewood Cliffs, New Jersey. Chapter 20:Statistical
Learning Methods 

Hastie T, Tibshirani R, Friedman J (2001). The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Springer Series in Statistics. Chapter 8:Model
Inference and Averaging.

Gold Standard & Data Processing

Myers CL, Barrett D, Hibbs MA, Huttenhower C, Troyanskaya OG. Finding function: evaluation
methods for functional genomic data. BMC Genomics 2006, 7:187. 

Ginsburg H. Progress in in silico functional genomics: the malaria Metabolic Pathways
database. Trends Parasitol. 2006 Jun;22(6):238-40.

Storey JD, Xiao W, Leek JT, Tompkins RG, and Davis RW. (2005) Significance analysis of
time course microarray experiments. Proceedings of the National Academy of Sciences, 102:


Gardner MJ, Hall N, Fung E, White O, et al. Genome sequence of the human malaria parasite
Plasmodium falciparum. Nature. 2002 Oct 3;419(6906):498-511.

Bozdech Z, Llinas M, Pulliam BL, Wong ED, Zhu J, DeRisi JL. The transcriptome of the
intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 2003
