Young-suk Lee will present his research seminar/general exam on Tuesday, May 15 at 2 PM in Room 401 (note room!). The members of his committee are Olga Troyanskaya (advisor), Mona Singh, and David Blei. Everyone is invited to attend his talk, and faculty wishing to remain for the oral exam that follows are welcome to do so. His abstract and reading list follow below.

---------------------------

A microarray experiment measures the abundance of thousands of transcripts in a biological sample in order to quantify its unique transcriptome. Many research groups and institutions have used microarrays and have published the resulting data, together with the associated sample descriptions, in the GEO (Gene Expression Omnibus) database. Human tissue samples are arguably the most valuable biological samples, and so most human microarray studies have so far been limited to a few experiments on specific tissue types. GEO has made these human microarray data publicly available, but their free-text sample descriptions hinder large-scale, tissue-specific microarray analysis.

We present a hierarchical multi-label tissue prediction algorithm that returns a ranked list of predicted tissue types for a given microarray sample. The algorithm may be used to annotate the many human microarrays in GEO whose tissue information is hidden or even absent. We propose that all tissue prediction algorithms must return a ranking, because most, if not all, biological samples consist of multiple tissue types that have a hierarchical order; even a single accurate prediction may therefore not completely describe a biological sample. We compare the performance of prediction algorithms with and without hierarchical information, as well as our algorithm, which uses Bayesian correction to combine multiple tissue classifiers.
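To make the idea of a hierarchy-aware ranking concrete, here is a minimal Python sketch. It is not the algorithm presented in the talk. It assumes that each tissue already has an independent classifier score in [0, 1] and a single parent in a tissue ontology. The tissue names, the scores, and the functions `consistent_scores` and `rank_tissues` are all hypothetical. In place of the Bayesian correction, it uses a simple top-down rule that caps each tissue's score at its parent's score, so a sub-tissue is never ranked as more likely than the tissue that contains it.

```python
from typing import Dict, List, Optional, Tuple


def consistent_scores(
    scores: Dict[str, float], parent: Dict[str, Optional[str]]
) -> Dict[str, float]:
    """Cap each tissue's score at its parent's corrected score (top-down).

    Simple heuristic for hierarchical consistency; this is NOT the
    Bayesian correction described in the abstract.
    """
    corrected: Dict[str, float] = {}

    def resolve(tissue: str) -> float:
        # Memoized recursion: a root keeps its own score, a child is
        # capped by the already-corrected score of its parent.
        if tissue not in corrected:
            p = parent[tissue]
            own = scores[tissue]
            corrected[tissue] = own if p is None else min(own, resolve(p))
        return corrected[tissue]

    for tissue in scores:
        resolve(tissue)
    return corrected


def rank_tissues(
    scores: Dict[str, float], parent: Dict[str, Optional[str]]
) -> List[Tuple[str, float]]:
    """Return (tissue, score) pairs, most likely first; ties broken by name."""
    corrected = consistent_scores(scores, parent)
    return sorted(corrected.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical ontology fragment and classifier outputs, for illustration only.
parent = {"brain": None, "cerebellum": "brain", "cortex": "brain",
          "blood": None, "leukocyte": "blood"}
scores = {"brain": 0.9, "cerebellum": 0.95, "cortex": 0.4,
          "blood": 0.2, "leukocyte": 0.1}

print(rank_tissues(scores, parent))
# [('brain', 0.9), ('cerebellum', 0.9), ('cortex', 0.4), ('blood', 0.2), ('leukocyte', 0.1)]
```

Capping scores from the top down is the simplest way to make independent per-tissue outputs respect the hierarchy. A Bayesian correction, as in Barutcuoglu et al. on the reading list, instead combines the classifiers' outputs probabilistically over the hierarchy.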
In the biological community, this algorithm may serve as a comprehensive background or sanity check on new human microarray datasets, measuring potential contamination and the tissue composition of each biological sample.

Reading list

Book: Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning)
- Ch 1: Introduction
- Ch 2: Supervised Learning
- Ch 3: Bayesian Decision Theory
- Ch 4: Parametric Methods
- Ch 5: Multivariate Methods
- Ch 8: Nonparametric Methods
- Ch 10: Linear Discrimination
- Ch 14: Assessing and Comparing Classification Algorithms
- Ch 15: Combining Multiple Learners

Papers:
- McCall MN, Uppal K, Jaffee HA, Zilliox MJ, Irizarry RA. The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes. Nucleic Acids Res. 2011;39(suppl 1):D1011-D1015. doi:10.1093/nar/gkq1259
- Barutcuoglu Z, Schapire RE, Troyanskaya OG. Hierarchical multi-label prediction of gene function. Bioinformatics. 2006;22(7):830-836. doi:10.1093/bioinformatics/btk048
- Kim SY, Volsky DJ. PAGE: Parametric Analysis of Gene Set Enrichment. BMC Bioinformatics. 2005;6:144. doi:10.1186/1471-2105-6-144
- Kilpinen S, Autio R, Ojala K, Iljin K, Bucher E, Sara H, Pisto T, Saarela M, Skotheim RI, Björkman M, Mpindi JP, Haapa-Paananen S, Vainio P, Edgren H, Wolf M, Astola J, Nees M, Hautaniemi S, Kallioniemi O. Systematic bioinformatic analysis of expression levels of 17,330 human genes across 9,783 samples from 175 types of healthy and pathological tissues. Genome Biol. 2008;9(9):R139.
- Dudley JT, Tibshirani R, Deshpande T, Butte AJ. Disease signatures are robust across tissues and experiments. Mol Syst Biol. 2009;5:307.
- Gremse M, Chang A, Schomburg I, Grote A, Scheer M, Ebeling C, Schomburg D. The BRENDA Tissue Ontology (BTO): the first all-integrating ontology of all organisms for enzyme sources. Nucleic Acids Res. 2011;39(suppl 1):D507-D513. doi:10.1093/nar/gkq968
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Holko M, Ayanbule O, Yefanov A, Soboleva A. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 2011;39(suppl 1):D1005-D1010. doi:10.1093/nar/gkq1184
- Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM, Hastie T, Sarwal MM, Davis MM, Butte AJ. Cell type–specific gene expression differences in complex tissues. Nat Methods. 2010;7:287-289. doi:10.1038/nmeth.1439
- Troyanskaya OG. Putting microarrays in a context: integrated analysis of diverse biological data. Brief Bioinform. 2005;6(1):34-43.
Melissa M. Lawson