[talks] Young-suk Lee generals

Melissa M. Lawson mml at CS.Princeton.EDU
Tue May 8 09:55:35 EDT 2012

Young-suk Lee will present his research seminar/general exam on 
Tuesday May 15 at 2PM in Room 401 (note room!).  The members of 
his committee are:  Olga Troyanskaya (advisor), Mona Singh, 
and David Blei.  Everyone is invited to attend his talk and those 
faculty wishing to remain for the oral exam following are welcome 
to do so.  His abstract and reading list follow below.

A microarray experiment measures the abundance of thousands of
transcripts in a given biological sample in order to quantify its
unique transcriptome.  Many research groups and institutions have used
microarrays and have published the resulting data, along with the
associated sample descriptions, in the GEO (Gene Expression Omnibus)
database.  Human tissue samples are arguably the most valuable
biological samples, and so most human microarray studies have so far
been limited to a few experiments for certain tissue types.  GEO has
made these human microarray data publicly available, but the free-text
sample descriptions hinder large-scale tissue-specific microarray
analysis.

We present a hierarchical multi-label tissue prediction algorithm that
returns a ranked list of predicted tissue types for a given microarray
sample.  This algorithm may be used to annotate the many human
microarrays in GEO whose tissue information is hidden or even absent.
We propose that all tissue prediction algorithms must return a ranking
because most, if not all, biological samples consist of multiple
tissue types that have a hierarchical order, so even a single accurate
prediction may not completely describe the biological sample.  We
compare the performance of prediction algorithms with and without
hierarchical information, including our algorithm, which uses Bayesian
correction to combine multiple tissue classifiers.  In the biological
community, this algorithm may be used as a comprehensive background
check or sanity check on new human microarray datasets, one that
measures potential contamination and tissue composition of the
biological sample.
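
As a rough illustration of why a hierarchy-aware ranking can differ
from independent per-tissue predictions, the sketch below takes
hypothetical per-tissue classifier scores and a hypothetical
child-to-parent tissue map, raises each ancestor's score to at least
that of its highest-scoring descendant, and sorts the result.  This is
a simple max-propagation heuristic chosen for brevity, not the
Bayesian correction described in the abstract; the function name, the
tissue labels, and the scores are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def rank_tissues(
    scores: Dict[str, float],
    parent: Dict[str, Optional[str]],
) -> List[Tuple[str, float]]:
    """Rank tissue labels after making scores consistent with the hierarchy.

    NOTE: illustrative heuristic only (max-propagation to ancestors),
    not the Bayesian correction mentioned in the abstract.
    """
    adjusted = dict(scores)
    # Push each label's score up to all of its ancestors: a sample that
    # scores high for "cerebellum" should score at least as high for "brain".
    for label, score in scores.items():
        node = parent.get(label)
        while node is not None:
            adjusted[node] = max(adjusted.get(node, 0.0), score)
            node = parent.get(node)
    # Highest score first; ties are broken alphabetically by label.
    return sorted(adjusted.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical hierarchy and classifier outputs, for illustration only.
parent = {"tissue": None, "brain": "tissue", "cerebellum": "brain", "blood": "tissue"}
scores = {"tissue": 0.5, "brain": 0.4, "cerebellum": 0.9, "blood": 0.2}

ranking = rank_tissues(scores, parent)
for label, score in ranking:
    print(label, score)
```

Without the propagation step, "brain" would rank below "tissue" even
though its child "cerebellum" is the top prediction; with it, the
ancestors of the top prediction are ranked alongside it.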

Reading list

Introduction to Machine Learning (Adaptive Computation and Machine
Learning) by Ethem Alpaydin

Ch 1 Introduction
Ch 2 Supervised Learning
Ch 3 Bayesian Decision Theory
Ch 4 Parametric Methods
Ch 5 Multivariate Methods
Ch 8 Nonparametric Methods
Ch 10 Linear Discrimination
Ch 14 Assessing and Comparing Classification Algorithms
Ch 15 Combining Multiple Learners


Matthew N. McCall, Karan Uppal, Harris A. Jaffee, Michael J. Zilliox,
and Rafael A. Irizarry
The Gene Expression Barcode: leveraging public data repositories to
begin cataloging the human and murine transcriptomes
Nucl. Acids Res. (2011) 39(suppl 1): D1011-D1015 doi:10.1093/nar/gkq1259

Zafer Barutcuoglu, Robert E. Schapire, and Olga G. Troyanskaya
Hierarchical multi-label prediction of gene function
Bioinformatics (2006) 22(7): 830-836 first published online January
12, 2006 doi:10.1093/bioinformatics/btk048

Seon-Young Kim and David J Volsky
PAGE: Parametric Analysis of Gene Set Enrichment
BMC Bioinformatics (2005) 6:144 doi:10.1186/1471-2105-6-144

Kilpinen S, Autio R, Ojala K, Iljin K, Bucher E, Sara H, Pisto T,
Saarela M, Skotheim RI, Björkman M, Mpindi JP, Haapa-Paananen S, Vainio
P, Edgren H, Wolf M, Astola J, Nees M, Hautaniemi S, Kallioniemi O.
Systematic bioinformatic analysis of expression levels of 17,330 human
genes across 9,783 samples from 175 types of healthy and pathological
Genome Biol. 2008;9(9):R139.

Disease signatures are robust across tissues and experiments.
Dudley JT, Tibshirani R, Deshpande T, Butte AJ.
Mol Syst Biol. 2009;5:307.

Marion Gremse, Antje Chang, Ida Schomburg, Andreas Grote, Maurice
Scheer, Christian Ebeling, and Dietmar Schomburg
The BRENDA Tissue Ontology (BTO): the first all-integrating ontology
of all organisms for enzyme sources
Nucl. Acids Res. (2011) 39(suppl 1): D507-D513 first published online
October 28, 2010 doi:10.1093/nar/gkq968

Tanya Barrett, Dennis B. Troup, Stephen E. Wilhite, Pierre Ledoux,
Carlos Evangelista, Irene F. Kim, Maxim Tomashevsky, Kimberly A.
Marshall, Katherine H. Phillippy, Patti M. Sherman, Rolf N. Muertter,
Michelle Holko, Oluwabukunmi Ayanbule, Andrey Yefanov, and Alexandra
NCBI GEO: archive for functional genomics data sets—10 years on
Nucl. Acids Res. (2011) 39(suppl 1): D1005-D1010 first published
online November 21, 2010 doi:10.1093/nar/gkq1184

Shai S Shen-Orr, Robert Tibshirani, Purvesh Khatri, Dale L Bodian,
Frank Staedtler, Nicholas M Perry, Trevor Hastie, Minnie M Sarwal,
Mark M Davis and Atul J Butte
Cell type–specific gene expression differences in complex tissues
Nature Methods 7, 287 - 289 (2010)
Published online: 7 March 2010 | doi:10.1038/nmeth.1439

Troyanskaya OG.
Putting microarrays in a context: integrated analysis of diverse
biological data.
Brief Bioinform. 2005 Mar;6(1):34-43.
