[talks] Young-suk Lee will present his Pre FPO on 4/28 at 4pm in CS 401

Nicki Gotsis ngotsis at CS.Princeton.EDU
Fri Apr 24 09:51:50 EDT 2015


Young-suk Lee will present his Pre FPO on 4/28 at 4pm in CS 401. His committee members are Olga Troyanskaya (advisor), Barbara Engelhardt, Kai Li, Thomas Funkhouser, and John Storey (MOL).

Everyone is invited to attend his talk. The talk title and abstract follow:

Targeted analyses of very large genomic data collections. 

Genome-scale experiments provide an overwhelming amount of molecular information for biologists. New computational methods are needed for specific analysis and interpretation of such high-dimensional data. Here we take advantage of massive public repositories to quantify the tissue-specific signals in gene expression profiles, characterize distinctive molecular features of human diseases, deconvolve the latent cell-type-specific factors in mixed clinical samples, and automatically integrate heterogeneous data sources in the context of a specific genome-wide dataset.

We present Unveiling RNA Sample Annotation (URSA), a method that incorporates known tissue/cell-type relationships to better estimate the tissue/cell-type-specific signal in any given gene expression profile. Our ontology-aware method combines independent discriminative classifiers in a Bayesian framework, outperforming other machine learning methods. URSA can also predict the correct tissue type for cross-platform samples, including RNA-seq data, without retraining. Finally, we provide a molecular interpretation for the tissue and cell-type models learned by URSA, enabling a data-driven view of molecular processes specific to particular tissues and cell types.

Complex diseases are driven by multiple genetic changes and characterized by genome-wide perturbations of cellular pathways and functions. We developed a unified framework to quantify distinctive functional and anatomical characteristics of human diseases from thousands of clinical disease-specific profiles from public repositories. Our data-driven analysis, in conjunction with known biology, can be used to repurpose drugs for rare diseases with no prior genetic knowledge.

Clinical samples are heterogeneous and consist of many different cell types, so genome-scale measurements of these samples reflect an admixture of cell-type-specific signals weighted by the cell-type proportions. Assuming that clinical samples share the same cell types, we designed a Bayesian nonparametric model to deconvolve the latent cell-type-specific signal in gene expression profiles of clinical samples. Our method estimates the number of cell types, the cell-type-specific expression profiles, and the sample-specific cell-type proportions from large clinical datasets without relying on any additional information beyond the gene expression data.

Integration of heterogeneous genome-wide data sources has been used to generate functional networks, predict gene function, and study human disease. Most biomedical researchers have specific questions they want to answer with such integrations, and these questions are usually accompanied by a user-produced genome-scale dataset they want to analyze in the context of these big public data collections. However, currently no approach exists to enable such user-guided integration. Here we develop an automatic integration method that constructs functional networks specific to a genome-scale dataset, and show that the resulting integrations reflect the biological context of the user-provided dataset, while providing accurate functional predictions. 

