Young Lee will present his FPO, "Targeted analyses of very large genome-wide data collections" on Tuesday, 3/1/2016 at 10:30am in CS 402

The members of his committee are Olga Troyanskaya (adviser); readers: Mona Singh and Barbara Engelhardt; nonreaders: John Storey (Lewis-Sigler Institute for Integrative Genomics) and Tom Funkhouser. A copy of his thesis is available in Room 310. Everyone is invited to attend his talk. The talk title and abstract follow below.

"Targeted analyses of very large genome-wide data collections"

Abstract:
Genome-scale experiments provide an overwhelming amount of molecular information for biologists. New computational methods are needed for the specific analysis and interpretation of such high-dimensional data. Here we take advantage of massive public repositories to quantify the tissue-specific signals in gene expression profiles, characterize distinctive molecular features of human diseases, deconvolve the latent cell-type-specific factors in mixed clinical samples, and automatically integrate heterogeneous data sources in the context of a specific genome-wide dataset. First, we describe URSA (Unveiling RNA Sample Annotation), which incorporates known tissue/cell-type relationships to better estimate the specific signal in any given gene expression profile. Our ontology-aware method combines independent discriminative classifiers in a Bayesian framework, outperforming other machine learning methods. We provide a molecular interpretation of the tissue and cell-type models learned by URSA, enabling a data-driven view of molecular processes specific to particular tissues and cell types. Then, we extend this work to human diseases. We use thousands of clinical disease-specific expression profiles in public repositories to quantify distinctive functional and anatomical characteristics of human diseases.
Through our data-driven analysis, we explore the complexity of the human disease landscape and propose exploratory hypotheses for drug repurposing, even for rare diseases with no prior genetic knowledge. Lastly, we describe YETI (Your Evidence Tailored Integration) for targeted integration of heterogeneous genome-wide data sources. Biomedical researchers generate genome-wide datasets for data-driven exploration of specific questions, but such analyses are disconnected from big public data collections. YETI is the first automatic integration method that effectively constructs functional networks specific to a genome-scale dataset. We show that the resulting integration reflects the biological context of the user-provided dataset while providing accurate predictions of functional interactions.
Nicki Gotsis