[talks] Aaron Wong will present his FPO on Monday, Feb. 9, 2015 at 10:30am in CS 301

Nicki Gotsis ngotsis at CS.Princeton.EDU
Mon Feb 2 13:22:16 EST 2015


Aaron Wong will present his FPO on Monday, Feb. 9, 2015 at 10:30am in CS 301

The members of his committee are: Non-readers: Moses Charikar, Kai Li; Readers: Barbara Engelhardt, Mona Singh

A copy of his thesis is available in Room 310.  Everyone is invited to attend his talk.

ABSTRACT
Biologists using modern experimental methods are generating a massive number of genome-scale
datasets. In particular, the rate of large-scale data creation in most organisms is quickly
outpacing biologists’ ability to perform detailed follow-up experiments. Thus a substantial gap
exists between the massive data being generated and the comparatively small number of
experimental validations being performed (i.e., biological knowledge). In this manuscript, we
present four solutions that broadly address this growing disparity, focusing on disease- and
tissue-specific genomic analysis. These solutions are unified by their approach to this problem:
combining and integrating available public genome-wide measurements to enable biological
discoveries that would otherwise be impossible. First, we demonstrate a method to systematically
transfer experimental knowledge between organisms inferred from high-throughput experimental
data. By leveraging functional genomic data, we can improve the coverage and accuracy of
function predictions across diverse organisms and machine learning methods. Second, we
present an interactive web server that addresses the needs of biologists to visualize their
experimental results in the context of multi-species functional predictions and relationships.
Third, we describe a method that, for the first time, leverages large data compendia to build
genome-scale tissue-specific functional maps in human by integrating thousands of genome-scale
datasets. Our method can extract both functional and tissue/cell-type signals even when
genomic data are not resolved for the tissue and very little is known about the expression of
genes in the tissue. Finally, we detail a method for biologists to analyze their genome-scale
datasets in the context of the massive public data compendium. Biologists are generating and
trying to make sense of massive high-throughput datasets, and their biological questions can be
more precisely addressed within the biological context of their experiment. By incorporating
their experimental results in the search and integration of gene expression compendia, we
demonstrate improved predictive performance in identifying additional functionally related
genes.


