Aaron Wong will present his Pre FPO on Tuesday, September 16 at 9:30am in CS 402. The members of his committee are Olga Troyanskaya, advisor; Mona Singh, reader; Barbara Engelhardt, reader; Moses Charikar, non-reader; Kai Li, non-reader. The title and abstract are below. All are welcome to join.

Title: Tissue- and disease-specific analysis of genome-scale data to inform biological experiments

Biologists using modern experimental methods are generating a massive number of genome-scale datasets. In most organisms, the rate of large-scale data creation is quickly outpacing biologists’ ability to perform detailed follow-up experiments. We address this growing disparity by developing methods to improve functional annotation coverage, methods to predict tissue-specific function even though most genomic data are not resolved to specific tissues, and tools that enable biologists to make experimental discoveries.

We demonstrate a method to systematically transfer experimental knowledge by identifying a gene’s “functional analogs” based on both sequence similarity and pathway partners inferred from high-throughput experimental data. Our methodology improves the coverage and accuracy of function predictions across six diverse organisms and across multiple machine learning methods. We created IMP (imp.princeton.edu), a web server that enables biologists to explore multi-species functional predictions and relationships for seven organisms, and worked with zebrafish biologists to experimentally validate gene predictions for heart left/right asymmetry.

Tissue and cell-type identity lie at the core of human physiology and disease. However, we still lack tools to systematically explore the genes and interactions that shape specialized cellular functions. We develop a method that builds genome-scale, tissue-specific functional maps in human by integrating thousands of genome-scale datasets from large data compendia. Our method can extract both functional and tissue/cell-type signals even when the genomic data are not resolved to that tissue. We demonstrate the broad applicability of these networks by using them to predict IL1B-responsive genes and to re-prioritize genome-wide association data. Finally, we created GIANT (giant.princeton.edu), which provides access to functional networks for 144 tissues/cell types and enables multi-network comparisons.

Almost all functional genomic methods require biologists to pose biological questions as sets of genes of interest. However, biologists are generating, and trying to make sense of, massive high-throughput datasets, and their questions can be addressed more precisely within the biological context of their own experiments. We develop a dataset-driven method for the integration and search of gene expression compendia and show substantial predictive improvement in identifying functionally related co-expressed genes and experimentally similar datasets.