The members of her committee are as follows: Olga Troyanskaya (adviser); Readers: Wendell Lim (UCSF) and Kai Li; Examiners: Yibin Kang (Mol Bio), Mona Singh, and Olga Troyanskaya.
Large-scale genomic studies now give more predictive power than ever, allowing us to profile the composition of tissues, study cellular functions, and understand organismal traits at an unprecedented level of detail. This is particularly important for studying heterogeneous diseases, such as cancer, where small patient-specific differences play critical roles in disease development and progression. As these studies accumulate, it is increasingly important to develop methods that discover novel biology while accounting for tissue and cell type specificity, and to develop systems that make this data explosion manageable, accessible, and interpretable. Toward these goals, in this dissertation, we build on the wealth of publicly available data to examine the interplay between cancer and the immune system, then develop two query-based visualization systems that enable interactive data exploration for the wider biomedical community.
The first part of this work presents two perspectives on cancer and the immune system. Using derived immune markers, we found that estrogen receptor activity and genomic complexity are key factors driving changes in breast cancer lymphocytic infiltration. Our method enabled discoveries in existing samples without additional experiments, even when the original study was not designed to examine immune infiltration. Next, we leveraged public expression data to further the development of targeted immunotherapeutics for solid tumors. Working closely with experimental collaborators, we developed a method to prioritize pairs of antigen targets that will help engineered T cells home in on tumor targets while minimizing damage to healthy tissues.
The second part covers how we can extract unbiased signals from large collections of biomedical data, specifically abstracts and repositories of transcriptomics data. We develop a method to obtain informative tissue-disease-gene relationships from abstracts and integrate them into a system that presents different snapshots of curated interactions. Next, we extend a gene expression search engine that simultaneously returns coexpressed genes and relevant datasets. Our extension expands the search space across the major model organisms and provides a new cross-organism exploration interface to facilitate translational research. Both systems will help experimentalists leverage existing knowledge to better explain the larger implications of their specific findings.