 
Andrew Jones will present his FPO "Probabilistic models for structured biomedical data" on Friday, December 16, 2022 at 9:30 AM in COS 402 and on Zoom.
Zoom link: https://princeton.zoom.us/j/95479201507
The members of Andrew's committee are as follows:
Examiners: Barbara Engelhardt (Adviser), Ben Raphael, Adji Bousso Dieng
Readers: Jonathan Pillow, Olga Russakovsky
A copy of his thesis will be available, upon request, two weeks before the FPO. Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.
Everyone is invited to attend his talk.
Abstract follows below:
Modern biomedical datasets, from molecular measurements of gene expression to pathology images, hold promise for discovering new therapeutics and probing basic questions about the behavior of cells. Thoughtful statistical modeling of these complex, high-dimensional data is crucial to elucidate robust scientific findings. A common assumption in data analysis is that the data samples are independent and identically distributed. However, this assumption is nearly always violated in practice. This is especially true in the setting of biomedical data, which often exhibit some amount of structure, such as subgroups of patients, cells, or tissue types, or other correlation structure among the samples. In this body of work, I propose data analysis and experimental design frameworks to account for several types of highly structured biomedical data. These approaches, which take the form of Bayesian models and associated inference algorithms, are specifically tailored for datasets with group structure, multiple data modalities, and spatial organization of samples.
In the first line of work, I propose a model for contrastive dimension reduction that decomposes the sources of variation in samples that belong to case and control conditions. Second, I propose a computational framework for aligning spatially resolved genomics data into a common coordinate system that accounts for spatial correlation among the samples and models multiple data modalities. Finally, I propose a family of methods for optimally designing spatially resolved genomics experiments that is tailored to the highly structured data collection process of these studies. Together, this body of work advances the field of biomedical data analysis by developing models that directly exploit common types of structure within these data and demonstrating the advantage of these modeling approaches across an array of data types.