Gregory Gundersen will present his FPO, "Practical Algorithms for Latent Variable Models," on Friday, May 28, 2021, at 2:30 PM via Zoom.


Zoom link: https://princeton.zoom.us/j/92503349718


The members of Greg’s committee are as follows: Adviser: Barbara Engelhardt; Readers: Tom Griffiths and Brandon Stewart; Examiners: Barbara Engelhardt, Ryan Adams, and Jonathan Pillow.


A copy of his thesis is available upon request. Please email jfarquer@cs.princeton if you would like a copy of the thesis.


Everyone is invited to attend the talk.


Abstract:

Latent variables allow researchers and engineers to encode assumptions into their statistical models. A latent variable might, for example, represent an unobserved covariate, measurement error, or a missing class label. Inference is challenging because one must account for the conditional dependence structure induced by these variables, and marginalization is often intractable. In this thesis, I present several practical algorithms for inferring latent structure in probabilistic models used in computational biology, neuroscience, and time-series analysis. First, I present a multi-view framework that combines neural networks and probabilistic canonical correlation analysis to estimate shared and view-specific latent structure of paired samples of histological images and gene expression levels. The model is trained end-to-end to estimate all parameters simultaneously, and we show that the latent variables capture interpretable structure, such as tissue-specific and morphological variation. Next, I present a family of nonlinear dimension-reduction models that use random features to support non-Gaussian data likelihoods. By approximating a nonlinear relationship between the latent variables and observations with a function that is linear with respect to random features, we obtain closed-form gradients of the posterior distribution with respect to the latent variables. This enables gradient-based inference in nonlinear dimension-reduction models for a variety of data likelihoods. Finally, I discuss lowering the computational cost of online Bayesian filtering of time series with abrupt changes in structure, called changepoints. We consider settings in which a time series has multiple data sources, each with an associated cost. We trade the cost of a data source against its quality or fidelity and the effect of that fidelity on changepoint estimation. Our framework makes cost-sensitive decisions about which data source to use by minimizing the information entropy of the posterior distribution over changepoints.
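
As a rough illustration of the first contribution's shared versus view-specific structure, here is a minimal generative sketch in the spirit of probabilistic CCA for two paired views (e.g., image features and gene expression). This is my own illustration, not the thesis model; all dimensions, loadings, and noise scales are assumptions.

```python
# Hypothetical sketch: two views share one set of latents and each has
# private latents, as in a probabilistic-CCA-style model. Not the thesis code.
import numpy as np

rng = np.random.default_rng(1)

K_shared, K1, K2 = 3, 2, 2        # shared and view-specific latent dimensions
D1, D2, N = 10, 8, 100            # observed dims of the two views, number of samples

# Loadings for shared and view-specific latents (illustrative).
W1, W2 = rng.normal(size=(D1, K_shared)), rng.normal(size=(D2, K_shared))
B1, B2 = rng.normal(size=(D1, K1)), rng.normal(size=(D2, K2))

# Latent variables: one shared set per paired sample, plus private sets per view.
z_shared = rng.normal(size=(N, K_shared))
z1, z2 = rng.normal(size=(N, K1)), rng.normal(size=(N, K2))

# Each view mixes shared and private structure plus observation noise.
x1 = z_shared @ W1.T + z1 @ B1.T + 0.1 * rng.normal(size=(N, D1))
x2 = z_shared @ W2.T + z2 @ B2.T + 0.1 * rng.normal(size=(N, D2))
print(x1.shape, x2.shape)  # (100, 10) (100, 8)
```

In the thesis framework, neural networks and end-to-end training replace the fixed linear loadings used here; this sketch only shows how shared and private latents decompose paired observations.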
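The second contribution's key mechanism is that a nonlinear map becomes linear in random features, so gradients with respect to the latents are available in closed form. Below is a minimal sketch of that idea using random Fourier features for an RBF kernel; the dimensions, lengthscale, and weights are illustrative assumptions, not the thesis's implementation.

```python
# Hypothetical sketch: approximate a nonlinear map f(x) by a function that is
# linear in random Fourier features, giving a closed-form Jacobian w.r.t. x.
import numpy as np

rng = np.random.default_rng(0)

D_latent, M_features, D_obs = 2, 100, 5   # latent dim, number of random features, observed dim
lengthscale = 1.0

# Random Fourier features approximating an RBF kernel with the given lengthscale.
W = rng.normal(0.0, 1.0 / lengthscale, size=(M_features, D_latent))
b = rng.uniform(0.0, 2.0 * np.pi, size=M_features)
beta = rng.normal(size=(M_features, D_obs))  # linear weights on the features

def phi(x):
    """Random Fourier feature map of a latent point x, shape (M_features,)."""
    return np.sqrt(2.0 / M_features) * np.cos(W @ x + b)

def f(x):
    """Approximate nonlinear map from latent x to the observation mean."""
    return phi(x) @ beta

def grad_f(x):
    """Closed-form Jacobian of f with respect to x, shape (D_obs, D_latent)."""
    dphi_dx = -np.sqrt(2.0 / M_features) * np.sin(W @ x + b)[:, None] * W
    return beta.T @ dphi_dx

x = rng.normal(size=D_latent)
print(f(x).shape, grad_f(x).shape)  # (5,) (5, 2)
```

Because the map is linear in the features, the same construction extends to non-Gaussian likelihoods: the gradient of the log likelihood with respect to the latents follows by the chain rule through this Jacobian.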
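For the third contribution, the decision rule trades a data source's acquisition cost against the entropy of the posterior over changepoints. The sketch below is a hypothetical, simplified version of such a rule: the expected-entropy estimates, cost values, and trade-off weight `lam` are placeholders, not the thesis's actual objective.

```python
# Hypothetical sketch: cost-sensitive selection among data sources by trading
# acquisition cost against the expected entropy of the changepoint posterior.
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def select_source(expected_entropies, costs, lam=1.0):
    """Return the index of the source minimizing expected entropy + lam * cost."""
    scores = np.asarray(expected_entropies) + lam * np.asarray(costs)
    return int(np.argmin(scores))

# Example: current run-length (changepoint) posterior, then a choice between a
# cheap low-fidelity source and an expensive high-fidelity source.
run_length_posterior = np.array([0.05, 0.10, 0.60, 0.20, 0.05])
print("current entropy:", entropy(run_length_posterior))
print("chosen source:", select_source(expected_entropies=[1.2, 0.4], costs=[0.1, 1.0]))
```

In an online filtering setting, this kind of rule would be applied at each time step, with the entropy recomputed as the changepoint posterior is updated by the observations actually acquired.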