Gregory Gundersen will present his pre-FPO talk, "Tractable Inference for Latent Variable Models with Applications to Science and Engineering," on Tuesday, February 23, 2021, at 3:30 PM via Zoom.

Zoom link: https://princeton.zoom.us/j/96395535605

Committee: Barbara Engelhardt, Ryan Adams, and Jonathan Pillow (examiners); and Tom Griffiths and Brandon Stewart (readers)

Title: Tractable Inference for Latent Variable Models with Applications to Science and Engineering

Abstract:
Latent variables allow practitioners to encode assumptions into their statistical models. A latent variable might, for example, represent an unobserved covariate, measurement error, or a missing class label. Inference is challenging because one must account for the conditional dependence structure induced by these variables, and marginalization is often intractable. In this talk, I present several practical algorithms for inferring latent structure in models used in neuroscience, computational biology, and time-series analysis.

First, I present a family of nonlinear dimension-reduction models that use random features to support non-Gaussian data likelihoods. Approximating the nonlinear mapping between latent variables and observations with a function that is linear in random features yields closed-form gradients of the posterior with respect to the latent variables. This enables gradient-based inference for nonlinear dimension reduction under a variety of data likelihoods. I discuss results on text and image datasets, as well as neural spike trains.

Next, I present a multi-view framework that combines neural networks with probabilistic CCA to estimate the shared and view-specific latent structure of paired samples of histological images and gene-expression levels. The model is trained end-to-end, estimating all parameters simultaneously, and I show that the latent variables capture interpretable hidden structure, such as tissue-specific and morphological variation.

Finally, I discuss lowering the computational cost of online Bayesian filtering of time series with abrupt changes in structure, called changepoints. In time series with multiple data sources and associated costs, one can trade the cost of collecting an observation against the quality, or "fidelity," of the measurement and how this fidelity affects the estimation of changepoints. Our framework makes cost-sensitive decisions about which data fidelity to use by maximizing information gain with respect to the posterior distribution over changepoints. I present results on streaming video and audio data.
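To illustrate the random-features idea in the first project, here is a minimal sketch (not the thesis code) of random Fourier features: a map under which an approximately RBF-kernel function becomes linear in the features, which is what makes gradients with respect to the inputs available in closed form. The feature count, lengthscale, and data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, n_features=500, lengthscale=1.0):
    """Map X (n, d) to features whose inner products approximate
    an RBF kernel with the given lengthscale (Rahimi-Recht style)."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # A model f(x) = phi(x) @ w is linear in phi, so df/dx is closed form.
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Check the kernel approximation on toy data.
X = rng.normal(size=(100, 3))
Phi = random_fourier_features(X)
K_approx = Phi @ Phi.T
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
err = np.abs(K_approx - K_exact).max()
```

With a few hundred features the entrywise error is small, and any downstream likelihood can be attached to the linear-in-features predictor while keeping tractable gradients with respect to the latent inputs.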

I conclude by discussing several lines of work extending these projects.