Bayesian Nonparametric Models and "Big Data"
John Paisley,
University of California, Berkeley
Monday, February 25, 2013, 4:30pm
Computer Science Small Auditorium, Room 105
Bayesian nonparametrics is an area in machine learning in which models
grow in size and complexity as data accrue. As such, they are
particularly relevant to the world of "Big Data", where it may be
difficult or even counterproductive to fix the number of parameters a
priori. A stumbling block for Bayesian nonparametrics has been that
its algorithms for posterior inference generally scale poorly. In
this talk, we tackle this issue in the domain of large-scale text
collections. We present a novel tree-structured model
in which documents are represented by collections of paths in an
infinite-dimensional tree. We develop a general and efficient
variational inference strategy for learning such models based on
stochastic optimization. We show that this combination of model and
inference method lets us learn high-quality models from millions of
documents.
John Paisley received the B.S.E. (2004), M.S. (2007) and Ph.D. (2010) in
Electrical & Computer Engineering from Duke University, where his
advisor was Lawrence Carin. He was a postdoctoral researcher with David
Blei in the Computer Science Department at Princeton University, and is
currently a postdoctoral researcher with Michael Jordan in the Department
of EECS at UC Berkeley.
He works on developing Bayesian models for machine learning
applications, particularly for dictionary learning and topic modeling.