<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 12pt; color: #000000'><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div style="font-family: arial,helvetica,sans-serif; font-size: 12pt; color: #000000"><h1 class="page__title title" id="page-title">Feature allocations, paintboxes, and probability functions</h1><span class="event-speaker">
<a href="http://www.stat.berkeley.edu/%7Etab/" target="_blank">Tamara Broderick</a>, </span><span class="event-speaker-from"><a href="http://berkeley.edu/index.html" target="_blank">University of California, Berkeley</a>
<br>Wednesday, February 12, 4:30pm<br>Computer Science 105<br></span><br>Clustering involves placing entities into mutually exclusive
categories. We wish to relax the requirement of mutual exclusivity,
allowing objects to belong simultaneously to multiple classes, a
formulation that we refer to as "feature allocation." The first step is a
theoretical one. In the case of clustering, the class of probability
distributions over exchangeable partitions of a dataset has been
characterized (via exchangeable partition probability functions and the
Kingman paintbox). These characterizations support an elegant
nonparametric Bayesian framework for clustering in which the number of
clusters is not assumed to be known a priori. We establish an analogous
characterization for feature allocation; we define notions of
"exchangeable feature probability functions" and "feature paintboxes"
that lead to a Bayesian framework that does not require the number of
features to be fixed a priori. The second step is a computational one.
Rather than appealing to Markov chain Monte Carlo for Bayesian
inference, we develop a method to transform Bayesian methods for feature
allocation (and other latent structure problems) into optimization
problems with objective functions analogous to the K-means objective in the clustering setting. The resulting approximations to Bayesian inference scale to large problems.<br><br><p>Tamara Broderick is a PhD candidate in the Department of Statistics
at the University of California, Berkeley. Her research in machine learning focuses on the design and study of Bayesian nonparametric models, with particular emphasis on feature allocation, a generalization of clustering that relaxes its mutual exclusivity and exhaustivity assumptions. While at Berkeley, she has
been a National Science Foundation Graduate Student Fellow and a
Berkeley Fellowship recipient. She graduated with an AB in Mathematics from Princeton University in 2007, receiving the Phi Beta Kappa Prize for the highest average GPA in her graduating class and Highest Honors in Mathematics. She spent the next two years on a Marshall Scholarship at the University of Cambridge, where she received a Master of Advanced Study in Mathematics for completing Part III of the Mathematical Tripos (with Distinction) in 2008 and an MPhil by Research in Physics in 2009. She received a Master's in Computer Science from UC Berkeley in
2013.</p></div></div></div></body></html>