[Ml-stat-talks] jordan boyd-graber FPO

David Blei blei at CS.Princeton.EDU
Fri May 14 13:06:56 EDT 2010

hi ml-stat-talks

jordan boyd-graber (now at the university of maryland) is giving his
FPO (thesis defense) next wednesday at 11AM.  his talk is open to the
public.  please join us if you are interested in hierarchical bayesian
modeling, fast approximate inference algorithms, computational
linguistics, and top-notch dissertations :-)

abstract and details are below.



"Adding Linguistic Understanding to Topic Models"

Jordan Boyd-Graber
Wednesday May 19, 11:00AM
Computer Science Room 302

Topic models have been an active area of research in recent years, and
models like latent semantic analysis, probabilistic latent semantic
indexing, and latent Dirichlet allocation (LDA) have been successfully
used for detecting opinions, finding similar images, and retrieving
relevant documents given a query.  However, such models make very
naive assumptions about the input data: words are unrelated to each
other and the words in a document are completely exchangeable (the
so-called bag of words model).  In this work, we present algorithms
that enhance the document-level knowledge provided by LDA with richer
linguistic assumptions.
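The bag-of-words assumption can be made concrete with a toy generative
sketch of LDA (the vocabulary, hyperparameters, and sizes below are made
up for illustration; this is not code from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["bank", "river", "loan", "water", "money", "stream"]
n_topics, n_words = 2, 10

# Each topic is a distribution over the whole (flat) vocabulary.
topics = rng.dirichlet(np.ones(len(vocab)), size=n_topics)

def generate_document():
    # Each document draws its own mixture over topics ...
    theta = rng.dirichlet(np.ones(n_topics))
    words = []
    for _ in range(n_words):
        z = rng.choice(n_topics, p=theta)        # pick a topic
        w = rng.choice(len(vocab), p=topics[z])  # pick a word from it
        words.append(vocab[w])
    # ... and word order never matters: only the counts do
    # (the words are exchangeable -- the "bag of words").
    return words

doc = generate_document()
```

Note that nothing in this story relates words to one another or to their
neighbors; that is the gap the models below address.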

First, we allow topic models to use words arranged in a tree (such as
the WordNet ontology) rather than a simple flat list, and we derive
inference using Markov chain Monte Carlo (MCMC) methods for this model.
One application made possible by this change is word sense
disambiguation, which discovers the meanings of words (e.g.,
discriminating between "bank" the financial institution and "bank"
the landform).  We show that incorporating topics in this model improves
disambiguation accuracy.
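The tree-structured idea can be sketched as generating a word by walking
from the root of a sense hierarchy to a leaf, in the spirit of using
WordNet's structure instead of a flat list (the tiny tree and
probabilities below are invented for illustration):

```python
import random

random.seed(0)
# Internal nodes map to (children, probabilities); anything not in the
# dict is a leaf, i.e. an emitted word.
tree = {
    "root": (["institution", "landform"], [0.6, 0.4]),
    "institution": (["bank"], [1.0]),  # "bank" the financial institution
    "landform": (["bank"], [1.0]),     # "bank" the riverside
}

def sample_word(node="root"):
    path = [node]
    while node in tree:
        children, probs = tree[node]
        node = random.choices(children, weights=probs)[0]
        path.append(node)
    # The leaf is the surface word; the PATH taken to reach it
    # identifies which sense was used.
    return node, path

word, path = sample_word()
```

Both paths emit the same surface string "bank", but the internal node on
the path disambiguates the sense, which is why inference over such a
tree yields word sense disambiguation.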

Next, we discuss techniques that allow topic models to be applied to
multilingual data with minimal external information.  We demonstrate
two models that can discover topics that make sense across languages.

Finally, we present a model that incorporates local syntactic
information into topic models, which allows the algorithm to find
groups of words that are both globally thematically consistent and
locally syntactically consistent.  We use the product of experts model
to combine document and syntactic information and derive variational
inference for this model.  We show that this model predicts word usage
better than previous models.
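The product-of-experts combination can be illustrated in miniature: a
word's distribution is the renormalized elementwise product of a
document-level thematic expert and a local syntactic expert (the numbers
below are toy values, not the thesis model):

```python
import numpy as np

# Hypothetical distributions over a four-word vocabulary.
topic_expert = np.array([0.5, 0.3, 0.1, 0.1])   # thematic preferences
syntax_expert = np.array([0.1, 0.1, 0.4, 0.4])  # syntactic preferences

# Product of experts: multiply elementwise, then renormalize.
product = topic_expert * syntax_expert
word_dist = product / product.sum()
```

Only words that both experts assign appreciable probability keep
appreciable mass in the product, so the combined model favors words that
are thematically and syntactically consistent at once.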
