Melissa Lawson mml at CS.Princeton.EDU
Fri Feb 13 09:14:09 EST 2009


Jordan Boyd-Graber will present his preFPO on Tuesday February 17 at 10:30 AM in Room 302.
The members of his committee are:  David Blei, advisor; Christiane Fellbaum and Ryan
(Google), readers; Rob Schapire and Dan Osherson (Psych), nonreaders.  Everyone is invited
to attend his talk.  His abstract follows below.

Title: Linguistic Extensions to Topic Models

Topic models have been an active area of research in recent years, and models like latent
semantic analysis, probabilistic latent semantic indexing, and latent Dirichlet allocation
(LDA) have been successfully used for detecting opinions, finding similar images, and
finding relevant documents given a query.  However, such models make few assumptions about
the input data: words are unrelated to each other and the words in a document are
completely exchangeable (the so-called bag of words model).
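
As an illustration only (not taken from the talk), the exchangeability assumption can be
sketched in a few lines of Python: two documents containing the same words in different
orders get identical representations.  The function name and the example sentences are
hypothetical.

```python
from collections import Counter


def bag_of_words(tokens):
    """Return per-word counts for a document, discarding word order."""
    return Counter(tokens)


# Two hypothetical documents: same words, different order.
doc_a = "the bank raised the interest rate".split()
doc_b = "rate interest the raised bank the".split()

# Under the bag-of-words assumption the two are indistinguishable.
same_representation = bag_of_words(doc_a) == bag_of_words(doc_b)
print(same_representation)
```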

In this work, we present algorithms that enhance the document-level knowledge provided by
LDA with richer linguistic assumptions.  First, we allow topic models to use words
arranged in a tree (such as the WordNet ontology) rather than a simple flat list and
derive an MCMC inference procedure for this model.  One application made possible by this
change is word sense disambiguation, which discovers the meanings of words (e.g.,
discriminating between "bank" the financial institution and "bank" the landform).  We show that
incorporating topics in this model improves disambiguation accuracy.

Secondly, we present a model that incorporates local syntactic information into topic
models, which allows the algorithm to find groups of words that are both globally
thematically consistent and locally syntactically consistent.  We use the product of
experts model to combine document and syntactic information and derive variational
inference procedures for this model.  We show that this model predicts word usage better
than previous models.
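
For readers unfamiliar with the term, a product of experts combines distributions by
multiplying them and renormalizing, so a word scores highly only if every expert finds it
plausible.  The following is a minimal sketch of that general idea, not the model from the
talk; the two "experts" and all probabilities are made-up assumptions.

```python
def product_of_experts(p, q):
    """Multiply two distributions over the same words, then renormalize."""
    unnormalized = {word: p[word] * q[word] for word in p}
    total = sum(unnormalized.values())
    return {word: value / total for word, value in unnormalized.items()}


# Hypothetical expert 1: the document's theme (finance) favors "bank" and "loan".
document_expert = {"bank": 0.4, "loan": 0.4, "swim": 0.2}
# Hypothetical expert 2: the local syntax expects a verb, favoring "swim".
syntax_expert = {"bank": 0.1, "loan": 0.1, "swim": 0.8}

combined = product_of_experts(document_expert, syntax_expert)
for word, prob in sorted(combined.items()):
    print(word, round(prob, 3))
```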



