[talks] Colloquium Speaker: This Wednesday, October 21st

Michele J. Brown mjbrown at CS.Princeton.EDU
Mon Oct 19 17:13:56 EDT 2009


Title: Hidden Grammar: Advances in Data-Driven Models of Language
Speaker: Noah Smith, Carnegie Mellon University, 
http://www.cs.cmu.edu/%7Enasmith

Date: Wednesday, October 21, 2009
Time: 4:30 PM
Host: David Blei
Location: Computer Science Auditorium, CS 105

Abstract:
With the field of computational linguistics' empirical revolution of the 
1990s came the realization that human intuitions about language are 
insufficient for accurate and robust natural language technologies. The 
move from hand-written, rule-based models to data-driven techniques led 
to huge advances, yet we still leaned on human intuition for 
constructing annotated linguistic datasets. Despite major advances in 
this paradigm (some of which we'll discuss in this talk), we now know 
that, in the wild world of real and diverse linguistic data, natural 
language technology raised on expert-made annotations remains 
insufficient for real, robust applications.

In this talk we adopt the premise that unsupervised learning will, in 
the long run, be the way forward for learning computational models of 
language cheaply. We focus on learning dependency syntax without annotated trees, 
beginning with the classic EM algorithm and presenting several ways to 
alter EM for drastically improved performance using crudely represented 
"knowledge" of linguistic universals. We then present more recent work 
in the empirical Bayesian paradigm, where we encode our background 
knowledge as a prior over grammars, applying inference to obtain hidden 
structure. Of course, "background knowledge" is still human intuition. 
We argue, however, that by representing this knowledge compactly in a 
prior distribution (far more compactly than the many decisions made in 
building treebanks), we can experimentally explore the connection between 
proposed linguistic universals and unsupervised learning.

This talk includes discussion of joint work with Shay Cohen, Dipanjan 
Das, Jason Eisner, Kevin Gimpel, Andre Martins, and Eric Xing.

