[talks] Colloquium Speaker-Wednesday, October 21, 2009
Michele J. Brown
mjbrown at CS.Princeton.EDU
Fri Oct 16 10:57:17 EDT 2009
Title: Hidden Grammar: Advances in Data-Driven Models of Language
Speaker: Noah Smith, Carnegie Mellon University,
http://www.cs.cmu.edu/%7Enasmith
Date: Wednesday, October 21, 2009
Time: 4:30 PM
Host: David Blei
Location: Computer Science Auditorium, CS 105
Abstract:
The empirical revolution in computational linguistics in the 1990s
brought the realization that human intuitions about language are
insufficient for accurate and robust natural language technologies. The
move from hand-written, rule-based models to data-driven techniques led
to huge advances, yet we still leaned on human intuition for
constructing annotated linguistic datasets. Despite major advances in
this paradigm (some of which we'll discuss in this talk), we now know
that, in the wild world of real and diverse linguistic data, natural
language technology raised on expert-made annotations remains
insufficient for real, robust applications.

In this talk we adopt the premise that unsupervised learning will, in
the long run, be the way forward for learning computational models of
language cheaply. We focus on dependency syntax learning without trees,
beginning with the classic EM algorithm and presenting several ways to
alter EM for drastically improved performance using crudely represented
"knowledge" of linguistic universals. We then present more recent work
in the empirical Bayesian paradigm, where we encode our background
knowledge as a prior over grammars, applying inference to obtain hidden
structure. Of course, "background knowledge" is still human intuition.
We argue, however, that by representing this knowledge compactly in a
prior distribution--far more compactly than the many decisions made in
building treebanks--we can experimentally explore the connection between
proposed linguistic universals and unsupervised learning.

This talk includes discussion of joint work with Shay Cohen, Dipanjan
Das, Jason Eisner, Kevin Gimpel, Andre Martins, and Eric Xing.