[Ml-stat-talks] ML lectures: David Smith on inference and parsing

David Mimno mimno at CS.Princeton.EDU
Tue Feb 15 16:23:45 EST 2011

Have you been watching Watson compete on Jeopardy? To answer natural
language questions, computers need to understand syntax, and to understand
syntax, they need parsers. Next week we'll be inaugurating our 
Google-sponsored Machine Learning lecture series with David Smith,
who will discuss new approaches to the problem of training parsers 
from data -- sometimes very little data. [-DM]

(For upcoming talks, see http://www.cs.princeton.edu/~mimno/mltalks.html)

WHEN: Mon Feb 21, 3:00PM

David Smith (UMass, Amherst)

Title: Efficient Inference for Declarative Approaches to Language

Much recent work in natural language processing treats linguistic
analysis as an inference problem over graphs. This development opens
up useful connections between machine learning, graph theory, and
linguistics.

The first part of this talk formulates syntactic dependency parsing as
a dynamic Markov random field with the novel ingredient of global
constraints. Global constraints are propagated by combinatorial
optimization algorithms, which greatly improve on collections of local
constraints. In particular, such factors enforce the constraint that
the parser's output variables must form a tree. Even with second-order
features or latent variables, which would make exact parsing
asymptotically slower or NP-hard, accurate approximate inference with
belief propagation is as efficient as a simple edge-factored parser
times a constant factor. Inference can be sped up further by ignoring
the 98% of higher-order factors that contribute little to overall
accuracy.
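To make the tree constraint concrete, here is a minimal Python sketch (not taken from the talk) of a hard TREE factor evaluated on one assignment of link variables. It returns 1 if the active (head, modifier) links form a dependency tree over words 1..n with an artificial root 0, and 0 otherwise. The function name `tree_factor`, the link encoding, and the choice to allow several words to attach directly to the root are illustrative assumptions. The point made in the abstract is different and stronger: belief propagation never enumerates assignments like this, and a combinatorial algorithm computes the messages through such a factor efficiently. This sketch only shows what the factor itself scores.

```python
from typing import Iterable, Tuple


def tree_factor(links: Iterable[Tuple[int, int]], n: int) -> int:
    """Hypothetical hard TREE factor: 1 if the active links form a tree, else 0.

    Words are numbered 1..n and 0 is an artificial root. Each (h, m) pair is
    an active link meaning "h is the head of m". Assumption: several words
    may attach directly to the root.
    """
    head = {}
    for h, m in links:
        if not (0 <= h <= n and 1 <= m <= n) or h == m:
            return 0  # index out of range, or a self-loop
        if m in head:
            return 0  # each word may have only one head
        head[m] = h
    if len(head) != n:
        return 0  # some word has no head at all
    for m in range(1, n + 1):
        seen = set()
        node = m
        while node != 0:  # follow head pointers toward the root
            if node in seen:
                return 0  # a cycle: this word never reaches the root
            seen.add(node)
            node = head[node]
    return 1
```

For example, `tree_factor([(0, 2), (2, 1), (2, 3)], 3)` accepts a three-word sentence headed by word 2, while `tree_factor([(0, 1), (2, 3), (3, 2)], 3)` rejects it because words 2 and 3 form a cycle.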

The second part extends these models to capture correspondences among
non-isomorphic structures. When bootstrapping a parser in a
low-resource target language by exploiting a parser in a high-resource
source language, models that score the alignment and the
correspondence of divergent syntactic configurations in translational
sentence pairs achieve higher accuracy in parsing the target language.
These noisy (quasi-synchronous) mappings have further applications in
adapting parsers across domains, in learning features of the
syntax-semantics interface, and in question answering, paraphrasing,
and information retrieval.


David Smith is a Research Assistant Professor in the Computer Science
Department of the University of Massachusetts, Amherst, where he is
affiliated with the Center for Intelligent Information Retrieval. He
conducts research in inference and learning of phonology, morphology,
syntax, and semantics and in scaling up NLP techniques to applications
in information retrieval, relation extraction, and machine
translation. He holds a Ph.D. in computer science from Johns Hopkins
and an A.B. in classics from Harvard.

More information about the Ml-stat-talks mailing list