[Ml-stat-talks] florian jaeger: january 7 @ 4:30pm

David Blei blei at CS.Princeton.EDU
Tue Jan 7 11:43:01 EST 2014


this computational linguistics talk looks very interesting.  (i
realize it's not "core" machine learning/statistics.)



Florian Jaeger

University of Rochester

4:30pm Jan 7th (Tue)

Room 16 Joseph Henry House

Research in my lab seeks to understand how language production and
comprehension are shaped by the competing pressures inherent to
communication, and how this in turn affects the development of
language over generations. We approach these questions by drawing on
mathematical theories of communication and inference to develop
computational models that are evaluated against behavioral data (e.g.,
lab- and crowdsourcing-based experiments; spoken corpus studies;
typological data).

A sometimes under-appreciated property of human communication is that
the speech signal is both perturbed by noise and subject to systematic
variability: the statistics of the speech signal are dependent on
context (e.g., linguistic, social, visual). Critically, this includes
context types for which even adult speakers frequently continue to
encounter novel instances (e.g., novel speakers). During my 2012
visit to Princeton, I presented my lab's efforts to understand how
comprehenders typically overcome this noise and variability through
hierarchical inference and adaptation. Efficient prediction of the
signal (language understanding) is made possible by adapting
expectations (or, in Bayesian terms, beliefs) about not only low-level
statistics (phonetic realizations of sound classes), but also
higher-level statistics affecting lexical, semantic, and syntactic
inferences during incremental language understanding. I presented
evidence that
brief exposure to a novel environment (e.g., a novel speaker) is
sufficient to override the effects of life-long experience for that
environment, suggesting that we maintain and adapt
environment-specific beliefs about linguistic distributions (Fine and
Jaeger, 2013; Fine et al., 2010, 2013; Kleinschmidt & Jaeger, 2011,
2012; Jaeger & Snider, 2013; Yildirim et al., 2013).

In this talk, I'll focus on production. This work investigates whether
the systems underlying language production are organized so as to
balance the demands inherent to production (e.g., sequential planning)
and the goal of efficient information transfer (i.e., fast and robust
inference of the intended message, including, but not limited to,
propositional, pragmatic, and social information). As would be
expected if speakers contribute to efficient information transfer
(Jaeger, 2006, 2013; Levy & Jaeger, 2007), production preferences
reflect a trade-off between prior inferrability and the quality of the
speech signal: more predictable elements tend to be more likely to be
reduced or omitted. As evidenced in both conversational speech corpora
and production experiments, this tendency seems to hold at all levels
of linguistic production (e.g., phonetics: Aylett & Turk, 2004; Buz &
Jaeger, 2013; Bell et al., 2009; Pellegrino et al., 2011; morphology:
Frank & Jaeger, 2008; Kurumada & Jaeger, 2013; syntax: Jaeger, 2010,
2011; Resnik 1996; Wasow et al., 2011).

Interestingly, more recent work has confirmed that the same preferences
are reflected in the linguistic code (i.e., the lexicon and grammar)
of languages across the world (e.g., Graff and Jaeger, 2009; Maurits
et al., 2010; Piantadosi et al., 2011, 2012). I close by asking
precisely how these biases enter languages. In a series of artificial
language learning experiments, we investigated one potential answer --
that biases enter language during acquisition (Fedzechkina et al.,
2012, 2013). We found that the same biases observed in native
production make learners of a new language reshape that language
towards greater communicative efficiency. Critically, this happens
even with regard to features that are *not* present in the learners'
native language. This suggests that at least *some* properties of
languages across the world are a consequence of the *goals* of
language use: the transfer of information.
