[Ml-stat-talks] matthew hoffman, tuesday 10/12, 4:00PM

David Blei blei at CS.Princeton.EDU
Sun Oct 10 17:32:47 EDT 2010

hi all

matt hoffman is presenting his "final public oral" (i.e., thesis
defense) on tuesday at 4PM.  this will be particularly interesting to
those of you who like signal processing, bayesian nonparametric
methods, graphical models, matrix factorization or variational


Final Public Oral: Probabilistic Graphical Models for the Analysis and
Synthesis of Musical Audio

Matt Hoffman

Tuesday, October 12, 4PM
Room 302, Computer Science Department


Content-based Music Information Retrieval (MIR) systems seek to
automatically extract meaningful information from musical audio
signals. This thesis applies new and existing generative
probabilistic models to several content-based MIR tasks: timbral
similarity estimation, semantic annotation and retrieval, and latent
source discovery and separation.

In order to estimate how similar two songs sound to one another, we
employ a Hierarchical Dirichlet Process (HDP) mixture model to
discover a shared representation of the distribution of timbres in
each song. Comparing songs under this shared representation yields
better query-by-example retrieval quality and scalability than
previous approaches.

To predict what tags are likely to apply to a song (e.g., “rap,”
“happy,” or “driving music”), we develop the Codeword Bernoulli
Average (CBA) model, a simple and fast mixture-of-experts model.
Despite its simplicity, CBA performs at least as well as
state-of-the-art approaches at automatically annotating songs and
finding the songs in a database to which a given tag most applies.

Finally, we address the problem of latent source discovery and
separation by developing two Bayesian nonparametric models, the
Shift-Invariant HDP (SIHDP) and Gamma Process NMF. These models allow
us to discover what sounds (e.g., bass drums, guitar chords) are
present in a song or set of songs and to isolate or suppress
individual sources. These models’ ability to decide how many latent
sources are necessary to model the data is particularly valuable in
this application, since it is impossible to guess a priori how many
sounds will appear in a given song or set of songs.

Once they have been fit to data, probabilistic models can also be used
to drive the synthesis of new musical audio, both for creative
purposes and to qualitatively diagnose what information a model does
and does not capture. We also adapt the SIHDP model to create new
versions of input audio with arbitrary sample sets, for example, to
create a sound file that matches a song as closely as possible by
combining spoken text.
