Matt Hoffman will present his preFPO on Tuesday, July 14 at 2PM in Room 402.
The members of his committee are: Perry Cook, advisor; David Blei and Ken
Steiglitz, readers; Rob Schapire and Adam Finkelstein, nonreaders. Everyone is
invited to attend his talk. His abstract follows below.
--------------------------------------------
Probabilistic Graphical Models for the Analysis and Synthesis of
Musical Audio
Content-based Music Information Retrieval (MIR) systems seek to
automatically extract meaningful information from musical audio
signals. This thesis applies new and existing generative probabilistic
models to several content-based MIR tasks: timbral similarity
estimation, semantic annotation and retrieval, and latent source
discovery and separation.
In order to estimate how similar two songs sound to one another, we
employ a Hierarchical Dirichlet Process (HDP) mixture model to
discover a shared representation of the distribution of timbres in
each song. Comparing songs under this shared representation yields
better query-by-example retrieval quality and scalability than
previous approaches.
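
As a rough illustration of how such a shared representation supports
retrieval, the sketch below assumes each song has already been summarized as a
vector of mixture proportions over the shared HDP timbre components; the
distance function, the ranking procedure, and all names are illustrative
assumptions, not the thesis implementation.

    import numpy as np

    def timbre_distance(p, q, eps=1e-12):
        """Symmetrized KL divergence between two songs' mixture proportions
        over the shared timbre components (one possible choice of metric)."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

    def query_by_example(query_props, database_props):
        """Rank database songs by timbral similarity to the query song.
        `database_props` maps song ids to proportion vectors."""
        scores = {sid: timbre_distance(query_props, props)
                  for sid, props in database_props.items()}
        return sorted(scores, key=scores.get)  # most similar first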
To predict what tags are likely to apply to a song (e.g., "rap,"
"happy," or "driving music"), we develop the Codeword Bernoulli
Average (CBA) model, a simple and fast mixture-of-experts model.
Despite its simplicity, CBA performs at least as well as
state-of-the-art approaches at automatically annotating songs and
finding the songs in a database to which a given tag best applies.
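
To make the CBA idea concrete, the following sketch scores tags for a song
represented as counts over vector-quantized codewords, assuming per-codeword
Bernoulli parameters have already been fit; the variable names and the
count-weighted-average prediction rule are an approximation of the model, not
its exact formulation or code.

    import numpy as np

    def cba_tag_scores(codeword_counts, beta):
        """Score each tag for one song under a CBA-style model.

        codeword_counts : (K,) counts of vector-quantized codewords in the song
        beta            : (K, T) Bernoulli parameters, beta[k, t] ~ P(tag t | codeword k)

        The song's tag probability is taken to be the count-weighted average
        of the per-codeword Bernoulli parameters.
        """
        counts = np.asarray(codeword_counts, dtype=float)
        weights = counts / counts.sum()      # empirical codeword distribution
        return weights @ np.asarray(beta)    # (T,) tag probabilities

    def annotate(codeword_counts, beta, tag_names, top_n=5):
        """Return the top-n most probable tags for a song."""
        scores = cba_tag_scores(codeword_counts, beta)
        order = np.argsort(scores)[::-1][:top_n]
        return [(tag_names[i], float(scores[i])) for i in order]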
Finally, we extend the HDP to discover the latent sonic sources (e.g.,
bass drums or guitar chords) that are present in sets of songs and
to allow the isolation or suppression of individual sources. The
ability of our Shift-Invariant HDP (SIHDP) to decide how many latent
sources are necessary to model the data is particularly valuable in
this application, since it is impossible to guess a priori how many
sounds will appear in a given song or set of songs.
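
The sketch below shows only the deterministic reconstruction side of a
shift-invariant decomposition: each latent source contributes a
spectro-temporal template convolved in time with its activations, and keeping
or dropping sources isolates or suppresses them. The nonparametric inference
by which the SIHDP decides how many sources are needed is not shown, and all
names here are illustrative assumptions.

    import numpy as np

    def reconstruct(templates, activations, keep=None):
        """Rebuild a spectrogram from shift-invariant latent sources.

        templates   : (K, F, P) per-source spectro-temporal templates (P frames wide)
        activations : (K, N)    per-source activation strengths over time
        keep        : iterable of source indices to keep (None = all sources)
        """
        K, F, P = templates.shape
        _, N = activations.shape
        keep = range(K) if keep is None else keep
        X = np.zeros((F, N + P - 1))
        for k in keep:
            for t in range(N):
                # Source k's template, shifted to time t and scaled by its activation.
                X[:, t:t + P] += activations[k, t] * templates[k]
        return X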
Once they have been fit to data, probabilistic models can also be used
to drive the synthesis of new musical audio, both for creative
purposes and to qualitatively diagnose what information a model does
and does not capture. If our model works on a feature representation
of audio, then we need techniques to produce audio characterized by
arbitrary feature vectors. In order to address this problem, we
develop Feature-Based Synthesis (FBS), a general framework for
automatically finding synthesizer parameters that will reproduce
arbitrary perceptual features. FBS also allows us to create
"non-phonorealistic" renditions of songs that match some but not all
of a song's perceptual characteristics. Along similar lines, we can
adapt the SIHDP model to create new versions of input audio using
arbitrary sample sets, for example, to create a sound file that
matches a song as closely as possible by combining samples of spoken text.
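
As a rough sketch of the FBS idea, the code below searches for synthesizer
parameters whose output features match a target feature vector, treating the
synthesizer as a black box; synth, extract_features, and the choice of a
gradient-free optimizer are hypothetical placeholders, not the thesis
implementation.

    import numpy as np
    from scipy.optimize import minimize

    def feature_based_synthesis(target_features, synth, extract_features, init_params):
        """Search for synthesizer parameters whose output matches target features.

        target_features  : desired perceptual feature vector
        synth            : callable params -> audio samples (hypothetical synthesizer)
        extract_features : callable audio -> feature vector (hypothetical extractor)
        init_params      : starting guess for the synthesizer parameters
        """
        target = np.asarray(target_features, dtype=float)

        def objective(params):
            # Distance between the synthesized audio's features and the target.
            feats = np.asarray(extract_features(synth(params)), dtype=float)
            return float(np.sum((feats - target) ** 2))

        # Gradient-free search, since the synthesizer is treated as a black box.
        result = minimize(objective, np.asarray(init_params, dtype=float),
                          method="Nelder-Mead")
        return result.x, result.fun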