Matt Hoffman will present his preFPO on Tuesday, July 14, at 2 PM in Room 402. The members of his committee are: Perry Cook, advisor; David Blei and Ken Steiglitz, readers; Rob Schapire and Adam Finkelstein, nonreaders. Everyone is invited to attend his talk. His abstract follows below.

--------------------------------------------

Probabilistic Graphical Models for the Analysis and Synthesis of Musical Audio

Content-based Music Information Retrieval (MIR) systems seek to automatically extract meaningful information from musical audio signals. This thesis applies new and existing generative probabilistic models to several content-based MIR tasks: timbral similarity estimation, semantic annotation and retrieval, and latent source discovery and separation.

To estimate how similar two songs sound to one another, we employ a Hierarchical Dirichlet Process (HDP) mixture model to discover a shared representation of the distribution of timbres in each song. Comparing songs under this shared representation yields better query-by-example retrieval quality and scalability than previous approaches.

To predict what tags are likely to apply to a song (e.g., "rap," "happy," or "driving music"), we develop the Codeword Bernoulli Average (CBA) model, a simple and fast mixture-of-experts model. Despite its simplicity, CBA performs at least as well as state-of-the-art approaches at automatically annotating songs and at finding the songs in a database to which a given tag best applies.

Finally, we extend the HDP to discover the latent sonic sources (e.g., bass drums or guitar chords) present in sets of songs and to allow the isolation or suppression of individual sources. The ability of our Shift-Invariant HDP (SIHDP) to decide how many latent sources are necessary to model the data is particularly valuable in this application, since it is impossible to guess a priori how many sounds will appear in a given song or set of songs.

Once they have been fit to data, probabilistic models can also be used to drive the synthesis of new musical audio, both for creative purposes and to qualitatively diagnose what information a model does and does not capture. If our model works on a feature representation of audio, then we need techniques to produce audio characterized by arbitrary feature vectors. To address this problem, we develop Feature-Based Synthesis (FBS), a general framework for automatically finding synthesizer parameters that will reproduce arbitrary perceptual features. FBS also allows us to create "non-phonorealistic" renditions of songs that match some but not all of a song's perceptual characteristics. Along similar lines, we can adapt the SIHDP model to create new versions of input audio using arbitrary sets of samples, for example, to create a sound file that matches a song as closely as possible by combining recordings of spoken text.
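A minimal sketch of the kind of comparison the timbral-similarity work describes, assuming each song has already been reduced to a vector of weights over mixture components shared across the corpus (the representation an HDP mixture provides). The symmetrized KL divergence used here is an illustrative choice of distance, not necessarily the one used in the thesis:

    import numpy as np

    def symmetric_kl(p, q, eps=1e-12):
        """Symmetrized KL divergence between two discrete distributions."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p /= p.sum()
        q /= q.sum()
        return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

    def rank_by_similarity(query_weights, corpus_weights):
        """Rank corpus songs by how close their shared-component weights are to the query's."""
        dists = [symmetric_kl(query_weights, w) for w in corpus_weights]
        return np.argsort(dists)  # indices of the most similar songs first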
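A minimal sketch of the CBA predictive rule as the abstract describes it: the probability that a tag applies to a song is an average of per-codeword tag probabilities, weighted by how often each codeword (vector-quantized audio feature) appears in the song. Estimating the per-codeword parameters beta (e.g., by EM) is omitted here:

    import numpy as np

    def cba_tag_probabilities(codeword_counts, beta):
        """
        codeword_counts: length-K vector of codeword counts for one song.
        beta: K x T matrix; beta[k, t] is the probability that a song containing
              codeword k bears tag t.
        Returns a length-T vector of predicted tag probabilities: the count-weighted
        average of the per-codeword Bernoulli parameters.
        """
        counts = np.asarray(codeword_counts, dtype=float)
        weights = counts / counts.sum()
        return weights @ np.asarray(beta, dtype=float)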
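The isolation or suppression of individual sources can be pictured as a spectrogram-masking step. The sketch below assumes the model has already produced per-source spectral templates W and activations H; the SIHDP inference itself is not shown, and the generic soft mask here is an illustrative stand-in, not necessarily the thesis's procedure:

    import numpy as np

    def isolate_source(spectrogram, W, H, source, eps=1e-12):
        """
        spectrogram: F x T magnitude spectrogram of the mixture.
        W: F x S per-source spectral templates; H: S x T per-source activations.
        Returns the spectrogram with only the chosen source retained, via a soft mask.
        """
        full = W @ H + eps                              # reconstruction of all sources
        target = np.outer(W[:, source], H[source, :])   # contribution of one source
        return (target / full) * spectrogram

    def suppress_source(spectrogram, W, H, source, eps=1e-12):
        """Remove the chosen source instead of keeping it."""
        return spectrogram - isolate_source(spectrogram, W, H, source, eps)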
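Feature-Based Synthesis can be pictured as a search over synthesizer parameters that minimizes the distance between the features of the synthesized audio and a target feature vector. The sketch below uses a hypothetical one-oscillator synthesizer and two toy features (RMS level and spectral centroid) purely for illustration; the thesis's framework is general, and its synthesizers, features, and optimizer will differ:

    import numpy as np
    from scipy.optimize import minimize

    SR = 22050  # sample rate in Hz; illustrative

    def toy_synth(params, dur=0.5):
        """Hypothetical stand-in synthesizer: a sine wave with given frequency and amplitude."""
        freq, amp = params
        t = np.arange(int(SR * dur)) / SR
        return amp * np.sin(2 * np.pi * freq * t)

    def features(audio):
        """Toy perceptual features: RMS level and spectral centroid (in kHz)."""
        spectrum = np.abs(np.fft.rfft(audio))
        freqs = np.fft.rfftfreq(len(audio), d=1.0 / SR)
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
        rms = np.sqrt(np.mean(audio ** 2))
        return np.array([rms, centroid / 1000.0])

    def match_features(target_feats, init_params):
        """Search for synthesizer parameters whose output features match the target."""
        loss = lambda p: np.sum((features(toy_synth(p)) - target_feats) ** 2)
        return minimize(loss, init_params, method="Nelder-Mead").x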