David Braun will present his General Exam "EASY-AS-PIE: Evolutionary Algorithm SYnthesizer Architecture Search via Pre-trained Instrument Embeddings" on Thursday, May 8, 2025 at 10:00 AM in CS 402.

Committee Members: Adam Finkelstein (advisor), Tri Dao, Mae Milano

Abstract:
Software synthesizers offer immense creative possibilities for sound design, but navigating their high-dimensional parameter spaces poses a challenge for musicians. This has motivated efforts toward automatic synthesizer programming: systems that generate parameter settings given a target sound or description. However, such systems require an automated means of comparing the similarity of sounds, which is complicated by psychoacoustic factors. Existing methods often address synthesizer programming using gradient descent or genetic algorithms. Gradient-based optimization faces challenges because synthesizer signal processing is highly recurrent and can involve gradient discontinuities. Conversely, evolutionary methods handle such discontinuities but typically operate on fixed synthesizer architectures, limiting creative potential. Allowing dynamic architectural changes increases search complexity exponentially, owing to the combinatorial explosion of ways to connect synthesizer modules. Neither approach is well suited to automatically programming a large corpus of target sounds. Nonetheless, automatic synthesizer programming remains a valuable research direction because synthesizers are more interpretable than black-box neural audio synthesis models, enabling musicians and sound designers to understand and manipulate generated sounds intuitively. Motivated by these considerations, we propose a novel framework that uniquely combines two established concepts: evolutionary algorithms for synthesizer architecture search and pre-trained instrument embeddings (specifically CLAP) for perceptual comparison of sound similarity.
Crucially, our method efficiently optimizes synthesizer presets for thousands of distinct target sounds simultaneously. Our evolutionary framework uses a hierarchical genetic representation comprising distinct gene classes for oscillator settings, envelopes, various audio effects, and more. Each parameter is modeled by a dedicated numeric gene, either continuous (e.g., an oscillator's sustain level) or discrete (e.g., filter type), and mutations are performed by sampling from precomputed parameter distributions extracted from real synthesizer presets. Because the genome also encodes modulation routings (e.g., "Envelope 1 modulating Reverb Size with a Depth of 40%"), the mutation and crossover operations explore the space of synthesizer architectures in addition to a purely numeric parameter space. The framework also incorporates two complementary archive structures to balance exploration and exploitation. A grid-based archive inspired by MAP-Elites applies dimensionality reduction to CLAP embeddings, enabling structured, diversity-driven exploration of the preset space that is untethered to target sounds. Concurrently, a second archive maintains champion presets for each target sound, ranked by cosine distance in the embedding space, to facilitate exploitation. Through ablation studies, we demonstrate that this dual-archive approach improves alignment with a held-out audio dataset, as measured by Kernel Audio Distance (KAD). Moreover, when optimizing for a single target sound, our approach can flexibly use a different loss signal, such as a multi-scale spectrogram loss, while still benefiting from pre-existing archives. We show that adding architecture search in this scenario yields better reconstruction of query sounds than modifying numeric parameters alone.
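To make the genome description more concrete, here is a minimal Python sketch of numeric genes plus modulation-routing genes, with mutation implemented as resampling from empirical distributions. It is an illustration only, not the author's implementation: the parameter names, sample values, routing sources and destinations, mutation rates, and the depth range of [-1, 1] are all hypothetical placeholders.

```python
import random
from dataclasses import dataclass, field

# Hypothetical empirical distributions standing in for values "extracted from
# real synthesizer presets". Continuous genes hold observed samples; discrete
# genes hold (choice, weight) pairs.
CONTINUOUS_SAMPLES = {
    "env1_sustain": [0.0, 0.2, 0.5, 0.7, 0.7, 1.0],
    "reverb_size": [0.1, 0.3, 0.3, 0.6, 0.9],
}
DISCRETE_CHOICES = {
    "filter_type": [("lowpass", 0.6), ("highpass", 0.2), ("bandpass", 0.2)],
}
# Hypothetical modulation sources and destinations.
MOD_SOURCES = ["env1", "env2", "lfo1"]
MOD_DESTINATIONS = ["reverb_size", "filter_cutoff", "osc1_pitch"]


@dataclass
class Genome:
    params: dict
    # Each routing is (source, destination, depth), e.g. ("env1", "reverb_size", 0.4).
    routings: list = field(default_factory=list)


def sample_param(name, rng):
    """Draw a value for one numeric gene from its precomputed distribution."""
    if name in CONTINUOUS_SAMPLES:
        return rng.choice(CONTINUOUS_SAMPLES[name])
    choices, weights = zip(*DISCRETE_CHOICES[name])
    return rng.choices(choices, weights=weights, k=1)[0]


def random_genome(rng):
    names = list(CONTINUOUS_SAMPLES) + list(DISCRETE_CHOICES)
    return Genome({n: sample_param(n, rng) for n in names})


def mutate(g, rng, p_param=0.3, p_route=0.3):
    """Resample some numeric genes, and possibly add or drop one routing.

    Adding or dropping a routing changes the synthesizer's architecture,
    not just its numeric parameter values. The parent is left unchanged.
    """
    params = {n: (sample_param(n, rng) if rng.random() < p_param else v)
              for n, v in g.params.items()}
    routings = list(g.routings)
    if rng.random() < p_route:
        if routings and rng.random() < 0.5:
            routings.pop(rng.randrange(len(routings)))
        else:
            routings.append((rng.choice(MOD_SOURCES),
                             rng.choice(MOD_DESTINATIONS),
                             round(rng.uniform(-1.0, 1.0), 2)))
    return Genome(params, routings)


def crossover(a, b, rng):
    """Uniform crossover on numeric genes; the child inherits a random mix of routings."""
    params = {n: (a.params[n] if rng.random() < 0.5 else b.params[n]) for n in a.params}
    routings = [r for r in a.routings + b.routings if rng.random() < 0.5]
    return Genome(params, routings)


rng = random.Random(0)
parent = random_genome(rng)
child = mutate(parent, rng, p_route=1.0)
```

Because a mutation can add or drop a routing, a child may differ from its parent in structure (which modules modulate which), not only in parameter values.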
This research demonstrates that embedding-driven evolutionary strategies combined with flexible synthesizer representations enable automated and diverse exploration of sounds.
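The dual-archive idea can be sketched in a few lines of Python. This is a hypothetical illustration under stated assumptions, not the author's code: embeddings are plain lists of floats (in the described system they would be CLAP embeddings), a fixed random projection stands in for the unspecified dimensionality reduction, and the grid archive's quality score is left to the caller because the abstract does not say what it is. Only the use of cosine distance for ranking champions comes from the abstract.

```python
import math
import random


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


class GridArchive:
    """MAP-Elites-style grid over a 2-D projection of embeddings (exploration).

    The fixed random projection is an assumption, standing in for whatever
    dimensionality reduction is actually used.
    """

    def __init__(self, dim, cells_per_axis=10, lo=-1.0, hi=1.0, seed=0):
        rng = random.Random(seed)
        self.proj = [[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)]
                     for _ in range(2)]
        self.cells_per_axis, self.lo, self.hi = cells_per_axis, lo, hi
        self.cells = {}  # cell index -> (quality, preset)

    def cell_of(self, emb):
        idx = []
        for row in self.proj:
            x = sum(w * e for w, e in zip(row, emb))
            t = (min(max(x, self.lo), self.hi) - self.lo) / (self.hi - self.lo)
            idx.append(min(int(t * self.cells_per_axis), self.cells_per_axis - 1))
        return tuple(idx)

    def try_insert(self, preset, emb, quality):
        """Keep the preset if its cell is empty or it beats the current elite."""
        key = self.cell_of(emb)
        if key not in self.cells or quality > self.cells[key][0]:
            self.cells[key] = (quality, preset)
            return True
        return False


class ChampionArchive:
    """Best preset per target sound, ranked by cosine distance (exploitation)."""

    def __init__(self, target_embs):
        self.targets = target_embs  # target id -> embedding
        self.best = {}  # target id -> (distance, preset)

    def offer(self, preset, emb):
        """Compare one candidate with every target; return the ids it now champions."""
        won = []
        for tid, temb in self.targets.items():
            d = cosine_distance(emb, temb)
            if tid not in self.best or d < self.best[tid][0]:
                self.best[tid] = (d, preset)
                won.append(tid)
        return won
```

Each evaluated candidate is compared against all targets at once in `ChampionArchive.offer`, which is one plausible way a single evolutionary run can serve thousands of target sounds.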

Apologies. Please also see the reading list for this General Exam at:
https://www.overleaf.com/read/pzqdtnzngggv#c19ea3
From: lriehl@cs.princeton.edu