Apologies, please also see the reading list for this General Exam at:
https://www.overleaf.com/read/pzqdtnzngggv#c19ea3
From: lriehl@cs.princeton.edu On Behalf Of gradinfo--- via talks
Sent: Tuesday, May 6, 2025 11:42 AM
To: 'talks'
Subject: [talks] David Braun will present his General Exam "EASY-AS-PIE:
Evolutionary Algorithm SYnthesizer Architecture Search via Pre-trained
Instrument Embeddings" on Thursday, May 8, 2025 at 10:00 AM in CS 402.
David Braun will present his General Exam "EASY-AS-PIE: Evolutionary
Algorithm SYnthesizer Architecture Search via Pre-trained Instrument
Embeddings" on Thursday, May 8, 2025 at 10:00 AM in CS 402.
Committee Members: Adam Finkelstein (advisor), Tri Dao, Mae Milano
Abstract:
Software synthesizers offer immense creative possibilities for sound design,
but navigation of their high-dimensional parameter spaces poses a challenge
for musicians. This has motivated efforts toward automatic synthesizer
programming: systems that generate parameter settings given a target sound
or description. However, such systems require an automated means of
comparing the similarity of sounds, a task complicated by psychoacoustic
factors. Existing methods often address synthesizer programming using
gradient descent or genetic algorithms. Gradient-based optimization faces
challenges because synthesizer signal processing is highly recurrent and can
involve gradient discontinuities. Conversely, evolutionary methods handle
such discontinuities but typically operate with fixed synthesizer
architectures, limiting creative potential. Allowing dynamic architectural
changes exponentially increases search complexity due to the combinatorial
explosion in connecting synthesizer modules. Neither approach is well
suited to automatically programming a large corpus of target sounds.
Nonetheless,
automatic synthesizer programming remains a valuable research direction
because synthesizers offer greater interpretability than black-box
neural audio synthesis models, enabling musicians and sound designers to
intuitively understand and manipulate generated sounds.
Motivated by these considerations, we propose a novel framework that
uniquely combines two established concepts: evolutionary algorithms for
synthesizer architecture search and pre-trained instrument embeddings
(specifically CLAP) for perceptual comparison of sound similarity.
Crucially, our method efficiently optimizes synthesizer presets for
thousands of distinct target sounds simultaneously.
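To make the comparison step concrete: the fitness signal in such a
framework is typically just cosine similarity between embedding vectors.
The sketch below is illustrative rather than the presenter's code; in
particular, the embed() function, which maps rendered audio to a CLAP
audio embedding, is a hypothetical stand-in for the actual encoder.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity of two embedding vectors; 1.0 means the
        # candidate and target point in the same direction.
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def fitness(candidate_audio, target_embedding, embed) -> float:
        # 'embed' is a hypothetical CLAP audio encoder: audio -> vector.
        return cosine_similarity(embed(candidate_audio), target_embedding)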
Our evolutionary framework involves a hierarchical genetic representation
comprising distinct gene classes for oscillator settings, envelopes, various
audio effects, and more. Each parameter is modeled with a dedicated numeric
gene, either continuous (e.g., an oscillator's sustain level) or discrete
(e.g., filter type), and mutations are performed by sampling from
precomputed parameter distributions extracted from real synthesizer presets.
Since the genome includes information about modulation routings (e.g.,
"Envelope 1 modulating Reverb Size with a Depth of 40%"), the genetic
mutation and crossover operations explore the synthesizer architecture space
in addition to a purely numeric parameter space.
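As a rough illustration of such a hierarchical genome, a minimal Python
sketch follows. The class and parameter names are invented for this
example, not taken from the presenter's implementation; the key idea,
per the abstract, is that numeric genes mutate by resampling from
distributions harvested from real presets, while modulation genes can
reroute connections and thereby change the architecture.

    import random
    from dataclasses import dataclass, field

    @dataclass
    class ContinuousGene:
        name: str                       # e.g. "osc1_sustain"
        value: float
        preset_pool: list[float] = field(default_factory=list)

        def mutate(self) -> None:
            # Resample from values observed in real presets rather
            # than perturbing with unstructured noise.
            if self.preset_pool:
                self.value = random.choice(self.preset_pool)

    @dataclass
    class DiscreteGene:
        name: str                       # e.g. "filter_type"
        value: str
        choices: list[str] = field(default_factory=list)

        def mutate(self) -> None:
            self.value = random.choice(self.choices)

    @dataclass
    class ModulationGene:
        source: str                     # e.g. "env1"
        destination: str                # e.g. "reverb_size"
        depth: float                    # e.g. 0.4 for "Depth of 40%"
        valid_destinations: list[str] = field(default_factory=list)

        def mutate(self) -> None:
            # Rerouting the destination changes the synthesizer
            # architecture, not just a numeric parameter value.
            if self.valid_destinations and random.random() < 0.5:
                self.destination = random.choice(self.valid_destinations)
            else:
                self.depth = min(1.0, max(0.0,
                    self.depth + random.uniform(-0.1, 0.1)))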
Our evolutionary framework also incorporates two complementary archive
structures to balance exploration and exploitation. A grid-based archive
inspired by MAP-Elites uses dimensionality reduction on CLAP embeddings,
enabling structured diversity-driven exploration of the preset space,
untethered to target sounds. Concurrently, a second archive maintains
champion presets for each target sound, ranked by cosine distance in the
embedding space, facilitating exploitation.
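A minimal sketch of the two archives follows, under two stated
assumptions that the abstract does not confirm: embeddings are
L2-normalized, and the grid archive keys on a 2-D projection of the
CLAP embedding scaled to [0, 1).

    import numpy as np

    class GridArchive:
        # MAP-Elites-style archive over a low-dimensional projection
        # of CLAP embeddings; each cell keeps its best occupant.
        def __init__(self, bins: int = 32):
            self.bins = bins
            self.cells = {}

        def try_insert(self, preset, projected_2d, quality) -> None:
            cell = tuple(np.clip((projected_2d * self.bins).astype(int),
                                 0, self.bins - 1))
            incumbent = self.cells.get(cell)
            if incumbent is None or quality > incumbent[1]:
                self.cells[cell] = (preset, quality)

    class ChampionArchive:
        # One champion preset per target sound, ranked by cosine
        # similarity (equivalently, cosine distance) to the target.
        def __init__(self, target_embeddings):
            # (num_targets, dim), assumed L2-normalized
            self.targets = target_embeddings
            self.champions = [None] * len(target_embeddings)

        def try_insert(self, preset, embedding) -> None:
            sims = self.targets @ embedding   # similarity per target
            for i, sim in enumerate(sims):
                if self.champions[i] is None or sim > self.champions[i][1]:
                    self.champions[i] = (preset, float(sim))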
Through ablation studies, we demonstrate that this dual-archive approach
improves alignment with a held-out audio dataset, as measured by Kernel
Audio Distance (KAD). Moreover, when optimizing for a single target
sound, our approach can flexibly use a different loss signal, such as a
multi-scale spectrogram loss (sketched at the end of this message),
while still benefiting from pre-existing
archives. We show that the addition of architecture search in such a
scenario results in superior reconstruction of query sounds compared to
modifying numeric parameters alone. This research demonstrates that
embedding-driven evolutionary strategies combined with flexible synthesizer
representations enable automated and diverse exploration of sounds.
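For the multi-scale spectrogram loss mentioned above, the following is
a sketch of a standard formulation (e.g., as popularized by DDSP); the
exact variant used in this work may differ. It assumes mono signals
longer than the largest FFT size.

    import numpy as np

    def stft_mag(x: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
        # Magnitude STFT with a Hann window (minimal NumPy version).
        window = np.hanning(n_fft)
        frames = [x[i:i + n_fft] * window
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.abs(np.fft.rfft(np.asarray(frames), axis=-1))

    def multiscale_spectrogram_loss(pred, target) -> float:
        # Sum of L1 distances between linear- and log-magnitude
        # spectrograms computed at several FFT resolutions.
        loss = 0.0
        for n_fft in (2048, 1024, 512, 256, 128, 64):
            p = stft_mag(pred, n_fft, hop=n_fft // 4)
            t = stft_mag(target, n_fft, hop=n_fft // 4)
            loss += float(np.mean(np.abs(p - t)))
            loss += float(np.mean(np.abs(np.log(p + 1e-7)
                                         - np.log(t + 1e-7))))
        return loss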