Here are some personal projects I work on in my free time to explore ideas at the intersection of machine learning, audio, and music.
Timbre Morpher
Timbre Morpher is a Python library for smoothly transforming audio from one timbre to another using neural audio synthesis.
How it works
The system takes two audio files as input (e.g., a piano note and a violin note) and encodes each into a compact latent representation using a variational autoencoder (VAE). These latent vectors capture the essential characteristics of each sound in a continuous, high-dimensional space. To generate the morph, the system computes a trajectory of intermediate points by linearly interpolating between the two latent representations. Each point along this trajectory is then decoded back into audio, producing a sequence of sounds that gradually transition from the source timbre to the target.
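The trajectory computation described above can be sketched in a few lines. The encoder and decoder calls in the usage comment are placeholders for a real model such as RAVE or DAC; the latent vectors are plain Python lists here so the interpolation logic itself is runnable:

```python
# Sketch of the latent-space morphing trajectory. A real model would
# produce tensors; plain lists are used here to keep the logic visible.

def interpolate_latents(z_source, z_target, num_steps):
    """Return num_steps latent vectors linearly spaced from source to target."""
    trajectory = []
    for i in range(num_steps):
        t = i / (num_steps - 1)  # t runs from 0.0 (source) to 1.0 (target)
        z = [(1 - t) * a + t * b for a, b in zip(z_source, z_target)]
        trajectory.append(z)
    return trajectory

# Hypothetical usage with a real codec (method names are illustrative):
#   z_piano  = model.encode(piano_audio)
#   z_violin = model.encode(violin_audio)
#   morphs   = [model.decode(z) for z in interpolate_latents(z_piano, z_violin, 10)]
```

Each decoded point yields one intermediate sound; concatenating them produces the gradual timbral transition.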
Implemented models
- RAVE (Realtime Audio Variational autoEncoder) from IRCAM, with several pretrained checkpoints available for different audio domains (classical instruments, speech, percussion)
- EnCodec from Meta for high-fidelity audio compression and reconstruction
- DAC (Descript Audio Codec), a high-fidelity neural audio codec optimized for music
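Supporting several backends suggests a shared encode/decode interface. The sketch below shows one way to structure that; the class and registry names are illustrative, not the library's actual API, and real backends would wrap the RAVE, EnCodec, or DAC calls behind the same two methods:

```python
# Illustrative sketch: several neural codecs behind one interface.
# Names are hypothetical; each real backend would wrap its own model.

from abc import ABC, abstractmethod

class LatentCodec(ABC):
    """Common interface: waveform -> latent vector -> waveform."""

    @abstractmethod
    def encode(self, audio):
        ...

    @abstractmethod
    def decode(self, latent):
        ...

CODECS = {}

def register_codec(name):
    """Class decorator that adds a codec backend to the registry."""
    def wrapper(cls):
        CODECS[name] = cls
        return cls
    return wrapper

@register_codec("identity")
class IdentityCodec(LatentCodec):
    """Trivial backend for testing: the latent is the audio itself."""
    def encode(self, audio):
        return list(audio)
    def decode(self, latent):
        return list(latent)
```

With this shape, the morphing code can look up a backend by name and stay agnostic to which model produced the latents.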
Next steps
- Integrate additional neural audio models, e.g., AudioMAE
- Implement pitch preservation during morphing
- Add alternative interpolation methods and trajectories
- Package as a VST/AU plugin for digital audio workstations
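One candidate for the "alternative interpolation methods" item is spherical linear interpolation (slerp), which follows a great-circle arc between the two latent vectors instead of a straight line; linear blends in high-dimensional latent spaces can pass through low-norm regions the decoder was rarely trained on. A plain-Python sketch, assuming latents are simple vectors:

```python
# Spherical linear interpolation (slerp) between two latent vectors.
# A real implementation would operate on model tensors instead of lists.

import math

def slerp(z_source, z_target, t):
    """Spherically interpolate between two vectors at position t in [0, 1]."""
    dot = sum(a * b for a, b in zip(z_source, z_target))
    norm_s = math.sqrt(sum(a * a for a in z_source))
    norm_t = math.sqrt(sum(b * b for b in z_target))
    # Angle between the two vectors, clamped for numerical safety.
    cos_omega = max(-1.0, min(1.0, dot / (norm_s * norm_t)))
    omega = math.acos(cos_omega)
    if omega < 1e-8:  # vectors nearly parallel: fall back to linear blend
        return [(1 - t) * a + t * b for a, b in zip(z_source, z_target)]
    w_s = math.sin((1 - t) * omega) / math.sin(omega)
    w_t = math.sin(t * omega) / math.sin(omega)
    return [w_s * a + w_t * b for a, b in zip(z_source, z_target)]
```

Unlike the linear version, slerp preserves the angular geometry of the path, which can make the intermediate decoded sounds less "washed out".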
Audio Examples
Source (piano)
Target (violin)
Morphed result (10 interpolation steps concatenated, generated with the DAC model)
Text-to-foley
A text-to-audio generation model specialized for foley sounds. Given a text description, it generates realistic sound effects of the kind used in film and media production.
Audio Examples
"Footsteps on wooden floor, slow walking pace, indoor"
"Heavy rain on metal roof, steady downpour"
"Large window glass violently shattering, exploding into pieces, dramatic crash"
"Deep thunder rumble, powerful storm rolling thunder, dramatic boom"