🤖 AI Summary
A researcher presents a micro-experiment that factors short musical audio into two interpretable parts: a sparse, low-rate control signal that encodes performer actions, and a compact "instrument" model that captures the acoustic resonances (including room and object behavior). After overfitting to a single ~12 s segment, the learned instrument can be used to reconstruct the original audio and to play new performances in the browser via a WebAudio decoder driven by MediaPipe hand tracking. The work shows that even very small datasets can yield playable, compressed artifacts that preserve key timbres and resonance behavior, pointing toward compact, interpretable representations and interactive on-device synthesis.
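The interactive part hinges on mapping hand-tracking output into the same low-rate control signal the model was trained on; the project does this in the browser with MediaPipe and a WebAudio decoder. Purely as an illustration of that mapping step, here is a minimal Python sketch that projects MediaPipe's 21 hand landmarks into a hypothetical 16-dimensional control vector at roughly 20 Hz. The projection matrix, the non-negativity clamp, and the single-hand handling are assumptions for the sketch, not the author's actual mapping.

```python
# Hypothetical sketch: turn MediaPipe hand landmarks into a low-rate
# 16-dimensional control vector, mirroring the browser decoder's idea.
# The projection and landmark handling are illustrative assumptions.
import numpy as np
import mediapipe as mp

CONTROL_DIM = 16       # matches the summary's 16-dim control plane
CONTROL_RATE_HZ = 20   # control signal varies at ~20 Hz

hands = mp.solutions.hands.Hands(max_num_hands=1)

# Fixed random projection from 21 landmarks x 3 coords -> 16 controls.
# In the real system this mapping would be designed, not random.
rng = np.random.default_rng(0)
projection = rng.normal(size=(CONTROL_DIM, 21 * 3)).astype(np.float32)

def frame_to_control(rgb_frame: np.ndarray) -> np.ndarray:
    """Map one RGB video frame to a 16-dim control vector (zeros if no hand)."""
    result = hands.process(rgb_frame)
    if not result.multi_hand_landmarks:
        return np.zeros(CONTROL_DIM, dtype=np.float32)
    lm = result.multi_hand_landmarks[0].landmark
    coords = np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).ravel()
    control = projection @ coords
    return np.maximum(control, 0.0)  # keep controls non-negative and sparse-ish
```

Calling `frame_to_control` on webcam frames sampled at ~20 Hz would yield the same kind of low-rate control stream the decoder expects.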
Technically, the system uses a 16-dimensional control plane varying at ~20 Hz: control vectors are convolved with a small set of attack envelopes, multiplied by white noise, and routed into a bank of resonances parameterized as damped harmonic oscillators. Outputs are scaled and passed through tanh for subtle distortion, and deformations are modeled as time-varying mixes that change the routing to alternate resonances. Training minimizes an L1 loss on a multi-resolution STFT plus an L1 sparsity penalty on the control signals, optimized with Adam for 10k iterations. The tiny runtime model (control dim 16, 16 resonances, expressivity 2, 22.05 kHz sample rate) comes in at ≈14% of the original WAV's size, though reconstructions remain imperfect. Future work targets better perceptual losses, richer control representations, efficient JS/WebAudio implementations, and connections to differentiable physical modeling.
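That description amounts to a small differentiable synthesizer plus a spectral reconstruction loss. The sketch below, assuming PyTorch, follows it loosely: a learnable sparse control signal is convolved with attack envelopes, multiplied by white noise, filtered through a bank of damped-oscillator impulse responses, and squashed with tanh, then trained against a multi-resolution STFT L1 loss with an L1 sparsity penalty. Envelope and impulse-response lengths, STFT window sizes, and initializations are guesses, and the time-varying deformation routing is omitted.

```python
# Minimal sketch of the described model, assuming PyTorch; many details
# (envelope/IR lengths, STFT windows, deformation routing) are guesses.
import torch
import torch.nn.functional as F

SR = 22_050          # sample rate
CONTROL_DIM = 16     # control-plane dimension
N_RES = 16           # number of resonances
CONTROL_HZ = 20      # control frame rate
HOP = SR // CONTROL_HZ

class TinyInstrument(torch.nn.Module):
    def __init__(self, env_len=512, ir_len=4096):
        super().__init__()
        # Learnable attack envelopes, one per control channel.
        self.envelopes = torch.nn.Parameter(torch.randn(CONTROL_DIM, env_len) * 0.01)
        # Damped harmonic oscillators: frequency and decay per resonance.
        self.freqs = torch.nn.Parameter(torch.rand(N_RES) * 2000 + 50)
        self.decays = torch.nn.Parameter(torch.rand(N_RES) * 10 + 1)
        self.mix = torch.nn.Parameter(torch.randn(N_RES, CONTROL_DIM) * 0.1)
        self.gain = torch.nn.Parameter(torch.tensor(0.1))
        self.ir_len = ir_len

    def resonator_irs(self):
        # Impulse responses of damped oscillators: exp(-d*t) * sin(2*pi*f*t)
        t = torch.arange(self.ir_len) / SR
        return torch.exp(-self.decays[:, None] * t) * torch.sin(
            2 * torch.pi * self.freqs[:, None] * t)

    def forward(self, control):
        # control: (CONTROL_DIM, n_frames) sparse, low-rate events
        n_samples = control.shape[1] * HOP
        sparse = torch.zeros(CONTROL_DIM, n_samples)
        sparse[:, ::HOP] = control               # place each frame HOP samples apart
        # Convolve each channel with its learned attack envelope.
        env = F.conv1d(sparse[None], self.envelopes[:, None], groups=CONTROL_DIM,
                       padding=self.envelopes.shape[1] // 2)[0, :, :n_samples]
        # Excite with white noise, mix channels into resonator inputs.
        excitation = env * torch.randn_like(env)
        res_in = self.mix @ excitation           # (N_RES, n_samples)
        # Convolve with damped-oscillator impulse responses and sum.
        irs = self.resonator_irs()
        out = F.conv1d(res_in[None], irs[:, None], groups=N_RES,
                       padding=self.ir_len // 2)[0, :, :n_samples].sum(0)
        return torch.tanh(self.gain * out)       # subtle distortion

def multires_stft_l1(x, y, sizes=(256, 512, 1024, 2048)):
    """L1 distance between magnitude STFTs at several resolutions."""
    loss = 0.0
    for n in sizes:
        X = torch.stft(x, n, hop_length=n // 4, return_complex=True).abs()
        Y = torch.stft(y, n, hop_length=n // 4, return_complex=True).abs()
        loss = loss + (X - Y).abs().mean()
    return loss

# Overfit a single clip: the control signal itself is a free parameter.
# target = torch.from_numpy(load_wav("clip.wav"))            # ~12 s at 22.05 kHz
# control = torch.nn.Parameter(torch.zeros(CONTROL_DIM, 12 * CONTROL_HZ))
# model = TinyInstrument()
# opt = torch.optim.Adam([control, *model.parameters()], lr=1e-3)
# for _ in range(10_000):
#     opt.zero_grad()
#     recon = model(torch.relu(control))
#     loss = multires_stft_l1(recon, target) + 1e-3 * control.abs().mean()
#     loss.backward(); opt.step()
```

After training, the learned `control` tensor and the small set of `TinyInstrument` parameters are the compressed artifact; new performances come from feeding fresh control signals through the same forward pass.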