Musical mel transform in Torch for Music AI (github.com)

🤖 AI Summary
musical-mel-transform is a new PyTorch package that builds a mel-frequency transform explicitly aligned to musical intervals (semitones or fractional semitones) and optimized for realtime and ONNX-friendly workflows. Instead of the usual log-spaced mel bins, which don't map onto Western pitch and struggle at low frequencies, this transform interpolates and adaptively widens weighted combinations of FFT bins so that each mel bin corresponds more closely to a musical pitch (including quarter tones). The result is much finer low-frequency resolution, useful for bass-heavy pop and electronic music, transcription, and note discrimination, while remaining fast and exportable for production (an optional convolutional FFT provides ONNX compatibility).

Technically, each mel bin is a weighted sum of linear FFT bins, controlled by parameters such as interval (semitone spacing), min_bins, adaptive sizing, passthrough_cutoff_hz (above which frequencies pass through as grouped FFT bins), passthrough_grouping_size, power/to_db, and optional learnable_weights ("fft" to reweight FFT bins before summing, "mel" to reweight after). The project recommends FFT sizes >= 512 (2048 is typical), since resolution is ultimately limited by the FFT itself. The main trade-off: the native torch FFT is faster, while the conv-FFT enables ONNX export where complex FFT ops aren't supported.

The package includes demos, visualization, tests, and an example classifier with ONNX export, making it a practical drop-in for music ML pipelines that need musically meaningful, deployable audio features.
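To make the bin construction concrete, here is a minimal pure-PyTorch sketch of the underlying idea, assuming semitone-spaced center frequencies and triangular weights; the function name and weighting scheme are illustrative, not the package's actual API, and the adaptive widening (min_bins) that the package applies when semitone spacing falls below the FFT bin width is deliberately omitted.

```python
# Illustrative sketch, not the package's code: build a filterbank whose
# centers are spaced a fixed musical interval apart, each output bin a
# weighted sum of neighboring linear FFT bins.
import math
import torch


def semitone_filterbank(n_fft: int, sr: int, f_min: float = 55.0,
                        interval: float = 1.0) -> torch.Tensor:
    """Return (n_filters, n_fft // 2 + 1) triangular weights with centers
    spaced `interval` semitones apart, starting at f_min."""
    fft_freqs = torch.linspace(0, sr / 2, n_fft // 2 + 1)
    n_centers = int(12 / interval * math.log2(sr / 2 / f_min))
    # f_k = f_min * 2 ** (k * interval / 12): geometric, i.e. musical, spacing
    centers = f_min * 2 ** (torch.arange(n_centers) * interval / 12)
    filters = []
    for i in range(1, n_centers - 1):
        lo, c, hi = centers[i - 1], centers[i], centers[i + 1]
        rise = (fft_freqs - lo) / (c - lo)
        fall = (hi - fft_freqs) / (hi - c)
        filters.append(torch.clamp(torch.minimum(rise, fall), min=0.0))
    # NOTE: near f_min a triangle can be narrower than one FFT bin and end
    # up all-zero; the package's min_bins / adaptive sizing exists precisely
    # to widen such filters, which this sketch skips.
    return torch.stack(filters)


sr, n_fft = 44100, 2048
fb = semitone_filterbank(n_fft, sr)            # (n_filters, 1025)
power = torch.rand(1, n_fft // 2 + 1, 100)     # stand-in |STFT|^2 frames
mel = torch.einsum("mf,bft->bmt", fb, power)   # semitone-aligned bins
```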
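The conv-FFT trade-off can be sketched too. A standard way to avoid complex FFT ops at export time, assumed here rather than taken from the package, is to realize the real DFT as two real-valued linear maps over cosine and sine bases (the class name ConvRFFT and the matmul-over-frames formulation are illustrative):

```python
# General-technique sketch: real DFT as two real matmuls, so the graph
# contains only ops that ONNX exports cleanly, no complex tensors needed.
import math
import torch


class ConvRFFT(torch.nn.Module):
    def __init__(self, n_fft: int):
        super().__init__()
        k = torch.arange(n_fft // 2 + 1, dtype=torch.float64).unsqueeze(1)
        n = torch.arange(n_fft, dtype=torch.float64).unsqueeze(0)
        ang = 2 * math.pi * k * n / n_fft
        # Precompute bases in float64 for accuracy, store in float32.
        self.register_buffer("cos_basis", torch.cos(ang).float())
        self.register_buffer("sin_basis", torch.sin(ang).float())

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (..., n_fft) -> power spectrum: (..., n_fft // 2 + 1)
        re = frame @ self.cos_basis.T
        im = frame @ self.sin_basis.T
        return re * re + im * im


x = torch.randn(4, 2048)
ref = torch.fft.rfft(x).abs() ** 2
assert torch.allclose(ConvRFFT(2048)(x), ref, rtol=1e-3, atol=1e-1)
```

This does O(N^2) work per frame instead of the FFT's O(N log N), which matches the stated trade-off: the native torch FFT path is faster, while the conv path is the one that exports.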