Spectrogram Phases (graemephi.github.io)

🤖 AI Summary
A recent post responds to a viral observation that STFT phase data "looks like noise", showing that a few simple tricks can pull structure out of it.

Using a rectangular window exposes phase patterns. However, the hard frame edges are discontinuities, and they produce spurious high-frequency streaks. Tapering the edges with a Tukey window (flat in the middle, cosine-tapered at both ends) removes those streaks.

A trick from Miller Puckette for Hann windows subtracts each bin's two neighbors from it, implemented as X[1:-1] -= X[:-2] + X[2:]. It improves the phase estimates used in phase vocoders, even though raw phase plots look essentially unchanged.

Much more legible structure appears once you look at phase deltas instead of raw phase. First, unwrap each bin's phase across time and take the difference (delta = diff(unwrapped phase)). Then normalize each bin by its expected phase advance, angle(exp(2j*pi*freq*dt)). This is the advance a sinusoid at the bin's center frequency would accumulate over one hop of duration dt, wrapped to [-π, π]. The result reveals chirps and instantaneous-frequency-like features that raw phase plots hide.

Why this matters: STFT phase is not pure noise. It contains temporally coherent information that is useful for tasks like instantaneous frequency estimation and phase vocoding, but how much of it you can see depends heavily on representation choices.

Practical implications for ML: don't take derivatives through wrapped angles (atan2). Compute phase deltas via complex division instead, and consider keeping complex STFT channels rather than raw angles. Still, the ML community rarely uses STFT phase directly. EnCodec, for instance, feeds real/imaginary STFTs at multiple window sizes to its multi-scale STFT discriminator. So while these representations could be useful, adoption and stable handling of wrapping and normalization remain open challenges.
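The pipeline summarized above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the author's code:

- The helper names (tukey, stft, puckette_hann, phase_delta, phase_deviation, inst_freq) are hypothetical.
- "Normalize by the expected phase advance" is read as subtracting that advance and rewrapping, which is the usual phase-vocoder formulation.
- Frames use a periodic Hann window with the time origin at each frame's start. In this convention a sinusoid's neighboring bins sit roughly π out of phase with its peak bin, so subtracting them reinforces the peak rather than cancelling it.
- The Tukey taper fraction of 0.1 is an arbitrary choice.

```python
import numpy as np


def tukey(n, alpha=0.1):
    """Tukey window: flat middle, cosine-tapered edges (alpha = tapered fraction).

    alpha=0 gives a rectangular window, alpha=1 a symmetric Hann window.
    """
    if alpha <= 0:
        return np.ones(n)
    x = np.arange(n) / (n - 1)
    w = np.ones(n)
    left, right = x < alpha / 2, x > 1 - alpha / 2
    w[left] = 0.5 * (1 + np.cos(2 * np.pi / alpha * (x[left] - alpha / 2)))
    w[right] = 0.5 * (1 + np.cos(2 * np.pi / alpha * (x[right] - 1 + alpha / 2)))
    return w


def stft(x, window, hop):
    """Windowed frames -> rfft; returns a complex array shaped (bins, frames)."""
    n = len(window)
    frames = np.stack([x[s:s + n] * window for s in range(0, len(x) - n + 1, hop)])
    return np.fft.rfft(frames, axis=1).T


def puckette_hann(X):
    """Hypothetical helper: X[k] - X[k-1] - X[k+1] on a Hann-windowed STFT.

    Edge bins are left untouched; the input array is not modified.
    """
    Y = X.copy()
    Y[1:-1] -= X[:-2] + X[2:]  # right-hand side reads the original X
    return Y


def phase_delta(X):
    """Per-hop phase change via complex division, wrapped to [-pi, pi].

    angle(a * conj(b)) equals angle(a / b) but never divides by an empty bin.
    """
    return np.angle(X[:, 1:] * np.conj(X[:, :-1]))


def phase_deviation(X, n_fft, hop):
    """Phase delta minus each bin's expected advance, rewrapped (assumed reading).

    With freq = k * fs / n_fft and dt = hop / fs, freq * dt = k * hop / n_fft.
    """
    k = np.arange(X.shape[0])
    expected = np.angle(np.exp(2j * np.pi * k * hop / n_fft))
    return np.angle(np.exp(1j * (phase_delta(X) - expected[:, None])))


def inst_freq(X, n_fft, hop, fs):
    """Bin center frequency plus the deviation converted from rad/hop to Hz."""
    k = np.arange(X.shape[0])[:, None]
    return k * fs / n_fft + phase_deviation(X, n_fft, hop) * fs / (2 * np.pi * hop)


# Demo: a 440 Hz tone, 1 s at 8 kHz, 1024-point frames, 256-sample hop.
fs, n_fft, hop = 8000, 1024, 256
x = np.sin(2 * np.pi * 440.0 * np.arange(fs) / fs)

# np.angle(X_tukey) gives a raw phase image without hard frame edges.
X_tukey = stft(x, tukey(n_fft, 0.1), hop)

hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
X = puckette_hann(stft(x, hann, hop))

peak = int(np.argmax(np.abs(X[:, 0])))
f_est = inst_freq(X, n_fft, hop, fs)[peak]
print(f"peak bin {peak}, estimated frequency {f_est.mean():.3f} Hz")
```

For the 440 Hz test tone, the frequency read off the peak bin's phase deviation should land very close to 440 Hz, even though the nearest bin center is 437.5 Hz. The estimate is only unambiguous while the true frequency lies within fs / (2 * hop) of the bin center, which is 15.625 Hz with these settings. Computing np.diff(np.unwrap(np.angle(X), axis=1), axis=1) gives the same per-hop deltas as phase_delta. The complex-division form simply avoids handling wrapped angles at all.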