Non-Negative Matrix Factorization for Audio
Definition
NMF for audio applies non-negative matrix factorization to magnitude spectrograms (Smaragdis & Brown, 2003): a non-negative spectrogram matrix V is decomposed into a matrix of basis spectra W and an activation matrix H such that V ≈ WH. It is a natural fit for polyphonic audio because the purely additive decomposition matches the way sounds combine in a mixture.
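As a minimal sketch of the factorization V ≈ WH, the NumPy snippet below fits W and H with multiplicative updates for the Euclidean cost. This is one common way to fit NMF and is an assumption here, since this note does not specify an algorithm. The function name `nmf`, its parameters, and the toy matrices are illustrative, not taken from the cited work.

```python
import numpy as np

def nmf(V, n_components, n_iter=500, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (freq x time) as W @ H.

    Uses multiplicative updates for the Euclidean cost ||V - WH||^2
    (an assumed choice; other costs and solvers exist).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    # Random non-negative initialization; eps keeps entries strictly positive.
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_time)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": two made-up spectral templates active at different times.
W_true = np.array([[1.0, 0.0],
                   [0.5, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.8]])                   # frequencies x components
H_true = np.array([[1.0, 1.0, 0.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0, 1.0, 0.0]])    # components x time
V = W_true @ H_true                               # frequencies x time

W, H = nmf(V, n_components=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

On a real recording, V would be the magnitude of a short-time Fourier transform rather than a hand-built matrix, and the number of components would have to be chosen by the user.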
Key Ideas
- Spectrogram V (frequencies × time) ≈ W (frequencies × components) × H (components × time).
- Each component in W is a spectral template (e.g., a note, a phoneme, an instrument timbre).
- H encodes when each template is active.
- For separation: group components by source, reconstruct each source from its components.
- Variants address real-world complexity: shift-invariant NMF lets a template shift along the frequency axis, convolutive NMF uses templates that extend over several time frames, and sparse NMF adds sparsity constraints.
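The separation step in the list above (group components by source, then reconstruct each source from its components) can be sketched as follows. The matrices, the `groups` assignment, and the helper name `separate` are hypothetical and chosen only for illustration. The soft-mask reconstruction is one common choice, not necessarily the method used in the cited work.

```python
import numpy as np

# Hypothetical factorization with three components (values are made up).
W = np.array([[1.0, 0.8, 0.0],
              [0.5, 0.6, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.1, 0.7]])        # frequencies x components
H = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0, 1.0]])   # components x time
V = W @ H                              # stand-in for the mixture spectrogram

# Assumed assignment of components to sources. In practice this grouping
# comes from prior knowledge, clustering, or supervised training.
groups = {"source_a": [0, 1], "source_b": [2]}

def separate(V, W, H, groups, eps=1e-10):
    """Reconstruct one magnitude spectrogram per source via soft masks."""
    model = W @ H + eps                      # full model of the mixture
    sources = {}
    for name, idx in groups.items():
        partial = W[:, idx] @ H[idx, :]      # contribution of this group
        mask = partial / model               # share of the model, in [0, 1]
        sources[name] = mask * V             # apply the mask to the mixture
    return sources

sources = separate(V, W, H, groups)
total = sum(sources.values())                # should add back up to V
```

Because the masks of all groups sum to approximately one wherever the model is non-zero, the separated spectrograms add back up to the mixture. Getting audio out would additionally require phase information, which this sketch leaves out.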
Relationships
- Introduced by ../entities/paris-smaragdis
- See ../concepts/shift-invariant-nmf for extensions
- Contrast with ../concepts/sinusoidal-modeling: NMF learns its bases from the data, whereas sinusoidal modeling uses explicit sinusoidal bases
- Modern relevance: lightweight baseline, interpretable decomposition, sometimes used as preprocessing
Sources
None ingested yet — seed batch setup.