Non-Negative Matrix Factorization for Audio

Definition

Application of NMF to magnitude spectrograms (Smaragdis & Brown, 2003): a non-negative spectrogram matrix V is decomposed into a matrix of basis spectra W and an activation matrix H such that V ≈ WH. It is a natural fit for polyphonic audio because the decomposition is purely additive, which matches the way the magnitude spectra of simultaneous sounds approximately add in a mixture.
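A minimal sketch of the factorization V ≈ WH, assuming the Lee-Seung multiplicative updates for the Euclidean cost. The update rule, the function name `nmf`, its parameters, and the toy data are illustrative assumptions; this note does not specify a particular algorithm or cost function.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative V (frequencies x time) as W @ H.

    Assumption: multiplicative updates for the Euclidean cost ||V - WH||^2.
    Other costs (e.g. KL divergence) are common for audio but not shown here.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    # Random non-negative initialisation; eps keeps entries strictly positive.
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_time)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": two fixed spectra switched on at different times.
spectra = np.array([[1.0, 0.0],
                    [0.5, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.8]])                      # 4 frequencies x 2 sources
gains = np.array([[1, 1, 0, 0, 1, 0],
                  [0, 1, 1, 0, 0, 1]], dtype=float)   # 2 sources x 6 frames
V = spectra @ gains

W, H = nmf(V, n_components=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_error)
```

The factorization is not unique: the recovered components may come out in a different order and with a different scale than the spectra used to build the toy example.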

Key Ideas

  • Spectrogram V (frequencies × time) ≈ W (frequencies × components) × H (components × time).
  • Each component in W is a spectral template (e.g., a note, a phoneme, an instrument timbre).
  • H encodes when each template is active.
  • For separation: group components by source, reconstruct each source from its components.
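  • Variants handle real-world complexity: shift-invariant NMF (templates that can shift along the frequency axis, e.g. to follow pitch changes), convolutive NMF (templates that extend over several time frames), sparse NMF (sparsity constraints, typically on the activations).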
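The separation step in the list above can be sketched as follows. This is an illustration under stated assumptions: the helper `separate_sources`, the source labels, and the component-to-source grouping are hypothetical, and the soft-mask step is a commonly used refinement that this note does not itself prescribe. How components get assigned to sources is left open here.

```python
import numpy as np

def separate_sources(V, W, H, groups, eps=1e-9):
    """Return one magnitude-spectrogram estimate per source.

    groups: dict mapping a (hypothetical) source name to the list of
    component indices assigned to that source.
    """
    full = W @ H + eps  # model of the whole mixture
    estimates = {}
    for name, idx in groups.items():
        # Reconstruct the source from its own components only.
        part = W[:, idx] @ H[idx, :]
        # Assumption: apply the ratio part/full as a soft mask on V, so the
        # estimates add up to (approximately) the observed mixture.
        estimates[name] = V * (part / full)
    return estimates

# Hand-built toy factors: 4 frequencies, 3 components, 5 frames.
W = np.array([[1.0, 0.2, 0.0],
              [0.5, 0.9, 0.0],
              [0.0, 0.3, 1.0],
              [0.0, 0.0, 0.7]])
H = np.array([[1, 0, 1, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0]], dtype=float)
V = W @ H

# Illustrative grouping: components 0 and 1 form one source, component 2 the other.
groups = {"source_a": [0, 1], "source_b": [2]}
estimates = separate_sources(V, W, H, groups)
total = estimates["source_a"] + estimates["source_b"]
```

When the groups partition all components, the per-source estimates sum back to V up to the small `eps` term.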

Relationships

Sources

None ingested yet — seed batch setup.