Demucs / HTDemucs

Summary

Meta's Demucs family is the dominant open-source lineage for music source separation. Demucs (2019/2021) showed that waveform-domain separation could be competitive, using a U-Net with a bidirectional LSTM and reaching 6.3 dB SDR on MUSDB18. Hybrid Demucs (2021) let the model choose between spectrogram-domain and waveform-domain processing and won the MDX 2021 competition with a 1.4 dB SDR improvement. HTDemucs (2022) replaced the innermost U-Net layers with cross-domain Transformer encoders (self-attention plus cross-attention across the time and frequency representations), reaching 9.20 dB SDR with extra training data, which was state of the art at publication.
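To make the dB figures above concrete, here is a minimal sketch of a simplified signal-to-distortion ratio, SDR = 10 log10(||s||^2 / ||s - s_hat||^2). The function names `sdr_db` and `error_energy_ratio` are illustrative, not part of Demucs. Published MUSDB18 numbers are normally computed with a dedicated evaluation toolkit and aggregated over tracks, so this sketch will not reproduce them exactly; it only shows what a gain in dB means.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Simplified SDR in dB between a reference stem and its estimate.

    Both arguments are equal-length sequences of float samples.
    eps guards against division by zero and log of zero.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def error_energy_ratio(delta_db):
    """Factor by which residual error energy shrinks for an SDR gain of delta_db."""
    return 10.0 ** (delta_db / 10.0)
```

Under this definition, an estimate whose samples are each off by 10% of the reference scores about 20 dB, and a 1.4 dB gain corresponds to roughly 1.38 times less residual error energy for the same source.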

Key Claims

  • Waveform-domain separation can match or exceed spectrogram-domain approaches
  • Hybrid spectrogram/waveform processing outperforms either alone
  • Cross-domain Transformers (attending across time and frequency representations) improve over pure convolutional U-Nets
  • 4-stem (vocals/bass/drums/other) and 6-stem (+guitar/piano) pretrained models available

Relevance to Bluegrass

Best general-purpose separator available; installable via pip install demucs. Critical limitation: in the 4-stem models, every instrument other than vocals, bass, and drums lands in "other", and the 6-stem model adds only guitar and piano, so banjo, mandolin, and fiddle are not distinguished. The "other" stem would need further processing to isolate them.
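A minimal sketch of how the "other" stem could be pulled out for that further processing, by calling the demucs command-line tool from Python. It uses the CLI's `-n` (model name) and `-o` (output directory) options. The helper names are hypothetical, and the output layout `<out_dir>/<model>/<track name>/<stem>.wav` is an assumption based on the tool's default behaviour, so check it against the installed version.

```python
import subprocess
from pathlib import Path


def build_demucs_command(track, model="htdemucs", out_dir="separated"):
    """Return the argument list for one demucs CLI run."""
    return ["demucs", "-n", model, "-o", str(out_dir), str(track)]


def expected_stem_path(track, stem="other", model="htdemucs", out_dir="separated"):
    """Guess where demucs writes a stem (assumed default layout)."""
    return Path(out_dir) / model / Path(track).stem / f"{stem}.wav"


def separate_other(track, model="htdemucs", out_dir="separated"):
    """Run demucs (must be installed and on PATH) and return the 'other' stem path."""
    subprocess.run(build_demucs_command(track, model, out_dir), check=True)
    return expected_stem_path(track, "other", model, out_dir)
```

For example, `separate_other("tune.wav")` would run the default 4-stem model and return `separated/htdemucs/tune/other.wav`, which is where banjo, mandolin, and fiddle would be expected to end up together. Passing `model="htdemucs_6s"` selects the 6-stem model instead.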

Repo archived by Meta Jan 2025 but still functional. GitHub: facebookresearch/demucs (10.1k stars).