Demucs / HTDemucs
Summary
Meta's Demucs family is the dominant open-source lineage in music source separation. Demucs (2019/2021) showed that waveform-domain separation could be competitive, using a U-Net with a bidirectional LSTM and reaching 6.3 dB SDR on MUSDB18. Hybrid Demucs (2021) added parallel spectrogram and waveform branches so the model itself can decide which domain suits each source; it won the MDX 2021 competition and reported a 1.4 dB SDR improvement. HTDemucs (2022) replaced the innermost U-Net layers with a cross-domain Transformer encoder (self-attention within each domain, cross-attention between the time and frequency branches), reaching 9.20 dB SDR when trained with extra data, which was state of the art at publication.
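The SDR figures above are signal-to-distortion ratios in decibels. As a rough illustration of what the number measures, here is a minimal sketch of the simplest form of the metric: the energy of the reference stem divided by the energy of the estimation error. The function name `sdr_db` and the `eps` stabiliser are assumptions made for this example. Published MUSDB18 scores are normally produced with dedicated evaluation tooling that aggregates over windows and tracks, so this plain ratio will not reproduce the figures quoted here.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Simple signal-to-distortion ratio in dB for one mono stem.

    reference: ground-truth samples of the isolated source
    estimate:  samples produced by the separator
    eps:       small constant (an assumption here) to avoid division by zero
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true source.
    signal_energy = sum(r * r for r in reference)
    # Energy of everything the separator got wrong.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: the estimate is the reference scaled by 0.9,
# so the error has 1/100 of the signal energy.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
score = sdr_db(reference, estimate)
```

Because the scale is logarithmic, each additional 3 dB corresponds to roughly halving the error energy relative to the source.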
Key Claims
- Waveform-domain separation can match or exceed spectrogram-domain approaches
- Hybrid spectrogram/waveform processing outperforms either alone
- Cross-domain Transformers (attending across time and frequency representations) improve over pure convolutional U-Nets
- 4-stem (vocals/bass/drums/other) and 6-stem (+guitar/piano) pretrained models available
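The cross-domain attention claim can be made concrete with a toy sketch. The code below is not the HTDemucs implementation: it is single-head scaled dot-product attention with no learned projections, positional encodings, or normalisation, and the token values and the names `cross_attention`, `time_tokens`, and `freq_tokens` are invented for illustration. It only shows the mechanism, in which tokens from one branch act as queries over tokens from the other branch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes the values
    according to its similarity to the corresponding keys."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical 2-dimensional tokens from each branch.
time_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # waveform branch
freq_tokens = [[1.0, 0.0], [0.0, 1.0]]              # spectrogram branch

# Each branch queries the other, so information flows in both directions.
time_updated = cross_attention(time_tokens, freq_tokens, freq_tokens)
freq_updated = cross_attention(freq_tokens, time_tokens, time_tokens)
```

Self-attention is the same operation with queries, keys, and values all taken from a single branch.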
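Relevance to Bluegrass
The best general-purpose separator that can be installed with pip install demucs. Critical limitation: the 4-stem models place every instrument other than vocals, bass, and drums in "other", and the 6-stem model adds only guitar and piano, so neither offers dedicated banjo, mandolin, or fiddle stems. Isolating those instruments would require further processing of the "other" stem.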
Repo archived by Meta Jan 2025 but still functional. GitHub: facebookresearch/demucs (10.1k stars).
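As a starting point for that further processing, the sketch below builds a Demucs command line and predicts where the stems should land. It assumes the `demucs` command installed by the pip package, its `-n` (model name) and `-o` (output directory) options, the pretrained model names `htdemucs` and `htdemucs_6s`, and a default output layout of output directory, then model name, then track name; check these against the installed version. The helper names and the file `jam.wav` are hypothetical.

```python
from pathlib import Path

FOUR_STEMS = ("vocals", "drums", "bass", "other")
SIX_STEMS = FOUR_STEMS + ("guitar", "piano")


def build_demucs_command(track, model="htdemucs", out_dir="separated"):
    """Return the argument list for a Demucs CLI run (nothing is executed)."""
    return ["demucs", "-n", model, "-o", str(out_dir), str(track)]


def expected_stem_paths(track, model="htdemucs", out_dir="separated"):
    """Map each stem name to the path where Demucs is assumed to write it."""
    stems = SIX_STEMS if model == "htdemucs_6s" else FOUR_STEMS
    base = Path(out_dir) / model / Path(track).stem
    return {stem: base / f"{stem}.wav" for stem in stems}


# Hypothetical bluegrass recording, separated with the 6-stem model.
command = build_demucs_command("jam.wav", model="htdemucs_6s")
paths = expected_stem_paths("jam.wav", model="htdemucs_6s")
# Banjo, mandolin, and fiddle have no stem of their own, so "other"
# is the file a second-stage separator would most likely start from.
other_stem = paths["other"]
```

Passing `command` to `subprocess.run(command, check=True)` would perform the separation, after which `other_stem` can be handed to whatever second-stage model is used.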
Related
- ../entities/demucs — entity page
- ../concepts/spectrogram-unets — U-Net architecture family
- ../entities/bs-roformer — current SOTA, outperforms HTDemucs
- ../entities/spleeter — alternative, lower quality but faster