HTDemucs
About
Hybrid Transformer Demucs (HTDemucs) is Meta's follow-up to Hybrid Demucs. It replaces the innermost layers of the bi-U-Net architecture with a cross-domain Transformer encoder. The model operates in both the temporal (waveform) and spectral (spectrogram) domains, using self-attention within each domain and cross-attention between them. When published, HTDemucs achieved state-of-the-art results on MUSDB among models trained with extra data: 9.20 dB SDR, reached with sparse attention kernels and per-source fine-tuning.
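The attention pattern described above can be illustrated with a small, dependency-free sketch. This is not the Demucs implementation: it leaves out learned projections, multiple heads, normalization, feed-forward layers, and positional embeddings. The names `attention` and `cross_domain_block` are hypothetical and chosen for illustration only. It shows only how each branch first attends to its own tokens and then queries the other branch.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns one output vector per query."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def add(x, y):
    """Residual connection: element-wise sum of two token sequences."""
    return [[a + b for a, b in zip(u, v)] for u, v in zip(x, y)]


def cross_domain_block(time_tokens, spec_tokens):
    """Illustrative block: self-attention per domain, then cross-attention."""
    # Self-attention: each branch attends to its own tokens.
    time_tokens = add(time_tokens, attention(time_tokens, time_tokens, time_tokens))
    spec_tokens = add(spec_tokens, attention(spec_tokens, spec_tokens, spec_tokens))
    # Cross-attention: queries from one branch, keys and values from the other.
    new_time = add(time_tokens, attention(time_tokens, spec_tokens, spec_tokens))
    new_spec = add(spec_tokens, attention(spec_tokens, time_tokens, time_tokens))
    return new_time, new_spec


rng = random.Random(0)
# Toy inputs (assumed sizes): 6 waveform-branch tokens, 3 spectrogram-branch tokens.
time_tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
spec_tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
new_time, new_spec = cross_domain_block(time_tokens, spec_tokens)
print(len(new_time), len(new_spec))  # prints: 6 3
```

Because the queries set the output length, the two branches may hold different numbers of tokens and each keeps its own sequence length after exchanging information.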
Paper
- ../sources/2022-11-15-hybrid-transformers-music-source-separation — original paper (Rouard, Massa, Défossez, 2022)
Repository
- https://github.com/facebookresearch/demucs — HTDemucs is included in the main Demucs repository (~10k+ stars)
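As a rough sketch of running the model from Python, the snippet below builds an argument list and hands it to `demucs.separate.main`, which the repository README documents for calling Demucs from another program. The helper names `build_demucs_args` and `separate`, and the file name `song.wav`, are hypothetical. The flags `-n`, `--two-stems`, and `-o` are assumed to match the installed Demucs version, so check `demucs --help` before relying on them.

```python
def build_demucs_args(track, model="htdemucs", two_stems=None, out_dir=None):
    """Assemble command-line style arguments for Demucs (illustrative helper)."""
    args = ["-n", model]                  # pretrained model name
    if two_stems is not None:
        args += ["--two-stems", two_stems]  # e.g. "vocals": that stem plus the rest
    if out_dir is not None:
        args += ["-o", out_dir]           # output folder for the separated stems
    args.append(track)
    return args


def separate(track, **options):
    """Run Demucs on one track; needs the `demucs` package to be installed."""
    import demucs.separate  # imported lazily so the helper above works without it

    demucs.separate.main(build_demucs_args(track, **options))


print(build_demucs_args("song.wav", two_stems="vocals"))
# prints: ['-n', 'htdemucs', '--two-stems', 'vocals', 'song.wav']
```

Calling `separate("song.wav")` would then run the full separation. The default `htdemucs` model outputs drums, bass, vocals, and other stems.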
Relevance
HTDemucs was the state-of-the-art approach when published, combining waveform and spectrogram processing with Transformer-based long-range attention; BS-RoFormer has since reported higher SDR on MUSDB18HQ. For bluegrass source separation, the hybrid temporal/spectral processing may better capture both fast transients (mandolin, banjo) and sustained harmonic content (fiddle, dobro). The cross-domain attention mechanism could also help disentangle instruments with overlapping frequency ranges, which are common in bluegrass ensembles.
Mentions
- ../sources/2019-11-27-music-source-separation-waveform-domain — predecessor Demucs paper
- ../sources/2023-09-05-band-split-rope-transformer-music-source-separation — BS-RoFormer achieves higher SDR on MUSDB18HQ
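For readers comparing the SDR figures quoted on this page, here is a minimal sketch of the basic signal-to-distortion ratio: the energy of the reference stem divided by the energy of the estimation error, in decibels. It is a simplification. Published MUSDB scores are typically computed with the `museval` toolkit, which aggregates over short windows and tracks, so values from this function are not directly comparable to the numbers in the papers. The function name `sdr_db` and the `eps` guard are assumptions made for illustration.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero for silent inputs.
    return 10.0 * math.log10((signal + eps) / (error + eps))


# Toy example: the estimate is the reference scaled by 0.9, so the error
# energy is 1% of the signal energy, which corresponds to 20 dB.
reference = [math.sin(0.1 * n) for n in range(1000)]
estimate = [0.9 * r for r in reference]
print(round(sdr_db(reference, estimate), 2))  # prints: 20.0
```

Under this definition, each additional dB means the error energy shrinks by about 21% relative to the signal.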