HTDemucs

About

Hybrid Transformer Demucs (HTDemucs) is Meta's follow-up to Hybrid Demucs. It replaces the innermost layers of the bi-U-Net architecture with a cross-domain Transformer encoder. The model operates in both the temporal (waveform) and spectral (spectrogram) domains, using self-attention within each domain and cross-attention between them. When published, HTDemucs achieved state-of-the-art results on MUSDB (9.20 dB SDR with extra training data).
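To make the cross-attention idea concrete, here is a minimal sketch in plain Python. It is not the HTDemucs implementation: it leaves out learned projections, multiple heads, normalization, positional embeddings, and feed-forward layers, and the token values and the names `cross_attention`, `time_tokens`, and `spec_tokens` are made up for illustration. It only shows the core step in which tokens from one domain act as queries over tokens from the other domain.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one domain
    and keys/values from the other (no learned weights; illustration only)."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query token to every token of the other domain.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the other domain's value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Toy token sequences (hypothetical values): 3 waveform-branch tokens,
# 2 spectrogram-branch tokens, each of dimension 2.
time_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spec_tokens = [[1.0, 0.0], [0.0, 1.0]]

# Each branch attends to the other one.
time_updated = cross_attention(time_tokens, spec_tokens, spec_tokens)
spec_updated = cross_attention(spec_tokens, time_tokens, time_tokens)
```

Each output keeps the sequence length of its query branch, so the two branches can have different numbers of tokens and still exchange information.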

Paper

Repository

  • https://github.com/facebookresearch/demucs (HTDemucs is included in the main Demucs repository; 10k+ stars)
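As a rough usage sketch, the repository's command-line tool can be pointed at the pretrained `htdemucs` model with the `-n` flag, and `-o` sets the output folder. The helper name `build_demucs_command`, the file name `tune.wav`, and the output folder are assumptions for illustration. Note that the default pretrained model separates drums, bass, vocals, and a catch-all "other" stem, so bluegrass instruments such as banjo or mandolin would likely need fine-tuning or a custom-trained model.

```python
import shlex


def build_demucs_command(track_path, model="htdemucs", out_dir="separated"):
    """Return the argv list for one Demucs CLI run (hypothetical helper).

    -n selects the pretrained model, -o the output folder.
    """
    return ["demucs", "-n", model, "-o", out_dir, track_path]


# "tune.wav" is a placeholder input file.
command = build_demucs_command("tune.wav")
print(shlex.join(command))
```

With Demucs installed (for example via `pip install demucs`), the list can be passed to `subprocess.run(command, check=True)` to perform the separation.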

Relevance

HTDemucs combines waveform processing with Transformer-based long-range attention and was state of the art on MUSDB when published. For bluegrass source separation, the hybrid temporal/spectral processing may better capture both fast transients (mandolin, banjo) and sustained harmonic content (fiddle, dobro). The cross-domain attention mechanism could also help disentangle instruments with overlapping frequency ranges, which are common in bluegrass ensembles.

Mentions