Topic: Source Separation¶
Overview¶
Audio source separation is the task of isolating individual sound sources from a mixture. This topic covers deep learning approaches (mask inference, direct spectrogram prediction, waveform-domain models), classical techniques, toolkits, and training paradigms such as permutation invariant training and mixture invariant training.
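As a rough illustration of mask inference, the sketch below builds a ratio mask and applies it to a mixture magnitude spectrogram. It is a toy under stated assumptions: the mask is computed from known sources (an oracle), whereas a trained model would predict it from the mixture alone. The spectrogram values and the helper names `ratio_mask` and `apply_mask` are made up for this example. It uses plain Python lists so it runs without dependencies; real code would use an array library.

```python
def ratio_mask(source_mag, other_mag, eps=1e-8):
    """Ratio mask for one source: its share of the total magnitude per bin."""
    return [
        [s / (s + o + eps) for s, o in zip(s_row, o_row)]
        for s_row, o_row in zip(source_mag, other_mag)
    ]


def apply_mask(mask, mixture_mag):
    """Element-wise product of a mask and a mixture magnitude spectrogram."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_mag)
    ]


# Toy magnitude spectrograms (2 frequency bins x 3 time frames); values are made up.
vocals = [[1.0, 0.0, 3.0], [0.5, 2.0, 0.0]]
accomp = [[1.0, 2.0, 1.0], [0.5, 0.0, 4.0]]

# Assumption: magnitudes add in the mixture. This only holds approximately
# for real signals, because phase is ignored here.
mixture = [
    [v + a for v, a in zip(v_row, a_row)]
    for v_row, a_row in zip(vocals, accomp)
]

mask = ratio_mask(vocals, accomp)      # every value lies between 0 and 1
estimate = apply_mask(mask, mixture)   # approximately recovers `vocals`
```

In a mask inference model, the network output replaces the oracle `mask`, and the masked magnitude is combined with the mixture phase before inverting back to a waveform.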
Sub-topics / Concepts¶
- ../concepts/deep-clustering-separation — Hershey et al. (2016). Embedding-based separation via clustering in a learned space.
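- ../concepts/mixture-invariant-training — MixIT (Wisdom, Hershey et al., Google). Unsupervised training on mixtures of mixtures: the model separates the sum of two existing mixtures and is scored on how well its outputs can be remixed into them, so no ground-truth isolated sources are required.
- ../concepts/permutation-invariant-training — PIT: computes the training loss under the best-matching assignment of estimated sources to reference sources, which resolves the output-ordering ambiguity in deep separation models.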
- ../concepts/spectrogram-unets — U-Net architecture applied to spectrograms for mask prediction.
- ../concepts/query-based-source-separation — Conditional / text-query-driven separation.
- ../concepts/synthetic-mixing-pipelines — Data augmentation via synthetic mixture creation for training separation models.
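To make the permutation ambiguity behind PIT concrete, here is a minimal sketch of a permutation invariant loss. It assumes mean squared error as the per-source loss and an exhaustive search over permutations, which is only practical for a small number of sources. The function name `pit_loss` and the toy signals are hypothetical.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references):
    """Return (loss, perm) for the best assignment of estimates to references.

    perm[i] is the index of the reference matched to estimate i.
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average the per-source loss under this candidate assignment.
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


references = [[1.0, 1.0, 1.0], [0.0, -1.0, 0.0]]
# The model happens to emit the two sources in swapped order.
estimates = [[0.1, -0.9, 0.0], [1.0, 0.9, 1.1]]

loss, perm = pit_loss(estimates, references)
# perm is (1, 0): estimate 0 matches reference 1, and estimate 1 matches reference 0.
```

A fixed-order loss would penalize this output heavily even though both sources are well separated; taking the minimum over permutations removes that penalty.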
Key Entities¶
Models¶
- ../entities/demucs / ../entities/htdemucs (Meta) — Demucs is a waveform-domain encoder/decoder separation model. Hybrid Demucs adds a parallel spectrogram branch, and HTDemucs (Hybrid Transformer Demucs) adds Transformer layers on top of that hybrid design. Reported state-of-the-art results on MUSDB18 at publication.
- ../entities/spleeter (Deezer) — Fast, pretrained separation library. U-Net based, separates into 2/4/5 stems. Widely used in production.
- ../entities/open-unmix — Open-source reference implementation for music separation. BiLSTM-based, reproducible, strong baseline on MUSDB18.
- ../entities/audiosef — AudioSep: text-queried / conditional source separation using CLAP embeddings.
- ../entities/soundfilter (Google) — SoundFilter: conditional separation that extracts a target source from a mixture, given a short reference audio clip of that source as the conditioning input.
Toolkits¶
- ../entities/nussl (Northwestern / Interactive Audio Lab) — Comprehensive separation toolkit. Includes deep clustering, deep attractor networks, and mask inference. Educational and research focus.
- ../entities/asteroid — PyTorch-based source separation toolkit. Modular: datasets, architectures, training recipes. Built on PyTorch-Lightning.
Datasets (see also ../topics/datasets)¶
- ../entities/musdb18 — Standard benchmark for music source separation: 150 full-length tracks, each with 4 stems (drums, bass, vocals, other).
- ../entities/slakh2100 — Multi-track dataset of 2100 tracks synthesized from MIDI, with individual instrument stems.
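Stems from multi-track datasets like these are the usual raw material for synthetic mixing (see ../concepts/synthetic-mixing-pipelines). The sketch below shows one simple variant: apply a random gain to each stem and sum the results. The gain range, the function name `mix_stems`, and the toy sample values are assumptions for illustration, not settings taken from any particular pipeline.

```python
import random


def mix_stems(stems, gain_db_range=(-6.0, 6.0), rng=None):
    """Apply a random gain to each stem and sum them into a mixture.

    stems: dict mapping stem name to a list of samples (all the same length).
    Returns (mixture, scaled_stems); the scaled stems are the training targets.
    """
    if not stems:
        raise ValueError("need at least one stem")
    if len({len(samples) for samples in stems.values()}) != 1:
        raise ValueError("all stems must have the same length")
    if rng is None:
        rng = random.Random()

    scaled = {}
    for name, samples in stems.items():
        # Convert a random gain in decibels to a linear amplitude factor.
        gain = 10 ** (rng.uniform(*gain_db_range) / 20)
        scaled[name] = [gain * x for x in samples]

    length = len(next(iter(scaled.values())))
    mixture = [sum(stem[t] for stem in scaled.values()) for t in range(length)]
    return mixture, scaled


# Toy 4-sample stems using the MUSDB18 stem names; the values are made up.
stems = {
    "drums": [0.5, -0.5, 0.25, 0.0],
    "bass": [0.2, 0.2, -0.2, -0.2],
    "vocals": [0.1, 0.3, 0.0, -0.1],
    "other": [-0.3, 0.0, 0.1, 0.2],
}
mixture, targets = mix_stems(stems, rng=random.Random(0))
```

Because the targets are scaled with the same gains as the mixture, the mixture always equals the sum of its targets, which most supervised separation losses implicitly assume.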
Sources¶
None ingested yet — seed batch setup.
Open Questions¶
- How do HTDemucs and Spleeter compare on non-music audio (field recordings, podcasts)?
- How well does text-conditional separation (AudioSep) work for instrument-specific queries?
- Does the MixIT paradigm remove the need for isolated-stem training data?
- What's the practical latency/throughput tradeoff for real-time separation?
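For the MixIT question above, it helps to see what the objective actually compares. The sketch below is a simplified version under stated assumptions: mean squared error stands in for the signal-to-noise-ratio loss typically used with MixIT, the search over assignments is exhaustive, and the name `mixit_loss` and the toy signals are hypothetical. Each estimated source is assigned to exactly one of the reference mixtures, and the loss is the best remix error over all such assignments.

```python
from itertools import product


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mixit_loss(mixtures, estimates):
    """Return (loss, assignment) for the best remix of estimates into mixtures.

    assignment[j] is the index of the mixture that estimate j is added to.
    """
    n_mix, length = len(mixtures), len(mixtures[0])
    best_loss, best_assign = float("inf"), None
    # Every estimate goes to exactly one mixture: n_mix ** len(estimates) options.
    for assign in product(range(n_mix), repeat=len(estimates)):
        remix = [[0.0] * length for _ in range(n_mix)]
        for source, target in zip(estimates, assign):
            for t, x in enumerate(source):
                remix[target][t] += x
        loss = sum(mse(r, m) for r, m in zip(remix, mixtures))
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


# Two reference mixtures; their isolated sources are never given to the loss.
mix_a = [1.0, 2.0, 3.0]
mix_b = [0.0, -1.0, 1.0]
# Hypothetical model output for the input mix_a + mix_b, with 3 output slots.
estimates = [[1.0, 1.0, 1.0], [0.0, -1.0, 1.0], [0.0, 1.0, 2.0]]

loss, assign = mixit_loss([mix_a, mix_b], estimates)
# assign is (0, 1, 0): estimates 0 and 2 remix into mix_a, estimate 1 into mix_b.
```

Only the mixtures appear as references, which is why the method can train without isolated stems. Whether that fully replaces supervised stem data in practice is the open question listed above.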