Deep Clustering for Source Separation
Definition
A source separation approach in which a neural network embeds each time-frequency (T-F) bin of the mixture spectrogram into a space where bins dominated by the same source cluster together. Separation is performed by clustering the embeddings and using the cluster assignments as binary T-F masks.
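A minimal sketch of the inference path, assuming NumPy, an (F, T) mixture magnitude spectrogram, and per-bin embeddings already produced by a trained network (replaced here by a synthetic, hypothetical array). The k-means is plain Lloyd's iteration with farthest-point initialization (a deterministic simplification of k-means++), written out only to keep the example self-contained; `kmeans` and `separate` are illustrative names, not a library API.

```python
import numpy as np

def kmeans(x, k, n_iter=50, seed=0):
    """Lloyd's k-means on the rows of x (N, D); returns labels of shape (N,)."""
    rng = np.random.default_rng(seed)
    # Farthest-point init: a random first row, then repeatedly the row
    # farthest from every center chosen so far.
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Squared distance from every row to every center: shape (N, k)
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def separate(mixture_mag, embeddings, n_sources):
    """Cluster per-bin embeddings and apply the assignments as binary masks.

    mixture_mag: (F, T) mixture magnitude spectrogram.
    embeddings:  (F*T, D) embeddings, row order matching mixture_mag.reshape(-1).
    Returns an (n_sources, F, T) array of masked magnitude estimates.
    """
    F, T = mixture_mag.shape
    labels = kmeans(embeddings, n_sources)
    # One binary mask per cluster: 1 where the bin was assigned to it
    masks = (labels[None, :] == np.arange(n_sources)[:, None]).astype(float)
    return masks.reshape(n_sources, F, T) * mixture_mag[None]

# Toy run: synthetic embeddings stand in for the network output.
rng = np.random.default_rng(1)
F, T, D = 4, 5, 3
truth = rng.integers(0, 2, size=F * T)      # hypothetical owner of each bin
anchors = np.eye(D)[:2]                      # two well-separated directions
emb = anchors[truth] + 0.05 * rng.standard_normal((F * T, D))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-norm embeddings
mix = rng.random((F, T))
est = separate(mix, emb, n_sources=2)
print(np.allclose(est.sum(axis=0), mix))     # True: binary masks partition the mixture
```

Because every bin is assigned to exactly one cluster, the masked estimates always sum back to the mixture; soft masks or a resynthesis step (inverse STFT with the mixture phase) would be separate design choices.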
Key Ideas
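- Introduced by Hershey et al. (2016). A network (a BLSTM in the original paper) maps each T-F bin of the mixture's log-magnitude spectrogram to a unit-norm D-dimensional embedding.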
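- Training objective: make the embedding affinity matrix V V^T match the ideal affinity Y Y^T, where V stacks the embeddings and Y is the one-hot source assignment of each bin, by minimizing ||V V^T - Y Y^T||_F^2. This pulls embeddings of same-source bins together and pushes embeddings of different sources apart.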
- At inference: run k-means on the embeddings, with k equal to the number of sources, and use the resulting cluster assignments as binary masks.
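- Sidesteps the permutation problem: the loss depends on the source labels only through Y Y^T, which is unchanged when sources are relabeled, so the network never has to commit to an output order.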
- Foundation for later work, including deep attractor networks and the anchored deep attractor network (ADANet).
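The affinity objective can be written as ||V V^T - Y Y^T||_F^2, with V (N x D) the unit-norm embeddings and Y (N x C) the one-hot source assignments. Expanding the Frobenius norm gives ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2, which never forms the N x N affinity matrices. The sketch below, assuming NumPy and random toy data, checks that both forms agree and that permuting the columns of Y (relabeling sources) leaves the loss unchanged. `dc_loss` and `dc_loss_naive` are hypothetical names; this is not the reference implementation, which also handles details such as low-energy bins that are omitted here.

```python
import numpy as np

def dc_loss_naive(V, Y):
    """||V V^T - Y Y^T||_F^2 using the explicit N x N affinity matrices."""
    return np.sum((V @ V.T - Y @ Y.T) ** 2)

def dc_loss(V, Y):
    """Same value from D x D, D x C and C x C products (no N x N matrix)."""
    return (np.sum((V.T @ V) ** 2)
            - 2 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))

rng = np.random.default_rng(0)
N, D, C = 200, 20, 3                    # bins, embedding dim, sources (toy sizes)
V = rng.standard_normal((N, D))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit-norm embeddings
Y = np.eye(C)[rng.integers(0, C, size=N)]       # hypothetical one-hot owner per bin

print(np.isclose(dc_loss(V, Y), dc_loss_naive(V, Y)))     # True
perm = [2, 0, 1]                                            # relabel the sources
print(np.isclose(dc_loss(V, Y[:, perm]), dc_loss(V, Y)))   # True
```

The permutation check holds exactly because (Y P)(Y P)^T = Y Y^T for any permutation matrix P, which is the sense in which the objective is permutation-invariant.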
Relationships
- Introduced by john-hershey
- Related to ../concepts/permutation-invariant-training: an alternative solution to the same permutation ambiguity
- Implemented in ../entities/nussl
- Preceded ../entities/demucs-style end-to-end approaches, which estimate source waveforms directly rather than through T-F masks
Sources
None ingested yet — seed batch setup.