Query-Based Source Separation
Definition
Source separation guided by a user-provided query, such as a text description (e.g., "separate the violin"), an audio example, or another conditioning signal, rather than by a fixed set of predefined stem categories.
Key Ideas
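- Contrasts with fixed-stem separation (e.g., Spleeter's 4-stem model: vocals/drums/bass/other), where the set of outputs is fixed when the model is trained. Query-based separation is open-vocabulary: the target is named at inference time.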
- Enabled by joint audio-text embeddings (CLAP): the query is projected into the same space as audio features, so a text description and a matching sound map to nearby vectors.
- Key systems: AudioSep (text queries encoded with CLAP embeddings), SoundFilter (a learned filter network conditioned on a short audio example of the target sound).
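
The data flow behind these ideas can be sketched in a few lines. The example below is a toy illustration only, not the architecture of AudioSep or SoundFilter: it assumes the query has already been mapped to an embedding vector, and it assumes FiLM-style conditioning (a per-channel scale and shift derived from the query), which this note does not specify. All weights are random stand-ins for a trained model, and every name (`separate`, `W_gamma`, `violin_query`, the dimensions) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): frequency bins, time frames, embedding dim, hidden channels.
N_FREQ, N_FRAMES, EMB_DIM, HIDDEN = 16, 8, 4, 12

# Randomly initialised weights standing in for a trained separator (assumption).
W_in = rng.normal(size=(HIDDEN, N_FREQ))      # spectrogram frame -> hidden features
W_gamma = rng.normal(size=(HIDDEN, EMB_DIM))  # query embedding -> per-channel scale
W_beta = rng.normal(size=(HIDDEN, EMB_DIM))   # query embedding -> per-channel shift
W_out = rng.normal(size=(N_FREQ, HIDDEN))     # hidden features -> mask logits


def separate(mix_mag, query_emb):
    """Return (mask, estimate) for a magnitude spectrogram of shape (N_FREQ, N_FRAMES)."""
    q = query_emb / np.linalg.norm(query_emb)   # unit-normalise the query embedding
    hidden = np.tanh(W_in @ mix_mag)            # (HIDDEN, N_FRAMES)
    gamma = (W_gamma @ q)[:, None]              # (HIDDEN, 1), broadcast over frames
    beta = (W_beta @ q)[:, None]
    conditioned = gamma * hidden + beta         # FiLM-style scale and shift
    mask = 1.0 / (1.0 + np.exp(-(W_out @ conditioned)))  # sigmoid keeps mask in [0, 1]
    return mask, mask * mix_mag                 # masked mixture is the source estimate


# Stand-ins for a mixture spectrogram and two query embeddings. In a real system
# the embeddings would come from a text or audio encoder such as CLAP.
mix_mag = np.abs(rng.normal(size=(N_FREQ, N_FRAMES)))
violin_query = rng.normal(size=EMB_DIM)
drums_query = rng.normal(size=EMB_DIM)

violin_mask, violin_est = separate(mix_mag, violin_query)
drums_mask, drums_est = separate(mix_mag, drums_query)
```

The point of the sketch is that one set of weights serves every query: changing the embedding changes the mask, whereas a fixed-stem model would need one output head per predefined stem.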
Relationships
- Builds on clap for text-audio alignment
- Related to audiosep and [[soundfilter]]
- Contrasts with ../concepts/informed-model-based-separation: query-based separation is open-vocabulary, while informed separation uses structured side information (score, MIDI)
Sources
None ingested yet — seed batch setup.