Query-Based Source Separation

Definition

Source separation guided by a user-provided query, such as a text description (e.g., "separate the violin"), an audio example, or another conditioning signal, rather than separation into a fixed set of predefined stem categories.

Key Ideas

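  • Contrasts with fixed-stem separation (e.g., Spleeter 4-stem: drums/bass/vocals/other), where the output categories are fixed at training time. Query-based separation is open-vocabulary: the target is specified at inference time.
  • Enabled by joint audio-text embeddings (e.g., CLAP), which map text and audio into a shared embedding space, so the query embedding can condition the separation model.
  • Key systems: AudioSep (text queries encoded with CLAP embeddings), SoundFilter (a learned filtering network conditioned on a short audio example of the target sound).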
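
The conditioning idea in the list above can be illustrated with a toy sketch. It is not the AudioSep or SoundFilter implementation: the query embedding here is a random stand-in for what a CLAP-style encoder would produce, the weights are untrained, and the names `film_mask`, `separate`, `w_gamma`, and `w_beta` are hypothetical. It shows only the general pattern of a query embedding modulating a soft mask (FiLM-style scale and shift) that is applied to the mixture spectrogram.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from any real system).
N_FREQ, N_FRAMES, EMBED_DIM = 64, 100, 16


def film_mask(mixture_mag, query_emb, w_gamma, w_beta):
    """Return a soft mask in [0, 1], modulated per frequency bin by the query."""
    gamma = w_gamma @ query_emb            # per-bin scale, shape (N_FREQ,)
    beta = w_beta @ query_emb              # per-bin shift, shape (N_FREQ,)
    features = np.log1p(mixture_mag)       # compressed magnitudes, (N_FREQ, N_FRAMES)
    logits = gamma[:, None] * features + beta[:, None]
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid keeps the mask in [0, 1]


def separate(mixture_mag, query_emb, w_gamma, w_beta):
    """Apply the query-conditioned mask to the mixture magnitude spectrogram."""
    return film_mask(mixture_mag, query_emb, w_gamma, w_beta) * mixture_mag


# Random stand-ins: a mixture spectrogram, untrained weights, and two queries.
mixture_mag = np.abs(rng.normal(size=(N_FREQ, N_FRAMES)))
w_gamma = rng.normal(size=(N_FREQ, EMBED_DIM))
w_beta = rng.normal(size=(N_FREQ, EMBED_DIM))
query_violin = rng.normal(size=EMBED_DIM)  # placeholder for an embedded "violin" query
query_drums = rng.normal(size=EMBED_DIM)   # placeholder for an embedded "drums" query

est_violin = separate(mixture_mag, query_violin, w_gamma, w_beta)
est_drums = separate(mixture_mag, query_drums, w_gamma, w_beta)
```

The same model weights yield a different output for each query, which is what distinguishes this setup from a fixed-stem separator with one output head per category. In a trained system the weights and the encoder would be learned so that the mask actually isolates the queried source.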

Relationships

Sources

None ingested yet — seed batch setup.