CLAP

About

Contrastive Language-Audio Pretraining: a model that learns a joint embedding space for audio and text from large-scale audio-text pairs. Enables text-queried audio tasks: zero-shot classification, text-to-audio retrieval, and conditional source separation. Foundation for AudioSep.
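As a rough illustration of how a joint embedding space supports zero-shot classification, the sketch below scores one audio embedding against several text-prompt embeddings by cosine similarity and picks the closest label. The vectors are made-up toy values, not CLAP outputs, and the function names and the temperature of 0.07 are assumptions chosen for illustration; a real pipeline would obtain the embeddings from CLAP's audio and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def zero_shot_classify(audio_emb, text_embs, labels, temperature=0.07):
    """Return (best_label, probabilities) for one audio embedding.

    text_embs[i] is the embedding of a text prompt describing labels[i].
    The temperature is an illustrative assumption, not a documented CLAP value.
    """
    sims = [cosine(audio_emb, t) for t in text_embs]
    scaled = [s / temperature for s in sims]
    peak = max(scaled)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
labels = ["banjo", "fiddle", "mandolin"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_emb = [0.9, 0.1, 0.2]

predicted, probs = zero_shot_classify(audio_emb, text_embs, labels)
print(predicted)  # banjo
```

Text-to-audio retrieval is the same comparison run the other way: one text embedding is scored against many audio embeddings and the clips are ranked by similarity.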

Relevance

Enables query-based separation (e.g. "separate the banjo") via AudioSep. Key question: does CLAP's learned vocabulary cover bluegrass instruments well enough for text queries naming them to drive useful separation?
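One cheap way to probe the coverage question before running any separation is to check whether labeled clips land closest to their own instrument's text prompt in the embedding space. The sketch below computes per-instrument recall for that nearest-prompt test. It is only a proxy: good zero-shot discrimination does not guarantee good separation. The embeddings are hypothetical 2-D values rather than CLAP outputs, and the instrument list beyond banjo is an assumption.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def per_label_recall(clips, prompt_embs):
    """Fraction of clips per true label whose nearest text prompt is the correct one.

    clips: list of (true_label, audio_embedding) pairs.
    prompt_embs: dict mapping label -> text-prompt embedding.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for true_label, audio_emb in clips:
        predicted = max(prompt_embs, key=lambda lab: cosine(audio_emb, prompt_embs[lab]))
        totals[true_label] += 1
        hits[true_label] += int(predicted == true_label)
    return {lab: hits[lab] / totals[lab] for lab in totals}

# Hypothetical toy embeddings: "mandolin" sits close to "banjo" on purpose,
# to show how a confusable pair appears as reduced recall.
prompt_embs = {
    "banjo": [1.0, 0.0],
    "mandolin": [0.8, 0.6],
    "fiddle": [0.0, 1.0],
}
clips = [
    ("banjo", [0.95, 0.10]),
    ("banjo", [0.70, 0.50]),     # nearest prompt is "mandolin", so this is a miss
    ("mandolin", [0.80, 0.55]),
    ("fiddle", [0.10, 0.90]),
]

recall = per_label_recall(clips, prompt_embs)
print(recall)
```

Low recall for one instrument, or consistent confusion between two, would suggest that text queries for it are unlikely to condition the separator reliably.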

Mentions