CLAP
About
Contrastive Language-Audio Pretraining: learns a joint embedding space for audio and text. This enables text-queried audio tasks such as zero-shot classification, text-to-audio retrieval, and conditional source separation. Foundation for AudioSep. Trained on large-scale audio-text pairs.
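The contrastive objective that shapes the joint embedding space can be sketched as below. This is a minimal, CLIP-style symmetric cross-entropy over an audio-text similarity matrix. It assumes L2-normalized embeddings and a fixed temperature of 0.07 (CLIP-style training typically learns the temperature instead). The 3-dimensional vectors are toy stand-ins for encoder outputs, and all function names are hypothetical, not a CLAP library API.

```python
import math


def l2_normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_logits(audio_embs, text_embs, temperature=0.07):
    # Entry [i][j] is cos(audio_i, text_j) / temperature.
    audio = [l2_normalize(v) for v in audio_embs]
    text = [l2_normalize(v) for v in text_embs]
    return [[sum(a * t for a, t in zip(a_vec, t_vec)) / temperature for t_vec in text]
            for a_vec in audio]


def row_cross_entropy(logits):
    # Mean cross-entropy where the correct column for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(audio_embs, text_embs, temperature=0.07):
    # Average of audio-to-text (rows) and text-to-audio (columns) losses.
    logits = similarity_logits(audio_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (row_cross_entropy(logits) + row_cross_entropy(transposed))


# Toy batch of three paired clips/captions (hypothetical 3-d embeddings).
audio_batch = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
text_batch = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.0, 0.1, 0.9]]

matched = symmetric_contrastive_loss(audio_batch, text_batch)
# Rotating the captions breaks the pairing, so the loss should rise.
mismatched = symmetric_contrastive_loss(audio_batch, text_batch[1:] + text_batch[:1])
```

Each row of the logit matrix asks "which caption matches this clip?" and each column asks "which clip matches this caption?". Minimizing the loss raises the diagonal (true pairs) relative to everything else, which is what later lets a text embedding stand in as a query for audio.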
Relevance
Enables query-based separation ("separate the banjo") via AudioSep. Key question: does CLAP's vocabulary cover bluegrass instruments well enough for useful separation?
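One way to probe the vocabulary question is a zero-shot ranking test: embed one text prompt per instrument, embed labelled isolated clips, and check whether each clip's true instrument ranks first by cosine similarity. The sketch below assumes the embeddings have already been produced by some CLAP audio and text encoder. The vectors, the prompt wording ("the sound of a banjo"), and the helper names are all hypothetical placeholders.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def rank_prompts(audio_emb, prompt_embs):
    # Prompt labels sorted by similarity to the clip, best match first.
    scores = {label: cosine(audio_emb, emb) for label, emb in prompt_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def top1_accuracy(labelled_clips, prompt_embs):
    # Fraction of clips whose true instrument is the highest-ranked prompt.
    hits = sum(1 for label, emb in labelled_clips
               if rank_prompts(emb, prompt_embs)[0] == label)
    return hits / len(labelled_clips)


# Hypothetical text embeddings for prompts such as "the sound of a banjo".
prompts = {
    "banjo": [1.0, 0.0, 0.0],
    "fiddle": [0.0, 1.0, 0.0],
    "mandolin": [0.9, 0.1, 0.4],
}
# Hypothetical audio embeddings of isolated clips, paired with ground truth.
clips = [
    ("banjo", [0.9, 0.1, 0.1]),
    ("fiddle", [0.1, 0.9, 0.0]),
    ("mandolin", [0.95, 0.05, 0.1]),  # lands nearer the banjo prompt
]

accuracy = top1_accuracy(clips, prompts)
confusions = [(label, rank_prompts(emb, prompts)[0]) for label, emb in clips
              if rank_prompts(emb, prompts)[0] != label]
```

With real embeddings, low top-1 accuracy or a consistent confusion pair (in this toy data, mandolin ranked as banjo) would suggest that a query like "separate the mandolin" may not steer AudioSep reliably.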
Mentions
- ../entities/audiosep: uses CLAP for text-conditional separation
- ../entities/audioset: one of CLAP's training sources
- ../concepts/query-based-source-separation: enabled by CLAP