Topic: Datasets
Overview
This page lists key datasets for training and evaluating models for audio source separation, transcription, classification, captioning, and representation learning.
Sub-topics / Concepts
Key Entities
Separation & Transcription
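- ../entities/musdb18 — MUSDB18: Standard benchmark for music source separation. 150 full-length tracks (100 train, 50 test), each with 4 stereo stems: drums, bass, vocals, and other. Its role in source separation is comparable to that of ImageNet in image classification.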
- ../entities/slakh2100 — Slakh2100: Synthesized multi-track dataset. 2,100 tracks rendered from Lakh MIDI files with sample-based virtual instruments, each with individual instrument stems.
- ../entities/guitarset — GuitarSet: Acoustic guitar recordings captured with a hexaphonic pickup, giving per-string audio. Annotations cover pitch, string and fret position, chords, beats, and playing style (comping vs. soloing).
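The four-stem layout of MUSDB18 can be illustrated with a minimal Python sketch. It uses short synthetic tones in place of real audio, and it assumes the mixture is the plain sample-wise sum of the four stems, which is only an approximation for real tracks. The helper names (`tone`, `mix_stems`) are hypothetical and are not part of any MUSDB18 tooling.

```python
import math

SAMPLE_RATE = 44100  # MUSDB18 audio is sampled at 44.1 kHz
STEM_NAMES = ("drums", "bass", "other", "vocals")  # the four MUSDB18 stems


def tone(freq_hz, n_samples, amp=0.1):
    """Return a stereo test tone as a list of (left, right) float pairs."""
    mono = (
        amp * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(n_samples)
    )
    return [(s, s) for s in mono]


def mix_stems(stems):
    """Sum stereo stems sample by sample (assumes all stems have equal length)."""
    n_samples = len(next(iter(stems.values())))
    return [
        (
            sum(stem[i][0] for stem in stems.values()),
            sum(stem[i][1] for stem in stems.values()),
        )
        for i in range(n_samples)
    ]


# Synthetic stand-ins for real stems: one tone per stem name.
stems = {name: tone(110.0 * (k + 1), 1024) for k, name in enumerate(STEM_NAMES)}
mixture = mix_stems(stems)

# The accompaniment is every stem except the vocals.
accompaniment = mix_stems({k: v for k, v in stems.items() if k != "vocals"})
```

Under this assumption, a vocal separation model would take `mixture` as input and `stems["vocals"]` as its target, and `accompaniment` plus the vocals reconstructs the mixture.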
Classification & Captioning
- ../entities/audioset — AudioSet: ~2.1M 10-second YouTube clips labeled with 527 sound event classes from a hierarchical ontology.
- ../entities/audiocaps — AudioCaps: ~50K audio clips with human-written captions.
- ../entities/clotho — Clotho: ~5K clips with 5 captions each. Larger vocabulary than AudioCaps.
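To put the AudioSet figure in perspective, the clip count can be converted to total audio duration. The short sketch below does that arithmetic. The function name `total_hours` is hypothetical, and the result is an upper bound because it assumes every clip is a full 10 seconds.

```python
def total_hours(num_clips: int, clip_seconds: float) -> float:
    """Total duration in hours, assuming every clip has the same length."""
    return num_clips * clip_seconds / 3600.0


# ~2.1M clips of 10 s each, using the approximate count given above.
audioset_hours = total_hours(2_100_000, 10)
print(round(audioset_hours))  # 5833
```

That is roughly 5,800 hours of audio, or about 243 days of continuous playback.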
General Music / Multi-track
- ../entities/cambridge-mixing-secrets — Cambridge Music Technology library. Free multi-track recordings for mixing practice.
- ../entities/free-music-archive — FMA: 106,574 tracks, 161 genres, Creative Commons.
- ../entities/internet-archive-audio — Internet Archive audio collections. Massive, heterogeneous.
- ../entities/lakh-midi-dataset — Lakh MIDI: 176,581 unique MIDI files, of which 45,129 are matched to entries in the Million Song Dataset.
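The entity list above could also be kept in machine-readable form, for example to filter datasets by group or compare sizes. The following is one possible sketch, not an existing schema: the `DatasetEntry` class, its fields, and the helper functions are hypothetical, and the sizes are the figures stated on this page (`None` where no count is given).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetEntry:
    slug: str                  # entity page name, e.g. "musdb18"
    group: str                 # section heading on this page
    size: Optional[int]        # item count stated on this page, None if not stated
    unit: str = "tracks"
    approximate: bool = False  # True for counts given with "~"


SEP = "Separation & Transcription"
CLS = "Classification & Captioning"
GEN = "General Music / Multi-track"

CATALOG = [
    DatasetEntry("musdb18", SEP, 150),
    DatasetEntry("slakh2100", SEP, 2100),
    DatasetEntry("guitarset", SEP, None),
    DatasetEntry("audioset", CLS, 2_100_000, "clips", True),
    DatasetEntry("audiocaps", CLS, 50_000, "clips", True),
    DatasetEntry("clotho", CLS, 5_000, "clips", True),
    DatasetEntry("cambridge-mixing-secrets", GEN, None),
    DatasetEntry("free-music-archive", GEN, 106_574),
    DatasetEntry("internet-archive-audio", GEN, None),
    DatasetEntry("lakh-midi-dataset", GEN, 176_581, "MIDI files"),
]


def by_group(catalog, group):
    """Return all entries listed under the given section heading."""
    return [entry for entry in catalog if entry.group == group]


def largest(catalog):
    """Return the entry with the highest stated item count."""
    sized = [entry for entry in catalog if entry.size is not None]
    return max(sized, key=lambda entry: entry.size)
```

For example, `by_group(CATALOG, SEP)` returns the three separation and transcription entries, and `largest(CATALOG)` returns the AudioSet entry. Item counts are not directly comparable across units, so a comparison like `largest` is only a rough guide.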
Sources
None ingested yet — seed batch setup.
Open Questions
- What is the legal/licensing status of each dataset for commercial use?
- How well do models trained on Slakh2100 (synthetic) generalize vs. MUSDB18-trained models?
- GuitarSet is small — what data augmentation strategies work for guitar transcription?