Synthetic Mixing / Data Augmentation Pipelines

Definition

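Synthetic mixing creates training data for source separation by artificially combining isolated stems with randomized gains, effects, and spatialization. Because the same stems can be recombined in many different ways, it multiplies the effective amount of training data and improves generalization.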

Key Ideas

  • Start with isolated stems (from datasets like MUSDB18, Slakh2100, or solo recordings).
  • Apply random gains, panning, EQ, and reverb (via ../entities/pedalboard, ../entities/echothief, ../entities/openair) to each stem.
  • Sum the processed stems into a mixture; the ground-truth stems are known by construction.
  • Slakh2100 takes this further: it synthesizes stems from MIDI with virtual instruments, then mixes them.
  • Critical for domains with limited multi-track data (e.g., guitar transcription from GuitarSet).
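
The gain, panning, and summing steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference pipeline: the function name `random_mix`, the `gain_db_range` parameter, the constant-power pan law, and the peak rescaling are all assumptions made for the example, and EQ and reverb are left out. The stems here are synthetic test signals rather than audio from any dataset.

```python
import numpy as np


def random_mix(stems, rng, gain_db_range=(-6.0, 6.0)):
    """Mix mono stems into a stereo mixture with random gain and pan.

    stems: dict mapping stem name -> 1-D float array (all the same length).
    Returns (mixture, targets): mixture has shape (2, n_samples) and
    targets maps each stem name to its processed stereo signal.
    Hypothetical helper for illustration only.
    """
    targets = {}
    for name, mono in stems.items():
        # Random gain drawn uniformly in decibels, converted to linear scale.
        gain = 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
        # Random pan in [-1, 1], mapped to an angle in [0, pi/2]
        # (constant-power pan law: left^2 + right^2 == 1).
        pan = rng.uniform(-1.0, 1.0)
        theta = (pan + 1.0) * np.pi / 4.0
        left, right = np.cos(theta), np.sin(theta)
        targets[name] = gain * np.stack([left * mono, right * mono])

    # The mixture is the plain sum of the processed stems, so the
    # ground-truth targets are known by construction.
    mixture = np.sum(list(targets.values()), axis=0)

    # Assumed safeguard: rescale mixture and targets together if the sum
    # would clip, which keeps mixture == sum(targets) intact.
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        scale = 1.0 / peak
        mixture = mixture * scale
        targets = {name: sig * scale for name, sig in targets.items()}
    return mixture, targets


# Usage with synthetic stand-in stems (one second at 16 kHz).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
stems = {
    "tone": 0.5 * np.sin(2 * np.pi * 220.0 * t),
    "noise": 0.1 * rng.standard_normal(t.shape[0]),
}
mixture, targets = random_mix(stems, rng)
```

Calling `random_mix` repeatedly on the same stems yields a different mixture each time, which is where the multiplication of training data comes from. Rescaling the targets together with the mixture preserves the additive relationship that separation losses typically rely on.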

Relationships

Sources

None ingested yet — seed batch setup.