Mixture Invariant Training (MixIT)
Definition
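Training paradigm for unsupervised source separation that uses mixtures of mixtures (MoMs) as input instead of requiring isolated ground-truth sources. The model separates a MoM into a set of estimated sources. These are summed back into remixes, one per original mixture, and the remixes are compared against the known sub-mixtures under the best-scoring assignment of sources to mixtures.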
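The objective can be written compactly. This is a sketch, assuming two reference mixtures $x_1, x_2 \in \mathbb{R}^T$, a separation network $f_\theta$ with $M$ outputs, and a generic per-signal loss $\mathcal{L}$ (for example negative SNR). These symbols are introduced here for illustration:

$$
\bar{x} = x_1 + x_2, \qquad \hat{s} = f_\theta(\bar{x}) \in \mathbb{R}^{M \times T}
$$

$$
\mathcal{L}_{\mathrm{MixIT}}(x_1, x_2, \hat{s}) = \min_{A \in \mathbb{B}^{2 \times M}} \sum_{i=1}^{2} \mathcal{L}\big(x_i, [A\hat{s}]_i\big)
$$

Here $\mathbb{B}^{2 \times M}$ is the set of binary $2 \times M$ matrices whose columns each sum to one. Each column of $A$ therefore routes one estimated source to exactly one mixture, which gives $2^M$ candidate matrices. Taking the minimum makes the loss invariant to how the outputs are ordered and grouped.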
Key Ideas
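- Wisdom, Hershey et al. (Google), NeurIPS 2020. Take two mixtures and sum them to form a MoM. The model separates the MoM into M estimated sources, where M is chosen to be at least the number of sources expected in the MoM. At inference, the same model is applied to a single mixture.
- Loss: each estimated source is assigned to exactly one of the two original mixtures, and the assigned estimates are summed into two remixes. The remixes are compared to the original mixtures, not to isolated sources. The loss is minimized over all possible assignments.
- Key advantage: does not require isolated stems for training. Any recording that is itself a mixture can serve as training data.
- Why it works: to reconstruct both mixtures from their sum, the model has to split apart sources that came from different mixtures. Because it cannot tell which mixture a given source came from, it is pushed toward separating individual sources. An assignment ambiguity (which outputs belong to which mixture) still exists, and the minimum over assignments resolves it, much as PIT searches over permutations. The search grows as 2^M.
- Enables training on unlabeled, in-the-wild recordings, so the amount of usable data is not limited by the availability of clean stems.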
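The loss described in the list above can be sketched in NumPy by brute-forcing every assignment of the M estimates to the two mixtures. The function names (`neg_snr`, `mixit_loss`) are hypothetical. Using plain negative SNR as the per-mixture loss is also an assumption for illustration, and the paper's exact loss may differ. A real training loop would use a differentiable framework and batch the assignments.

```python
import itertools

import numpy as np


def neg_snr(ref: np.ndarray, est: np.ndarray, eps: float = 1e-8) -> float:
    """Negative SNR in dB; lower is better. Assumed per-mixture loss for this sketch."""
    signal = np.sum(ref ** 2)
    noise = np.sum((ref - est) ** 2)
    return float(-10.0 * np.log10((signal + eps) / (noise + eps)))


def mixit_loss(mix1: np.ndarray, mix2: np.ndarray, est_sources: np.ndarray):
    """Brute-force MixIT loss for two reference mixtures (hypothetical helper).

    est_sources: array of shape (M, T) holding the model's M estimated sources.
    Every source goes to exactly one mixture, so all 2**M assignments are tried.
    Returns (best_loss, best_assignment), where best_assignment[m] is the
    index (0 or 1) of the mixture that source m was assigned to.
    """
    num_sources = est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    for assign in itertools.product((0, 1), repeat=num_sources):
        # Binary 2 x M mixing matrix: mixing[i, m] = 1 iff source m goes to
        # mixture i, so each column sums to one.
        mixing = np.zeros((2, num_sources))
        mixing[list(assign), np.arange(num_sources)] = 1.0
        remix = mixing @ est_sources  # shape (2, T): one remix per reference mixture
        loss = neg_snr(mix1, remix[0]) + neg_snr(mix2, remix[1])
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


# Toy usage with synthetic signals (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 1000))  # four "true" sources
mix1 = sources[0] + sources[1]            # first reference mixture
mix2 = sources[2] + sources[3]            # second reference mixture
mom = mix1 + mix2                         # the MoM a real model would receive as input
est = sources[[0, 2, 1, 3]]               # stand-in for a perfect separator, outputs shuffled
loss, assign = mixit_loss(mix1, mix2, est)
print(assign)  # (0, 1, 0, 1)
```

The shuffled "perfect" outputs still reach the best loss, because the minimum over assignments absorbs output order. The exhaustive loop costs 2^M loss evaluations, which is manageable only for a small number of outputs.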
Relationships
- From john-hershey's group at Google
- Related to ../concepts/permutation-invariant-training, ../concepts/deep-clustering-separation
- Contrast with ../concepts/synthetic-mixing-pipelines — MixIT doesn't need isolated stems
- Implemented in some ../entities/asteroid recipes
Sources
None ingested yet — seed batch setup.