Mixture Invariant Training (MixIT)

Definition

Training paradigm for source separation that uses mixtures of mixtures (MoMs) as input instead of requiring isolated ground-truth sources. The model separates a MoM into a fixed number of estimated sources; these estimates are then remixed to approximate the known sub-mixtures, which serve as the training targets.
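
The objective can be written compactly. The notation here is chosen for illustration: x_1 and x_2 are the two known mixtures, \hat{s} stacks the M estimated sources produced from the MoM x_1 + x_2, A is a binary mixing matrix that assigns each estimated source to exactly one mixture, and \mathcal{L} is any signal-level loss (for example a negative SNR).

```latex
\mathcal{L}_{\mathrm{MixIT}}(x_1, x_2, \hat{s})
  = \min_{A \in \mathbb{B}^{2 \times M}}
    \sum_{i=1}^{2} \mathcal{L}\!\left(x_i, [A\hat{s}]_i\right),
\qquad
\mathbb{B}^{2 \times M}
  = \left\{ A \in \{0,1\}^{2 \times M} : \textstyle\sum_{i} A_{ij} = 1 \ \ \forall j \right\}
```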

Key Ideas

  • Wisdom, Hershey et al. (Google). Take two mixtures and sum them to form a MoM. The model separates the MoM into M estimated sources, and each estimate is assigned to one of the two original mixtures.
  • Loss: remix the estimated sources under the best binary assignment and compare the remixes to the original mixtures (not to isolated sources).
  • Key advantage: does not require isolated stems for training. Any recordings of mixed audio can serve as training data.
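  • Works because the targets (the two original mixtures) are known even though the sources are not. Which estimated source belongs to which mixture is unknown, so the loss takes the minimum over all assignments, generalizing permutation invariant training (PIT).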
  • Enables training on in-the-wild data at massive scale.
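
The loss computation in the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `mixit_loss` is hypothetical, mean squared error stands in for the signal-level loss, and the "model output" is replaced by the true sources so the example runs without a network. The assignment search is brute force over all 2^M binary assignments.

```python
import itertools

import numpy as np


def mixit_loss(mixtures, est_sources):
    """Brute-force MixIT-style loss (hypothetical helper, MSE as an assumed loss).

    mixtures:    array of shape (2, T), the two known reference mixtures.
    est_sources: array of shape (M, T), sources estimated from the MoM.
    Returns (best_loss, best_assignment), where best_assignment[j] is the
    index of the mixture that estimated source j is assigned to.
    """
    n_mix, n_src = mixtures.shape[0], est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    # Each source goes to exactly one mixture: n_mix ** n_src candidates.
    for assign in itertools.product(range(n_mix), repeat=n_src):
        # Binary mixing matrix with a single 1 in every column.
        A = np.zeros((n_mix, n_src))
        A[list(assign), np.arange(n_src)] = 1.0
        remixed = A @ est_sources  # shape (n_mix, T)
        loss = np.mean((mixtures - remixed) ** 2)
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


# Toy data: four random "sources", two per mixture.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 1000))
x1 = sources[0] + sources[1]
x2 = sources[2] + sources[3]
mixtures = np.stack([x1, x2])

mom = x1 + x2          # what a separation model would receive as input
est_sources = sources  # stand-in for model(mom); a real model is not shown

loss, assign = mixit_loss(mixtures, est_sources)
print(loss, assign)
```

With perfect estimates the search should recover the assignment that sends the first two sources to the first mixture and the last two to the second. In training, `est_sources` would come from the network and the minimum loss would be backpropagated; the exhaustive search grows as 2^M, so it is only practical for a small number of output sources.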

Relationships

Sources

None ingested yet — seed batch setup.