Permutation Invariant Training

Definition

A training paradigm for source separation models that handles the ambiguity of which output corresponds to which source. Instead of requiring a fixed source-to-output mapping, PIT computes loss over all permutations of output-to-reference assignments and uses the minimum.
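The definition above can be written compactly as follows. This is a sketch in assumed notation that does not appear in the original note: $\hat{s}_j$ is the $j$-th model output, $s_i$ the $i$-th reference source, $\ell$ any pairwise loss (for example MSE or negative SI-SNR), and $S_N$ the set of all permutations of $\{1, \dots, N\}$.

$$
\mathcal{L}_{\mathrm{PIT}} \;=\; \min_{\pi \in S_N} \; \frac{1}{N} \sum_{i=1}^{N} \ell\!\left(\hat{s}_{\pi(i)},\, s_i\right)
$$

Only the minimizing permutation contributes to the gradient, so the model is free to emit the sources in any order.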

Key Ideas

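Core problem: a separation model outputs N sources, but when the sources belong to the same class (e.g. two unknown speakers) there is no natural ordering. A fixed assignment (output 1 to source A) penalizes a correct separation that happens to come out in a different order, so the training signal depends on an arbitrary labeling.
PIT: For each training example, compute the loss for all N! permutations of output-to-reference mappings and backpropagate only through the permutation with the minimum loss. The N! cost is manageable for the small N typical of speech mixtures (2 or 3 speakers) but grows quickly beyond that.
uPIT (utterance-level PIT): Chooses one permutation for the whole utterance by minimizing the loss accumulated over all frames, rather than picking a permutation per frame. Each output then follows a single source throughout, which avoids the frame-to-frame permutation switching that frame-level PIT allows.
Key enabler for deep learning-based separation of same-class sources: without PIT (or an alternative such as deep clustering), training on mixtures of unknown speakers receives conflicting targets because of the permutation ambiguity.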
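The ideas above can be illustrated with a minimal pure-Python sketch. It is not taken from any particular library: the names `mse`, `pit_loss`, and `frame_level_perms` are hypothetical, plain lists stand in for the tensors a real model would use, and each "frame" is a single number to keep the example small.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references, pair_loss=mse):
    """Utterance-level PIT: return (minimum loss, best permutation).

    best_perm[i] is the index of the estimate assigned to references[i].
    """
    n = len(references)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):  # brute force over all N! assignments
        loss = sum(pair_loss(estimates[perm[i]], references[i])
                   for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def frame_level_perms(estimates, references):
    """Frame-level PIT: choose the best permutation independently per frame."""
    n_frames = len(references[0])
    perms = []
    for t in range(n_frames):
        est_t = [[e[t]] for e in estimates]   # one-sample "utterances"
        ref_t = [[r[t]] for r in references]
        perms.append(pit_loss(est_t, ref_t)[1])
    return perms


# Toy references: source A is active in the first half, source B in the second.
ref_a = [1.0, 1.0, 0.0, 0.0]
ref_b = [0.0, 0.0, 1.0, 1.0]

# Case 1: the outputs are good but come out in swapped order.
swapped = [[0.1, 0.0, 0.9, 1.0], [0.9, 1.0, 0.1, 0.0]]
loss, perm = pit_loss(swapped, [ref_a, ref_b])
print(perm)  # (1, 0): output 1 is matched to A, output 0 to B

# Case 2: output 0 is "on" all the time, so it follows A and then B.
switching = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
print(frame_level_perms(switching, [ref_a, ref_b]))
# [(0, 1), (0, 1), (1, 0), (1, 0)]: the chosen permutation flips mid-utterance
print(pit_loss(switching, [ref_a, ref_b])[0])  # 0.5
```

In the second case every per-frame loss is zero under frame-level PIT even though neither output tracks a single source, whereas the utterance-level loss stays at 0.5. That is the switching behaviour uPIT is meant to discourage. The brute-force loop is only practical for small N. For larger N and a loss that decomposes over output-reference pairs, an assignment solver such as the Hungarian algorithm is a common substitute.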

Relationships

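Introduced by Yu, Kolbaek, Tan, and Jensen (see Sources). john-hershey and colleagues had evaluated a similar permutation-free objective in their deep clustering work.
Foundational to modern speaker-independent speech separation models (e.g. Conv-TasNet, Dual-Path RNN). Music separation models such as Demucs and Open-Unmix predict a fixed set of named stems (drums, bass, vocals, other), so their output order is fixed by construction and they do not rely on PIT.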
Related to ../concepts/deep-clustering-separation — alternative approach to permutation problem via embedding + clustering

Sources

Yu, Kolbaek, Tan, Jensen: "Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation" (ICASSP 2017) — original PIT paper
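../sources/2019-11-27-demucs (contrast case: music source separation with fixed, named stems, where each output is tied to an instrument class and PIT is not needed)
../sources/2022-11-15-htdemucs (HTDemucs, likewise trained on fixed stems rather than with PIT)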