Spectrogram U-Nets

Definition

Application of the U-Net architecture to time-frequency representations (spectrograms) for audio tasks. The U-Net's encoder-decoder structure with skip connections suits spectrograms well: progressive downsampling in the encoder captures global structure, while the skip connections carry local time-frequency patterns through to the decoder.
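
To make the encoder-decoder shape concrete, here is a minimal NumPy sketch of a two-level U-Net forward pass over a spectrogram. It is illustrative only: `tiny_unet`, `conv1x1`, the channel counts, and the random untrained weights are all assumptions, and the 1x1 channel mixing stands in for the learned 2-D convolutions a real model would use. The point is only how skip connections concatenate encoder features into the decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_channels):
    """Stand-in for a learned conv layer: random 1x1 channel mixing + ReLU."""
    w = rng.standard_normal((out_channels, x.shape[0])) * 0.1
    return np.maximum(np.einsum("oc,cft->oft", w, x), 0.0)

def downsample(x):
    """2x2 average pooling over the (freq, time) axes."""
    c, f, t = x.shape
    return x.reshape(c, f // 2, 2, t // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """Nearest-neighbour 2x upsampling over the (freq, time) axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tiny_unet(spec):
    """spec: (freq, time) magnitude spectrogram with both dims divisible by 4.
    Returns a soft mask of the same shape with values in (0, 1)."""
    x = spec[np.newaxis]                 # (1, F, T)
    e1 = conv1x1(x, 8)                   # (8, F, T)      encoder level 1
    e2 = conv1x1(downsample(e1), 16)     # (16, F/2, T/2) encoder level 2
    b = conv1x1(downsample(e2), 32)      # (32, F/4, T/4) bottleneck
    # Skip connections: concatenate encoder features along the channel axis.
    d2 = conv1x1(np.concatenate([upsample(b), e2], axis=0), 16)
    d1 = conv1x1(np.concatenate([upsample(d2), e1], axis=0), 8)
    w_out = rng.standard_normal((1, 8)) * 0.1
    logits = np.einsum("oc,cft->oft", w_out, d1)[0]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid gives a soft mask

spec = np.abs(rng.standard_normal((16, 8)))  # toy "spectrogram"
mask = tiny_unet(spec)
```

Without the two `np.concatenate` calls, the decoder would see only the upsampled bottleneck, which has a quarter of the resolution along each axis.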

Key Ideas

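  • Takes a magnitude spectrogram as input and predicts either a soft mask per source or a cleaned spectrogram directly.
  • Skip connections pass encoder features straight to the matching decoder layers, preserving fine time-frequency detail that would otherwise be lost in the bottleneck.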
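  • Used in Spleeter (Deezer) for source separation. Onsets and Frames (Google Magenta), a piano transcription model, also works on spectrogram input but uses a convolutional stack with bidirectional LSTMs rather than a U-Net.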
  • Operates on magnitude spectrograms only, so phase has to be supplied separately to get back to a waveform: either estimated with Griffin-Lim or reused from the input mixture.
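
The masking and mixture-phase steps above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions: the complex arrays are random stand-ins for real STFTs, the helper name `apply_mask_with_mixture_phase` is hypothetical, and the mask is an oracle ratio mask computed from the known sources, standing in for what a trained U-Net would predict.

```python
import numpy as np

def apply_mask_with_mixture_phase(mixture_stft, mask):
    """Scale the mixture magnitude by a soft mask and reuse the mixture phase."""
    magnitude = np.abs(mixture_stft)
    phase = np.angle(mixture_stft)
    est_magnitude = mask * magnitude           # the model only predicts this part
    return est_magnitude * np.exp(1j * phase)  # phase is borrowed from the mixture

rng = np.random.default_rng(0)
shape = (513, 100)  # (frequency bins, time frames); arbitrary toy size

# Random complex arrays standing in for the STFTs of two sources.
vocals = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
accomp = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = vocals + accomp

# Oracle ratio mask (assumption: replaces the U-Net's predicted mask).
eps = 1e-8
vocal_mask = np.abs(vocals) / (np.abs(vocals) + np.abs(accomp) + eps)
accomp_mask = 1.0 - vocal_mask

vocals_est = apply_mask_with_mixture_phase(mixture, vocal_mask)
accomp_est = apply_mask_with_mixture_phase(mixture, accomp_mask)
```

Because the masks sum to one in every bin, the two estimates add back up to the mixture. An inverse STFT of each estimate would give the separated waveform. Griffin-Lim would instead iterate to find a phase consistent with the estimated magnitude, at a higher computational cost.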

Relationships

Sources

None ingested yet — seed batch setup.