Spectrogram U-Nets
Definition
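Application of the U-Net architecture to time-frequency representations (spectrograms) for audio tasks. The encoder-decoder structure with skip connections suits spectrograms: the downsampling path aggregates context across wide time-frequency regions (global structure), while the skip connections pass full-resolution features to the decoder so that local time-frequency patterns are retained.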
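To make the encoder-decoder data flow concrete, here is a minimal shape-level sketch in NumPy. It is not a trainable model and not any particular system's architecture: the channel-mixing weights are random, and the names `tiny_unet`, `down`, `up`, and `mix` are hypothetical helpers introduced only for illustration. It assumes a single-level U-Net and a spectrogram whose frequency and time dimensions are both even.

```python
import numpy as np

def down(x):
    """2x2 average pooling over (freq, time); x has shape (channels, F, T)."""
    c, f, t = x.shape
    return x.reshape(c, f // 2, 2, t // 2, 2).mean(axis=(2, 4))

def up(x):
    """Nearest-neighbour 2x upsampling over (freq, time)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def mix(x, w):
    """1x1 'convolution': mixes channels at each time-frequency bin, then ReLU."""
    return np.maximum(np.einsum("oc,cft->oft", w, x), 0.0)

def tiny_unet(mag, rng):
    """One-level U-Net data flow on a magnitude spectrogram of shape (F, T)."""
    x = mag[None]                                        # (1, F, T)
    enc = mix(x, rng.standard_normal((4, 1)))            # (4, F, T) full-resolution features
    bott = mix(down(enc), rng.standard_normal((8, 4)))   # (8, F/2, T/2) bottleneck
    dec = up(bott)                                       # (8, F, T) coarse features, upsampled
    cat = np.concatenate([dec, enc], axis=0)             # (12, F, T) skip connection
    logits = np.einsum("oc,cft->oft", rng.standard_normal((1, 12)), cat)
    return 1.0 / (1.0 + np.exp(-logits[0]))              # sigmoid: soft mask in [0, 1]

rng = np.random.default_rng(0)
mag = np.abs(rng.standard_normal((8, 16)))               # stand-in magnitude spectrogram
mask = tiny_unet(mag, rng)                               # same shape as the input
```

The concatenation step is the skip connection: the decoder sees both the upsampled bottleneck features and the full-resolution encoder features, so detail removed by pooling is still available when the output is formed.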
Key Ideas
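- Takes a magnitude spectrogram as input and predicts either a soft mask per source or a cleaned spectrogram directly.
- Skip connections preserve fine time-frequency detail that would otherwise be lost in the bottleneck.
- Used in Spleeter (Deezer) for source separation. Onsets and Frames (Google Magenta) also works on spectrogram input for piano transcription, but its acoustic model is a convolutional stack followed by bidirectional LSTMs rather than a U-Net.
- Operates on magnitude spectrograms only, so phase has to be supplied at synthesis time: either the mixture phase is reused or phase is estimated with Griffin-Lim.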
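The masking and mixture-phase steps can be sketched as follows, assuming the complex STFT of the mixture is already available as a NumPy array. `apply_soft_mask` is a hypothetical helper, and the constant mask is a placeholder standing in for a network's output.

```python
import numpy as np

def apply_soft_mask(mix_stft, mask):
    """Estimate one source's complex STFT from a soft mask and the mixture phase."""
    mix_mag = np.abs(mix_stft)        # magnitude the network would see as input
    mix_phase = np.angle(mix_stft)    # phase is not predicted, only reused
    est_mag = mask * mix_mag          # element-wise soft masking
    return est_mag * np.exp(1j * mix_phase)

rng = np.random.default_rng(0)
# Stand-in complex STFT of a mixture, shape (freq_bins, frames).
mix_stft = rng.standard_normal((5, 7)) + 1j * rng.standard_normal((5, 7))
mask = np.full((5, 7), 0.25)          # placeholder for a predicted soft mask
est = apply_soft_mask(mix_stft, mask)
```

An inverse STFT of `est` would then give the estimated waveform. Griffin-Lim is the alternative when no usable mixture phase exists, for example when the model outputs a cleaned magnitude spectrogram that is not tied to a mixture.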
Relationships
- Used by ../entities/spleeter; ../entities/onsets-and-frames shares the spectrogram input but is not a U-Net
- Contrasts with waveform-domain models like ../entities/demucs
- Related to ../concepts/sinusoidal-modeling — spectrogram vs. sinusoidal representation tradeoffs
Sources
None ingested yet — seed batch setup.