Topic: Transcription & Pitch Estimation

Overview

This topic covers automatic music transcription (AMT) and pitch estimation with neural networks and other automated methods: polyphonic instrument transcription, fundamental frequency (f0) estimation, multi-instrument transformer models, and guitar-specific tablature transcription.

Sub-topics / Concepts

Key Entities

Models & Systems

  • ../entities/basic-pitch (Spotify) — Lightweight pitch and note transcription. Architecture: harmonic stacking + CNN. Fast, intended for consumer use.
  • ../entities/mt3 (Google) — Multi-Instrument Transformer. Token-based approach that treats transcription as a sequence-to-sequence (seq2seq) task. Handles multiple instruments in a single model.
  • ../entities/onsets-and-frames (Google Magenta) — Piano transcription combining onset detection with frame-level note prediction. BiLSTM + CNN architecture.
  • ../entities/crepe — Convolutional Representation for Pitch Estimation. Deep CNN for monophonic pitch (f0) estimation. Frame-level predictions.
  • ../entities/bytedance-piano-transcription (ByteDance) — High-resolution piano transcription system.
  • ../entities/tabcnn — TabCNN: CNN-based guitar tablature transcription. Predicts string/fret directly from CQT spectrograms.
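
Several of the entries above produce frame-level pitch predictions that still have to be turned into discrete notes. The following is a minimal sketch of that step for a monophonic f0 track, not the decoding used by any of the listed systems. The function names, the 10 ms hop size, and the rule "a note is a run of frames that round to the same MIDI pitch" are all assumptions made for illustration.

```python
import math


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def frames_to_notes(f0_track, hop_seconds=0.01):
    """Group consecutive frames with the same rounded MIDI pitch into notes.

    f0_track: per-frame f0 values in Hz; None or 0 marks an unvoiced frame.
    hop_seconds: assumed time between frames (hypothetical default of 10 ms).
    Returns a list of (onset_seconds, offset_seconds, midi_pitch) tuples.
    """
    notes = []
    current_pitch = None
    start_frame = 0
    # The trailing None acts as a sentinel so the last note is flushed.
    for i, f0 in enumerate(list(f0_track) + [None]):
        pitch = round(hz_to_midi(f0)) if f0 else None
        if pitch != current_pitch:
            if current_pitch is not None:
                notes.append((start_frame * hop_seconds, i * hop_seconds, current_pitch))
            current_pitch = pitch
            start_frame = i
    return notes


if __name__ == "__main__":
    # Two frames of A4, one unvoiced frame, one frame of A3.
    print(frames_to_notes([440.0, 440.0, None, 220.0], hop_seconds=0.5))
```

A real pipeline would typically smooth the f0 track and discard very short runs before emitting notes; this sketch does neither.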

Datasets

  • ../entities/guitarset — Guitar dataset recorded with a hexaphonic pickup, annotated with string-level transcriptions (string and fret positions), chords, beats, downbeats, and playing style.
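
String-level annotations relate to ordinary pitch transcription through the guitar's tuning: each (string, fret) pair maps to exactly one pitch, while a given pitch can usually be played at several positions. The sketch below shows the forward mapping only. It assumes standard tuning and a hypothetical input layout (one fret value per string, low to high); it does not reflect the actual file format of any dataset or model.

```python
# Open-string MIDI pitches for standard tuning, low E to high E (E2 A2 D3 G3 B3 E4).
STANDARD_TUNING = (40, 45, 50, 55, 59, 64)


def tab_to_midi(frets, tuning=STANDARD_TUNING):
    """Map one tablature frame to the MIDI pitches it sounds.

    frets: one entry per string, low to high; None means the string is not
    played and 0 means the open string (an assumed, illustrative layout).
    """
    if len(frets) != len(tuning):
        raise ValueError("expected one fret entry per string")
    return [open_pitch + fret
            for open_pitch, fret in zip(tuning, frets)
            if fret is not None]


if __name__ == "__main__":
    # Open C major chord, written x32010 in tab notation.
    print(tab_to_midi([None, 3, 2, 0, 1, 0]))
```

The inverse direction (pitch to string/fret) is ambiguous, which is part of what makes tablature transcription harder than pitch-only transcription.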

Sources

None ingested yet — seed batch setup.

Open Questions

  • How do MT3 and Basic Pitch compare on polyphonic instrument mixtures vs. solo piano?
  • What is the state of the art for guitar tab transcription — TabCNN vs. MT3 vs. newer approaches?
  • Can transcription models trained on isolated stems generalize to mixture inputs?
  • How much does source separation preprocessing improve downstream transcription accuracy?
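
Answering the comparison questions above requires a shared metric. One common choice is a note-level F-measure in which an estimated note counts as correct when its pitch matches a reference note and its onset falls within a small tolerance. The sketch below is a simplified illustration under stated assumptions: the function name is hypothetical, the 50 ms tolerance is an assumed default, and matching is greedy, whereas established evaluation toolkits use an optimal matching.

```python
def onset_f_measure(reference, estimated, onset_tolerance=0.05):
    """Compute note-level precision, recall, and F-measure.

    reference, estimated: lists of (onset_seconds, midi_pitch) tuples.
    A reference note is matched to at most one estimated note with the same
    pitch and an onset within onset_tolerance seconds (greedy, first match).
    """
    unmatched = list(estimated)
    true_positives = 0
    for ref_onset, ref_pitch in sorted(reference):
        for candidate in unmatched:
            est_onset, est_pitch = candidate
            if est_pitch == ref_pitch and abs(est_onset - ref_onset) <= onset_tolerance:
                unmatched.remove(candidate)  # each estimate can be used only once
                true_positives += 1
                break
    precision = true_positives / len(estimated) if estimated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    reference = [(0.0, 60), (0.5, 64), (1.0, 67)]
    estimated = [(0.02, 60), (0.5, 65), (1.2, 67), (1.01, 67)]
    print(onset_f_measure(reference, estimated))
```

Running the same metric on solo recordings, on mixtures, and on separated stems would give directly comparable numbers for the questions listed here.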