GuitarSet: A Dataset for Guitar Transcription
Summary
GuitarSet is a widely used dataset for guitar transcription research. It contains 360 recordings annotated at the string, note, and technique level. Recorded at NYU's Music and Audio Research Lab (MARL), it features professional guitarists performing in multiple styles (including jazz, bossa nova, and rock) on a single guitar. Its key innovation is a hexaphonic pickup that records each string independently, so the annotations identify which notes are played on which strings. That per-string ground truth is essential for tablature estimation and detailed transcription tasks.
For bluegrass banjo transcription, GuitarSet matters both as a methodology template and as a potential source for transfer learning. The hexaphonic pickup approach could be replicated on banjo to create a similarly annotated dataset, though banjo presents its own challenges: a drone string, different tunings, and two distinct playing styles (clawhammer and three-finger). The annotation methodology, which captures playing techniques (bend, slide, vibrato, hammer-on, pull-off) alongside note events, is directly relevant to bluegrass, where these expressive techniques are fundamental. GuitarSet has become a standard benchmark for automatic music transcription (AMT) and tablature estimation systems (used by TabCNN, Basic Pitch, and many others), and it establishes dataset design patterns that a banjo transcription dataset should follow.
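To make the drone-string challenge concrete, here is a minimal sketch of mapping a pitch on a known banjo string to a tab fret number. It assumes open-G tuning (g4 D3 G3 B3 D4) and the common tablature convention of numbering fretted notes on the short fifth string by their physical fret position. The names `BANJO_OPEN_G`, `DRONE_NUT_FRET`, and `banjo_fret` are hypothetical and are not part of GuitarSet or any library.

```python
# Assumed open-G tuning, keyed by string number (5 = short drone string).
# Values are MIDI pitches: g4, D3, G3, B3, D4.
BANJO_OPEN_G = {5: 67, 4: 50, 3: 55, 2: 59, 1: 62}

DRONE_STRING = 5
DRONE_NUT_FRET = 5  # assumption: the fifth string begins at the 5th fret


def banjo_fret(midi_pitch, string, tuning=BANJO_OPEN_G):
    """Return the tab fret number for a pitch played on a given string."""
    offset = midi_pitch - tuning[string]
    if offset < 0:
        raise ValueError(f"pitch {midi_pitch} is below open string {string}")
    # Fretted drone-string notes are written with their physical fret number,
    # so the semitone offset is shifted by the position of the fifth-string nut.
    if string == DRONE_STRING and offset > 0:
        return offset + DRONE_NUT_FRET
    return offset


# A4 (MIDI 69) lands on fret 7 on both the first string and the drone string.
a4_on_first = banjo_fret(69, string=1)
a4_on_drone = banjo_fret(69, string=5)
```

A guitar-style "pitch minus open-string pitch" rule would give fret 2 for the same note on the drone string, which is one reason a banjo dataset would likely need its own string and fret conventions.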
The dataset is available on Zenodo under a permissive license. The companion GitHub repository (160 stars) includes code for visualization and annotation processing.
Key Claims
- 360 recordings, each with hexaphonic (per-string) pickup audio and microphone audio
- Comprehensive annotations: note events, string/fret positions, playing techniques
- Multiple performance styles by professional guitarists
- Hexaphonic pickup provides per-string ground truth for tablature estimation
- Standard benchmark for guitar transcription and tablature estimation research
- Companion annotation code and visualization tools released open source
- Methodology template for string instrument dataset creation
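The per-string claims above can be illustrated with a small sketch: once a note is known to come from a particular string, its fret follows directly from the open-string pitch. The example assumes standard guitar tuning and a simple `(onset, duration, midi_pitch)` tuple per note. The `TabNote` type, the `note_to_tab` helper, and the 0-based string indexing are hypothetical and do not reflect GuitarSet's actual annotation format.

```python
from dataclasses import dataclass

# Open-string MIDI pitches for standard tuning, low E to high E (E2 A2 D3 G3 B3 E4).
STANDARD_TUNING = (40, 45, 50, 55, 59, 64)


@dataclass(frozen=True)
class TabNote:
    onset: float     # seconds
    duration: float  # seconds
    string: int      # 0 = lowest-pitched string (hypothetical indexing)
    fret: int


def note_to_tab(onset, duration, midi_pitch, string,
                tuning=STANDARD_TUNING, max_fret=24):
    """Convert a note detected on a known string into a (string, fret) position."""
    fret = round(midi_pitch) - tuning[string]
    if not 0 <= fret <= max_fret:
        raise ValueError(f"pitch {midi_pitch} is not playable on string {string}")
    return TabNote(onset, duration, string, fret)


# Made-up per-string note events: (onset, duration, midi_pitch).
notes_per_string = {
    0: [(0.00, 0.50, 43)],  # G2 on the low E string
    4: [(0.50, 0.25, 60)],  # C4 on the B string
}

# Flatten all strings into one tab sequence ordered by onset time.
tab = sorted(
    (note_to_tab(on, dur, pitch, s)
     for s, events in notes_per_string.items()
     for on, dur, pitch in events),
    key=lambda n: n.onset,
)
```

Without the string label, the same pitch could map to several string and fret combinations, which is why per-string recordings are valuable for tablature estimation.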
Related
- ../concepts/musicxml-tab-notation — annotations include string/fret information convertible to tablature notation
- ../entities/guitarset — the dataset and companion code repository
- ../entities/tabcnn — uses GuitarSet for training guitar tablature estimation
- ../entities/basic-pitch — evaluates multi-instrument transcription on GuitarSet among other datasets
- ../concepts/synthetic-mixing-pipelines — hexaphonic recordings enable creating synthetic mixtures for separation training
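As a rough sketch of the synthetic-mixing idea in the last item, per-string signals can be summed with per-stem gains to produce a mixture together with the matching separation targets. The example uses plain Python lists in place of real audio arrays, and `mix_stems` is a hypothetical helper, not part of the GuitarSet companion code.

```python
import random


def mix_stems(stems, gains):
    """Sum equal-length stems with per-stem gains.

    Returns (mixture, targets), where targets are the gain-scaled stems.
    If the mixture would exceed full scale, mixture and targets are scaled
    down together so the targets still sum to the mixture.
    """
    if not stems or len(stems) != len(gains):
        raise ValueError("need one gain per stem and at least one stem")
    length = len(stems[0])
    if any(len(s) != length for s in stems):
        raise ValueError("all stems must have the same length")

    targets = [[g * x for x in s] for s, g in zip(stems, gains)]
    mixture = [sum(column) for column in zip(*targets)]

    peak = max((abs(x) for x in mixture), default=0.0)
    if peak > 1.0:
        scale = 1.0 / peak
        targets = [[x * scale for x in t] for t in targets]
        mixture = [x * scale for x in mixture]
    return mixture, targets


# Toy "string" signals standing in for hexaphonic stems.
stems = [[0.5, -0.5, 0.0], [0.5, 0.5, 0.0]]
rng = random.Random(0)  # fixed seed so the random gains are repeatable
gains = [rng.uniform(0.5, 1.0) for _ in stems]
mixture, targets = mix_stems(stems, gains)
```

Randomizing the gains (and, in a fuller pipeline, which stems are included) is one simple way to generate many training mixtures from a fixed set of recordings.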