ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing

ICC Jeju, Korea
October 3, 2004

Modelling of Note Events for Singing Transcription

Matti P. Ryynänen, Anssi P. Klapuri

Institute of Signal Processing, Tampere University of Technology, Finland

This paper concerns the automatic transcription of music and proposes a method for transcribing sung melodies. The method produces symbolic notations (i.e., MIDI files) from acoustic inputs based on two probabilistic models: a note event model and a musicological model. Note events are described with a hidden Markov model (HMM) using four musical features: pitch, voicing, accent, and metrical accent. The model uses these features to calculate the likelihoods of different notes and to perform note segmentation. The musicological model applies key estimation and the likelihoods of two-note and three-note sequences to determine transition likelihoods between different note events. Together, the two models form a melody transcription system with a modular architecture that can be extended with additional front-end feature extractors and musicological rules. The system correctly transcribes over 90% of notes, halving the number of errors compared to simply rounding pitch estimates to the nearest MIDI note.
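The combination described above, per-frame note likelihoods from an HMM-based note event model scored together with note-to-note transition likelihoods from a musicological model, is typically decoded with the Viterbi algorithm. The sketch below is an illustrative, hypothetical implementation of that decoding step, not the authors' code: `note_loglik` and `trans_loglik` stand in for the outputs of the two models, and the note inventory is reduced to a generic index set.

```python
import numpy as np

def viterbi_transcribe(note_loglik, trans_loglik):
    """Find the most likely note sequence (illustrative sketch).

    note_loglik:  (T, N) array of per-frame log-likelihoods, as a stand-in
                  for the output of a note event model.
    trans_loglik: (N, N) array of note-to-note transition log-likelihoods,
                  as a stand-in for a musicological model (e.g. key-conditioned
                  two-note sequence probabilities).
    Returns the maximum-likelihood note index per frame.
    """
    T, N = note_loglik.shape
    delta = np.full((T, N), -np.inf)   # best log-score ending in each note
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = note_loglik[0]
    for t in range(1, T):
        # score of arriving at note j from every predecessor i
        scores = delta[t - 1][:, None] + trans_loglik  # shape (N, N)
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + note_loglik[t]
    # backtrack from the best final note
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```

With a strong musicological prior, the transition term can override locally ambiguous pitch evidence, which is the motivation for combining the two models rather than rounding frame-wise pitch estimates independently.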


Bibliographic reference.  Ryynänen, Matti P. / Klapuri, Anssi P. (2004): "Modelling of note events for singing transcription", In SAPA-2004, paper 40.