This paper concerns the automatic transcription of music and proposes a method for transcribing sung melodies. The method produces symbolic notations (i.e., MIDI files) from acoustic input based on two probabilistic models: a note event model and a musicological model. Note events are described with a hidden Markov model (HMM) using four musical features: pitch, voicing, accent, and metrical accent. The model uses these features to compute the likelihoods of different notes and to perform note segmentation. The musicological model applies key estimation and the likelihoods of two-note and three-note sequences to determine the transition likelihoods between note events. Together, the two models form a melody transcription system with a modular architecture that can be extended with additional front-end feature extractors and musicological rules. The system correctly transcribes over 90% of the notes, halving the number of errors compared to simply rounding pitch estimates to the nearest MIDI note.
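As a rough illustration of the architecture described above, the sketch below shows one way per-frame note likelihoods (standing in for the note event model's output) could be combined with key-conditioned note transition scores (standing in for the musicological model) in a Viterbi search over candidate notes. This is not the authors' implementation: the candidate note range, the Gaussian-like pitch scoring, the voicing gate, the fixed C-major key, and all function names (note_loglik, bigram_logprob, viterbi) are simplified, hypothetical stand-ins.

```python
import numpy as np

# Hedged sketch: combine note-event likelihoods (observation scores) with
# musicological transition probabilities (bigram scores) via Viterbi search.
# The paper's feature extraction and trained HMMs are not reproduced here.

MIDI_NOTES = np.arange(48, 72)  # candidate note range (assumption: C3..B4)

def note_loglik(frame_pitch, frame_voicing):
    """Toy stand-in for the note event model: score each candidate MIDI
    note against the frame's pitch estimate, gated by a voicing value."""
    if frame_voicing < 0.5:
        return np.full(len(MIDI_NOTES), -10.0)  # unvoiced: all notes unlikely
    dist = np.abs(MIDI_NOTES - frame_pitch)
    return -0.5 * dist**2  # Gaussian-like score around the pitch estimate

def bigram_logprob(key_tonic=60):
    """Toy stand-in for the musicological model: favour small intervals
    and notes belonging to the (here: fixed, not estimated) major key."""
    n = len(MIDI_NOTES)
    scale = {0, 2, 4, 5, 7, 9, 11}  # major-scale degrees in semitones
    in_key = np.array([(m - key_tonic) % 12 in scale for m in MIDI_NOTES])
    trans = np.zeros((n, n))
    for i in range(n):
        interval = np.abs(MIDI_NOTES - MIDI_NOTES[i])
        trans[i] = -0.1 * interval + np.where(in_key, 0.0, -2.0)
    return trans

def viterbi(pitches, voicings):
    """Find the most likely note sequence for a series of frames."""
    trans = bigram_logprob()
    score = note_loglik(pitches[0], voicings[0])
    back = []
    for p, v in zip(pitches[1:], voicings[1:]):
        cand = score[:, None] + trans          # score of each predecessor
        back.append(np.argmax(cand, axis=0))   # best predecessor per note
        score = np.max(cand, axis=0) + note_loglik(p, v)
    path = [int(np.argmax(score))]             # backtrace from the best end
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return [int(MIDI_NOTES[i]) for i in reversed(path)]

# Example: a sung scale fragment with slightly off-pitch voiced frames
pitches  = [60.3, 60.1, 62.4, 61.8, 64.2]
voicings = [0.9, 0.9, 0.8, 0.9, 0.9]
print(viterbi(pitches, voicings))  # prints [60, 60, 62, 62, 64]
```

The decisive point the toy example makes is the same one the abstract makes: the transition scores let context (key membership, interval size) override frame-level pitch errors, which a simple rounding of pitch estimates to the nearest MIDI note cannot do.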
Cite as: Ryynänen, M.P., Klapuri, A.P. (2004) Modelling of note events for singing transcription. Proc. ITRW on Statistical and Perceptual Audio Processing (SAPA 2004), paper 40
@inproceedings{ryynanen04_sapa,
  author    = {Matti P. Ryynänen and Anssi P. Klapuri},
  title     = {{Modelling of note events for singing transcription}},
  year      = {2004},
  booktitle = {Proc. ITRW on Statistical and Perceptual Audio Processing (SAPA 2004)},
  pages     = {paper 40}
}