Lyrics Recognition from Singing Voice Focused on Correspondence Between Voice and Notes

Motoyuki Suzuki, Sho Tomita, Tomoki Morita


Lyrics recognition from singing voice is one of the most important techniques for query-by-singing music information retrieval systems. Using lyrics information yields higher retrieval performance than using melody information alone.

However, recognizing song lyrics from a singing voice is very difficult. To improve recognition, we propose a new method that focuses on the correspondence between voice and notes. Note boundary scores are calculated for each frame, and these values are appended to the feature vectors as additional dimensions. A marker HMM is defined to model the feature vectors located at note boundaries, and it is inserted between all morae in the pronunciation dictionary. As a result, the recognizer constrains each mora to correspond to only one note.
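The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the symbol name `<nb>` for the marker HMM, the function names, and the choice to place markers only between adjacent morae inside a word are assumptions made for illustration, and the abstract does not say how the note boundary scores are computed or how word boundaries are handled.

```python
from typing import Dict, List

MARKER = "<nb>"  # hypothetical symbol for the marker HMM (name not from the paper)


def append_boundary_scores(features: List[List[float]],
                           scores: List[float]) -> List[List[float]]:
    """Expand each frame's feature vector with its note boundary score."""
    if len(features) != len(scores):
        raise ValueError("one note boundary score per frame is required")
    return [list(frame) + [score] for frame, score in zip(features, scores)]


def insert_markers(morae: List[str]) -> List[str]:
    """Place the marker symbol between every pair of adjacent morae."""
    out: List[str] = []
    for i, mora in enumerate(morae):
        if i > 0:
            out.append(MARKER)
        out.append(mora)
    return out


def expand_dictionary(lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Apply marker insertion to every entry of a pronunciation dictionary."""
    return {word: insert_markers(morae) for word, morae in lexicon.items()}


if __name__ == "__main__":
    # Toy data: two frames of two-dimensional features and a one-word lexicon.
    print(append_boundary_scores([[1.0, 2.0], [3.0, 4.0]], [0.1, 0.9]))
    print(expand_dictionary({"sakura": ["sa", "ku", "ra"]}))
```

Because the marker must be consumed between two morae, and it is trained on frames at note boundaries, a decoder using such a dictionary is pushed to align each mora change with a note change.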

We also modified the marker HMM to account for short pauses in particular positions. A short pause corresponding to a musical rest or a breath may occur after any mora, even inside a word. The short pause HMM is therefore concatenated to the marker HMM, and a skip transition arc over the short pause HMM is introduced so that the pause remains optional.
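The effect of the skip transition can be pictured by listing the symbol sequences that the modified model would accept for one word. The sketch below is only an illustration under assumptions: the symbols `<nb>` and `sp`, the function name, and the ordering (marker first, then the optional pause) are not taken from the paper, and a real recognizer would encode this optionality in the HMM topology rather than enumerate variants.

```python
from itertools import product
from typing import List

MARKER = "<nb>"      # hypothetical symbol for the marker HMM
SHORT_PAUSE = "sp"   # hypothetical symbol for the short pause HMM


def junction_variants(morae: List[str]) -> List[List[str]]:
    """List every sequence allowed when each marker may be followed by a pause.

    The skip arc makes the short pause optional, so each of the
    len(morae) - 1 junctions independently has or lacks a pause.
    """
    n_junctions = max(len(morae) - 1, 0)
    variants: List[List[str]] = []
    for pauses in product([False, True], repeat=n_junctions):
        seq = [morae[0]] if morae else []
        for mora, has_pause in zip(morae[1:], pauses):
            seq.append(MARKER)
            if has_pause:
                seq.append(SHORT_PAUSE)
            seq.append(mora)
        variants.append(seq)
    return variants


if __name__ == "__main__":
    for variant in junction_variants(["sa", "ku", "ra"]):
        print(" ".join(variant))
```

A word of n morae has 2^(n-1) such variants, which is why a single skip arc in the model is preferable to adding explicit dictionary entries for each one.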

In the experiments, the proposed model achieved higher word accuracy than the baseline model. Word accuracy improved from 85.71% to 93.18%, which corresponds to a 52.3% relative reduction in word error rate. Insertion errors in particular were drastically suppressed.
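The 52.3% figure can be reproduced from the two accuracy values. The short check below assumes that word error rate is taken as 100% minus word accuracy, which is the usual convention when accuracy already counts insertion errors; the function name is chosen here for illustration.

```python
def relative_wer_reduction(acc_baseline: float, acc_proposed: float) -> float:
    """Relative word error rate reduction in percent, with WER = 100 - accuracy."""
    wer_baseline = 100.0 - acc_baseline   # 14.29 for the baseline model
    wer_proposed = 100.0 - acc_proposed   # 6.82 for the proposed model
    return 100.0 * (wer_baseline - wer_proposed) / wer_baseline


if __name__ == "__main__":
    print(round(relative_wer_reduction(85.71, 93.18), 1))  # 52.3
```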


DOI: 10.21437/Interspeech.2019-1318

Cite as: Suzuki, M., Tomita, S., Morita, T. (2019) Lyrics Recognition from Singing Voice Focused on Correspondence Between Voice and Notes. Proc. Interspeech 2019, 3238-3241, DOI: 10.21437/Interspeech.2019-1318.


@inproceedings{Suzuki2019,
  author={Motoyuki Suzuki and Sho Tomita and Tomoki Morita},
  title={{Lyrics Recognition from Singing Voice Focused on Correspondence Between Voice and Notes}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3238--3241},
  doi={10.21437/Interspeech.2019-1318},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1318}
}