INTERSPEECH 2010
11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Phonetic Segmentation of Singing Voice Using MIDI and Parallel Speech

Minghui Dong (1), Paul Chan (1), Ling Cen (1), Haizhou Li (1), Jason Teo (2), Ping Jen Kua (2)

(1) A*STAR, Singapore
(2) Nanyang Technological University, Singapore

Analyzing a singing voice signal requires knowing the boundaries of each phonetic unit in the singing voice samples. However, because vowels are prolonged in singing, it is difficult to accurately align a singing voice with the phonetic sequence of its lyrics using a conventional speech recognition approach. This paper proposes a solution for the phonetic annotation of the singing voice, given a MIDI file and a parallel speech recording of the lyrics. The MIDI file, which contains notation and lyric information, is used to locate the lyrics in the singing voice. The parallel speech recording is used to generate a reference phonetic annotation by force-aligning it with the lyrics using a speech recognizer. The singing voice is then aligned with the phonetically annotated speech, and the phonetic boundaries are mapped onto the singing voice. The results show that this approach yields an accurate annotation of phonetic boundaries in the singing voice.
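The final step, aligning the singing voice with the annotated speech and transferring the boundaries, can be illustrated with a minimal sketch. The abstract does not name the alignment algorithm or the acoustic features, so this sketch assumes dynamic time warping (DTW) over one scalar feature per frame. The function names `dtw_path` and `map_boundaries` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def dtw_path(speech: Sequence[float], singing: Sequence[float]) -> List[Tuple[int, int]]:
    """Return a DTW warping path as (speech_frame, singing_frame) pairs.

    Assumption: each frame is summarized by a single float; a real system
    would use feature vectors such as MFCCs and a vector distance.
    """
    n, m = len(speech), len(singing)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning speech[:i] with singing[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(speech[i - 1] - singing[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the last cell, always moving to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def map_boundaries(path: List[Tuple[int, int]], speech_boundaries: Sequence[int]) -> List[int]:
    """Map phone start frames in the speech to start frames in the singing.

    Each speech boundary is mapped to the first singing frame that the
    warping path pairs with that speech frame.
    """
    first_match: Dict[int, int] = {}
    for s, g in path:
        first_match.setdefault(s, g)
    return [first_match[b] for b in speech_boundaries]


# Toy data: three "phones" in speech; the middle one is prolonged in singing.
speech = [0.0, 0.0, 5.0, 5.0, 1.0, 1.0]
singing = [0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 1.0, 1.0]
singing_boundaries = map_boundaries(dtw_path(speech, singing), [0, 2, 4])
```

In this toy case the speech phone boundaries at frames 0, 2, and 4 map to singing frames 0, 2, and 8, so the prolonged middle segment is absorbed by the warping path rather than shifting the following boundary.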

Bibliographic reference.  Dong, Minghui / Chan, Paul / Cen, Ling / Li, Haizhou / Teo, Jason / Kua, Ping Jen (2010): "Phonetic segmentation of singing voice using MIDI and parallel speech", In INTERSPEECH-2010, 2890-2893.