Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Perception of Synthesized Singing Voices with Fine Fluctuations in Their Fundamental Frequency Contours

Masato Akagi, Hironori Kitakaze

Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, Japan

This paper quantifies the importance of fine fluctuations in the fundamental frequency (F0) contours of singing voices, for which voice quality is particularly important, by measuring the detection thresholds of those fluctuations. We analyzed the fine fluctuations that remain after subtracting the melody and vibrato components from the estimated F0 contours, focusing on their modulation frequency (MF) and modulation amplitude (MA). To test the hypothesis that fine fluctuations in the F0 of singing voices affect perceived quality, and that the magnitude of this effect depends on the MF and MA, we performed four psychoacoustic experiments using synthesized stimuli. The experimental results support the hypothesis and suggest that, to produce high-quality synthesized speech, one should extract F0 contours containing fine fluctuations with an MF above 7 Hz in the analysis stage, and add not only melody and vibrato but also fine fluctuation components to the F0 contours in the synthesis stage.
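The decomposition described above (melody plus vibrato plus a fine-fluctuation residual) can be illustrated with a minimal sketch. It is not the authors' method: the function names, the use of cents as the modulation unit, the single-sinusoid model of the fine fluctuation, and all numeric values (440 Hz note, 5.5 Hz vibrato, 10 Hz MF, 5-cent MA) are assumptions chosen only for illustration.

```python
import math

def cents_to_ratio(cents):
    """Convert a pitch offset in cents to a frequency ratio."""
    return 2.0 ** (cents / 1200.0)

def f0_contour(n_frames, frame_rate, melody_hz,
               vib_rate_hz, vib_depth_cents, mf_hz, ma_cents):
    """Build an F0 contour (Hz) as melody + vibrato + fine fluctuation.

    Assumption: both vibrato and the fine fluctuation are modeled as
    single sinusoids in the cent domain; mf_hz and ma_cents stand in
    for the abstract's MF and MA.
    """
    contour = []
    for i in range(n_frames):
        t = i / frame_rate
        vibrato = vib_depth_cents * math.sin(2 * math.pi * vib_rate_hz * t)
        fine = ma_cents * math.sin(2 * math.pi * mf_hz * t)
        contour.append(melody_hz * cents_to_ratio(vibrato + fine))
    return contour

def residual_cents(contour, frame_rate, melody_hz,
                   vib_rate_hz, vib_depth_cents):
    """Subtract known melody and vibrato from a contour.

    Returns what is left, in cents: the fine-fluctuation component.
    """
    residual = []
    for i, f0 in enumerate(contour):
        t = i / frame_rate
        total = 1200.0 * math.log2(f0 / melody_hz)
        vibrato = vib_depth_cents * math.sin(2 * math.pi * vib_rate_hz * t)
        residual.append(total - vibrato)
    return residual

# Illustrative values only (hypothetical, not taken from the paper).
contour = f0_contour(200, 200.0, 440.0, 5.5, 50.0, 10.0, 5.0)
fine_part = residual_cents(contour, 200.0, 440.0, 5.5, 50.0)
```

In this sketch the residual recovers the injected fine component exactly because the melody and vibrato are known; with F0 contours estimated from real recordings, those components would themselves have to be estimated before subtraction.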

Bibliographic reference. Akagi, Masato / Kitakaze, Hironori (2000): "Perception of synthesized singing voices with fine fluctuations in their fundamental frequency contours", in ICSLP-2000, vol. 3, 458-461.