8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Unit Selection Based on Voice Recognition

Yi Zhou (1), Yiqing Zu (2)

(1) Shanghai Jiaotong University, China
(2) Motorola China Research Center, China

In this paper, we describe a perceptual voice recognition method for improving the naturalness of synthesized speech in a baseline Mandarin Chinese text-to-speech (TTS) system. In a large TTS speech corpus, the acoustic properties of the speech data vary with the recording conditions, and these differences ultimately affect the naturalness of the synthesized speech. To address this, we separate the speech data in a TTS corpus into several voice classes using an iterative voice recognition method similar to speaker recognition. Within each class, speech units are considered to share the same voice characteristics. Based on the voice recognition result, a novel unit selection algorithm selects better units in order to synthesize more natural-sounding speech. A preliminary experiment shows the feasibility and validity of the method.
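
The abstract does not give the details of the unit selection algorithm, so the following is only an illustrative sketch of how voice-class labels could bias unit selection. It assumes a Viterbi-style dynamic programming search in which each candidate unit carries a precomputed target cost and a voice-class label, and a fixed penalty is added whenever adjacent units come from different voice classes. The function name `select_units`, the `Candidate` tuple layout, and the `switch_penalty` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical candidate layout: (unit_id, voice_class, target_cost).
Candidate = Tuple[str, int, float]


def select_units(candidates: List[List[Candidate]],
                 switch_penalty: float = 1.0) -> List[str]:
    """Pick one candidate per position, minimizing the summed target cost
    plus a penalty for every voice-class change between adjacent units.

    This is an assumed cost model for illustration, not the authors' method.
    """
    if not candidates or any(not slot for slot in candidates):
        raise ValueError("every position needs at least one candidate")

    # history[pos][j] = (cheapest path cost ending at candidate j, back-pointer)
    history = [[(cand[2], -1) for cand in candidates[0]]]

    for pos in range(1, len(candidates)):
        prev_slot = candidates[pos - 1]
        prev_costs = history[-1]
        current = []
        for _unit_id, voice, target in candidates[pos]:
            # Cost of arriving from each previous candidate, with a penalty
            # when the voice class differs.
            options = [
                (prev_costs[k][0]
                 + (0.0 if prev_slot[k][1] == voice else switch_penalty), k)
                for k in range(len(prev_slot))
            ]
            cost, back = min(options)
            current.append((cost + target, back))
        history.append(current)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(history[-1])), key=lambda i: history[-1][i][0])
    path = []
    for pos in range(len(candidates) - 1, -1, -1):
        path.append(candidates[pos][j][0])
        j = history[pos][j][1]
    return path[::-1]
```

With a large `switch_penalty`, the search prefers sequences drawn from a single voice class even when an individual unit from another class has a lower target cost; with `switch_penalty=0.0`, it reduces to picking the cheapest candidate at each position independently.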

Bibliographic reference.  Zhou, Yi / Zu, Yiqing (2003): "Unit selection based on voice recognition", In EUROSPEECH-2003, 265-268.