7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Speaker Recognizability Evaluation of a VoiceFont-Based Text-to-Speech System

Masaharu Sakamoto, Takashi Saito

IBM Japan Ltd., Japan

We have developed a new text-to-speech system based on VoiceFont technology. A VoiceFont is a voice dictionary for speech synthesis that holds the acoustic and prosodic characteristics extracted from the voice corpus of a speaker. A text-to-speech system using a VoiceFont is able to synthetically mimic the voice of the donor speaker. In this paper, we evaluate the speaker recognizability of the synthetic speech, that is, whether the synthetic speech can be recognized as the donor speaker's voice. We conducted a subjective evaluation of five VoiceFonts and report the results here. The results show that our VoiceFont-based text-to-speech system retains the acoustic and prosodic characteristics of the donor speaker, and that the synthetic speech can be recognized as the donor speaker's voice. Furthermore, we report how much the spectral characteristics, phoneme duration, and pitch frequency each affect speaker recognizability.


Bibliographic reference. Sakamoto, Masaharu / Saito, Takashi (2002): "Speaker recognizability evaluation of a VoiceFont-based text-to-speech system", in ICSLP-2002, 2529-2532.