Unit selection synthesis database development using utterance verification

Ingunn Amdal, Torbjørn Svendsen

Accurate annotation of the unit inventory database is vital to the quality of unit selection text-to-speech synthesis. The time-consuming manual work involved in database development limits the ability to produce new voices quickly and at low cost, so automatic annotation is increasingly used. However, misalignments caused by mismatch between the predicted and the pronounced unit sequence still require manual correction to achieve natural-sounding synthesis. This paper proposes a new annotation assessment method that applies log-likelihood-ratio-based utterance verification to the recorded database. Utterance verification is used to detect utterances where there is a likely mismatch between the predicted pronunciation and what is actually spoken, or where an automated phonemic labelling procedure misaligns the phone labels and the acoustic content. In a fully automated procedure, utterances failing the verification test can be discarded. In a semi-automatic procedure, utterance verification can select the utterances that need manual inspection, thereby reducing the manual effort. Preliminary experiments show promising figures for correct rejections.
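
As an illustration of the general idea, a log likelihood ratio test over a recorded database might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the alternative model, the score normalisation, or the decision threshold. Here the alternative score is assumed to come from an unconstrained recogniser (for example a phone loop), the ratio is assumed to be normalised by the number of frames, and the names `UtteranceScores`, `llr_per_frame` and `verify` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UtteranceScores:
    """Acoustic scores for one recorded utterance (hypothetical container)."""
    utt_id: str
    target_loglik: float       # log p(X | predicted phone sequence), e.g. from forced alignment
    alternative_loglik: float  # log p(X | alternative model); assumed: unconstrained phone loop
    num_frames: int            # number of acoustic frames in the utterance


def llr_per_frame(u: UtteranceScores) -> float:
    """Frame-normalised log likelihood ratio (the normalisation is an assumption)."""
    if u.num_frames <= 0:
        raise ValueError(f"utterance {u.utt_id} has no frames")
    return (u.target_loglik - u.alternative_loglik) / u.num_frames


def verify(utts: List[UtteranceScores], threshold: float) -> Tuple[List[str], List[str]]:
    """Split utterance ids into (accepted, rejected) by comparing the LLR to a threshold."""
    accepted: List[str] = []
    rejected: List[str] = []
    for u in utts:
        if llr_per_frame(u) >= threshold:
            accepted.append(u.utt_id)
        else:
            rejected.append(u.utt_id)
    return accepted, rejected


if __name__ == "__main__":
    # Made-up scores for illustration only; they are not results from the paper.
    database = [
        UtteranceScores("utt_001", target_loglik=-5000.0, alternative_loglik=-4900.0, num_frames=500),
        UtteranceScores("utt_002", target_loglik=-5000.0, alternative_loglik=-4000.0, num_frames=400),
    ]
    accepted, rejected = verify(database, threshold=-1.0)
    print("accepted:", accepted)
    print("rejected (discard, or send to manual inspection):", rejected)
```

In a fully automated setting the rejected list would simply be dropped from the unit inventory, whereas in a semi-automatic setting it would be the queue handed to a human annotator. The threshold value in the sketch is arbitrary and would in practice need to be tuned on held-out data.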

doi: 10.21437/Interspeech.2005-793

