11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Automatic Error Detection for Unit Selection Speech Synthesis Using Log Likelihood Ratio Based SVM Classifier

Heng Lu, Zhen-Hua Ling, Si Wei, Lirong Dai, Ren-Hua Wang

University of Science & Technology of China, China

This paper proposes a method for automatically detecting errors in the synthetic speech of a unit selection speech synthesis system using log likelihood ratios and a support vector machine (SVM). For SVM training, a set of synthetic speech utterances is first generated by a given speech synthesis system, and the synthesis errors are labeled by manually annotating the segments that sound unnatural. Two context-dependent acoustic models are then trained on the natural and unnatural segments of the labeled synthetic speech, respectively. The log likelihood ratio of the acoustic features between these two models is used to train the SVM classifier for error detection. Experimental results show that the proposed method is effective in detecting pitch contour errors within a word for a Mandarin speech synthesis system. The proposed SVM method, which uses the log likelihood ratio between context-dependent acoustic models, outperforms an SVM classifier trained directly on the acoustic features.
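The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a single diagonal Gaussian per class stands in for each context-dependent acoustic model, the log likelihood ratio is taken per feature dimension, a linear SVM trained by stochastic sub-gradient descent replaces whatever SVM configuration the paper used, and the two-dimensional toy features are invented for illustration. All function and variable names are hypothetical.

```python
import math
import random


def fit_diag_gaussian(rows):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to feature rows."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    variances = [
        max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)  # variance floor
        for d in range(dim)
    ]
    return means, variances


def log_likelihoods(x, model):
    """Per-dimension Gaussian log likelihood of feature vector x."""
    means, variances = model
    return [
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    ]


def llr_features(x, natural_model, unnatural_model):
    """Log likelihood ratio: log p(x | natural) - log p(x | unnatural)."""
    nat = log_likelihoods(x, natural_model)
    unnat = log_likelihoods(x, unnatural_model)
    return [a - b for a, b in zip(nat, unnat)]


def train_linear_svm(features, labels, epochs=50, lr=0.01, lam=0.01, seed=0):
    """Linear SVM via stochastic sub-gradient descent on the L2-regularized
    hinge loss. Labels are +1 (synthesis error) or -1 (natural)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(features[0]), 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 shrinkage
            if margin < 1.0:  # sample violates the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(x, natural_model, unnatural_model, w, b):
    """Return +1 if the segment is classified as an error, else -1."""
    feats = llr_features(x, natural_model, unnatural_model)
    score = sum(wi * fi for wi, fi in zip(w, feats)) + b
    return 1 if score > 0 else -1


# Toy, manually "annotated" segments (invented values, not data from the paper).
natural_segments = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]]
unnatural_segments = [[3.0, 2.9], [3.2, 3.1], [2.8, 3.0], [3.1, 3.2]]

# Step 1: one acoustic model per class.
natural_model = fit_diag_gaussian(natural_segments)
unnatural_model = fit_diag_gaussian(unnatural_segments)

# Step 2: convert every labeled segment to log likelihood ratio features.
train_x = [llr_features(s, natural_model, unnatural_model)
           for s in natural_segments + unnatural_segments]
train_y = [-1] * len(natural_segments) + [1] * len(unnatural_segments)

# Step 3: train the classifier on the ratio features, not the raw features.
w, b = train_linear_svm(train_x, train_y)
```

Calling `predict(segment, natural_model, unnatural_model, w, b)` on a new segment returns `1` when it is flagged as a synthesis error and `-1` otherwise. The design point the sketch tries to capture is that the classifier sees how much better one model explains a segment than the other, rather than the raw acoustic features themselves.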


Bibliographic reference. Lu, Heng / Ling, Zhen-Hua / Wei, Si / Dai, Lirong / Wang, Ren-Hua (2010): "Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier", In INTERSPEECH-2010, 162-165.