This paper proposes a method for automatically detecting errors in the synthetic speech of a unit selection speech synthesis system using log likelihood ratios and a support vector machine (SVM). For SVM training, a set of synthetic utterances is first generated by a given speech synthesis system, and the synthesis errors are labeled by manually annotating the segments that sound unnatural. Two context-dependent acoustic models are then trained on the natural and unnatural segments of the labeled synthetic speech, respectively. The log likelihood ratio of the acoustic features between these two models is used to train the SVM classifier for error detection. Experimental results show that the proposed method is effective in detecting pitch contour errors within a word for a Mandarin speech synthesis system. The proposed SVM method using the log likelihood ratio between context-dependent acoustic models outperforms an SVM classifier trained directly on the acoustic features.
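As a rough illustration of the log likelihood ratio feature described above, the following minimal sketch scores a segment against a "natural" model and an "unnatural" model. It rests on several assumptions that are not taken from the paper: each model is a single diagonal-covariance Gaussian standing in for the context-dependent acoustic models, the feature vectors are made-up two-dimensional toy values, and the function names (`fit_gaussian`, `log_likelihood`, `segment_llr`) are hypothetical. The SVM stage is not shown; in the proposed method, scores of this kind would serve as the classifier input.

```python
import math
from statistics import mean, pvariance


def fit_gaussian(frames):
    """Fit a diagonal-covariance Gaussian to a list of feature vectors.

    Assumption: a single Gaussian replaces the paper's context-dependent
    acoustic models purely for illustration.
    """
    dims = list(zip(*frames))  # one tuple of values per feature dimension
    means = [mean(d) for d in dims]
    # A small variance floor avoids division by zero on degenerate data.
    variances = [max(pvariance(d, mu), 1e-6) for d, mu in zip(dims, means)]
    return means, variances


def log_likelihood(frame, model):
    """Log density of one frame under a diagonal Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(frame, means, variances)
    )


def segment_llr(frames, natural_model, unnatural_model):
    """Average per-frame log likelihood ratio for a segment.

    Positive values favour the natural model, negative values the
    unnatural model.
    """
    total = sum(
        log_likelihood(f, natural_model) - log_likelihood(f, unnatural_model)
        for f in frames
    )
    return total / len(frames)


# Toy, made-up training frames (not data from the paper).
natural_frames = [[0.1, -0.2], [-0.1, 0.0], [0.0, 0.2], [0.2, 0.1]]
unnatural_frames = [[3.0, 2.8], [3.2, 3.1], [2.9, 3.0], [3.1, 2.7]]

natural_model = fit_gaussian(natural_frames)
unnatural_model = fit_gaussian(unnatural_frames)

# Score two hypothetical test segments.
llr_good = segment_llr([[0.05, 0.0], [0.0, 0.1]], natural_model, unnatural_model)
llr_bad = segment_llr([[3.0, 2.9]], natural_model, unnatural_model)
print(llr_good, llr_bad)
```

Averaging over frames keeps the score comparable across segments of different lengths, which is one plausible design choice among several; the paper's exact normalization is not stated in the abstract.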
Bibliographic reference. Lu, Heng / Ling, Zhen-Hua / Wei, Si / Dai, Lirong / Wang, Ren-Hua (2010): "Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier", In INTERSPEECH-2010, 162-165.