11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Automatic Detection of Abnormal Stress Patterns in Unit Selection Synthesis

Yeon-Jun Kim, Mark C. Beutnagel

AT&T Labs Research, USA

This paper introduces a method for automatically detecting lexical stress errors in unit selection synthesis using machine learning algorithms. If unintended stress patterns can be detected after unit selection, using features available in the unit database, it may be possible to modify the units during waveform synthesis to correct the errors and produce an acceptable stress pattern. Three machine learning algorithms (CART, SVM and MaxEnt) were trained on acoustic measurements from natural utterances and their corresponding stress patterns. In our experiments, MaxEnt performed best at classifying natural stress patterns, correctly classifying 83.3% of 3-syllable words and 88.7% of 4-syllable words. Although classification rates are good, the method produces a large number of false alarms. However, there is some indication that signal modifications triggered by false positives do little harm to the speech output.
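
The classify-then-compare idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: the features are synthetic (three hypothetical normalized measurements per syllable, standing in for values such as duration, F0 and energy), the words all have three syllables with exactly one stressed syllable, and a small hand-written multinomial logistic regression stands in for MaxEnt. The function names are illustrative only, and this is not the authors' implementation.

```python
import math
import random

N_SYLLABLES = 3          # class label = index of the stressed syllable
FEATS_PER_SYLLABLE = 3   # hypothetical: e.g. normalized duration, F0, energy


def make_example(rng, stressed, noise=0.4):
    """Synthetic stand-in for the acoustic measurements of one word."""
    feats = []
    for syl in range(N_SYLLABLES):
        boost = 1.0 if syl == stressed else 0.0  # assumed: stress raises every feature
        feats.extend(boost + rng.gauss(0.0, noise) for _ in range(FEATS_PER_SYLLABLE))
    return feats


def make_dataset(rng, n):
    labels = [rng.randrange(N_SYLLABLES) for _ in range(n)]
    return [make_example(rng, s) for s in labels], labels


def predict_proba(W, x):
    """Softmax over linear scores; the last entry of each weight row is the bias."""
    scores = [sum(w * v for w, v in zip(row, x)) + row[-1] for row in W]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_maxent(X, y, epochs=30, lr=0.05):
    """Multinomial logistic regression fitted with plain stochastic gradient descent."""
    W = [[0.0] * (len(X[0]) + 1) for _ in range(N_SYLLABLES)]
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            probs = predict_proba(W, xi)
            for c in range(N_SYLLABLES):
                err = probs[c] - (1.0 if c == yi else 0.0)
                for j, v in enumerate(xi):
                    W[c][j] -= lr * err * v
                W[c][-1] -= lr * err
    return W


def predict(W, x):
    probs = predict_proba(W, x)
    return probs.index(max(probs))


def is_stress_error(W, x, expected):
    """Flag a word whose predicted stress position differs from the intended one."""
    return predict(W, x) != expected


def accuracy(W, X, y):
    return sum(predict(W, xi) == yi for xi, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    rng = random.Random(0)
    X_train, y_train = make_dataset(rng, 300)   # plays the role of natural utterances
    X_test, y_test = make_dataset(rng, 100)
    W = train_maxent(X_train, y_train)
    print("held-out accuracy:", accuracy(W, X_test, y_test))
```

In this sketch the classifier is trained only on "natural" examples. A synthesized word would then be flagged when the stress position predicted from its unit features disagrees with the intended one, which is also where false alarms arise whenever the classifier itself is wrong.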

Bibliographic reference. Kim, Yeon-Jun / Beutnagel, Mark C. (2010): "Automatic detection of abnormal stress patterns in unit selection synthesis", in INTERSPEECH-2010, 170-173.