7th International Conference on Spoken Language Processing
September 16-20, 2002
In spoken dialogue systems, hyperarticulation occurs as an attempt to recover from previous recognition errors. It is commonly observed that users of automatic speech recognition systems apply the same recovery strategies as in human-human interactions. Previous studies have shown that current speech recognizers do not cover hyperarticulated speech well. Faced with the higher word error rates on hyperarticulated speech, users tend to reinforce this speaking style, which results in even more recognition errors. In this study, we investigate the use of articulatory features to compensate for hyperarticulation effects. The underlying idea is that acoustic models for articulatory features are more robust against variations in speaking style than pure phone models. We present a streaming architecture which integrates articulatory features into a standard HMM-based system. Using this approach, we achieved an error reduction of 25.1% for hyperarticulated speech and even 8.9% for normal speech, without any use of hyperarticulated training data.
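The stream-combination idea behind such an architecture can be illustrated as a weighted sum of log-likelihoods from a phone-model stream and several articulatory-feature streams. The function below is a hedged sketch of this general technique; the function name, weighting scheme, and default weight are assumptions for illustration, not the paper's actual implementation.

```python
def combined_score(phone_loglik, feature_logliks, phone_weight=0.7):
    """Illustrative score combination for one HMM state.

    phone_loglik: log-likelihood from the phone acoustic model.
    feature_logliks: log-likelihoods from articulatory-feature detectors
                     (e.g. voicing, nasality) for the same frame.
    phone_weight: assumed interpolation weight for the phone stream.
    """
    if not feature_logliks:
        # No articulatory streams available: fall back to the phone model.
        return phone_loglik
    # Split the remaining weight evenly across the articulatory streams.
    feat_weight = (1.0 - phone_weight) / len(feature_logliks)
    return phone_weight * phone_loglik + feat_weight * sum(feature_logliks)
```

A score of this form would be computed per frame and per state during Viterbi decoding, so feature streams that are less sensitive to speaking-style variation can stabilize the overall acoustic score.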
Bibliographic reference. Soltau, Hagen / Metze, Florian / Waibel, Alex (2002): "Compensating for hyperarticulation by modeling articulatory properties", In ICSLP-2002, 841-844.