Sixth International Conference on Spoken Language Processing
In spoken dialogue systems, hyperarticulation occurs as users try to recover from previous recognition errors. It is commonly observed that real users, in particular, apply recovery strategies similar to those used in human-human interactions. Previous studies have shown that current speech recognizers cannot handle hyperarticulated speech. As an effect of the higher word error rates on hyperarticulated speech, users tend to reinforce this speaking style, which results in even more recognition errors. In this paper, we present approaches to building robust acoustic models for hyperarticulated speech. The key point is that the change of acoustic features under hyperarticulation is a phone-dependent effect. The idea is to use the likelihood criterion to decide which phones should be treated separately. This can be done by incorporating dynamic questions about hyperarticulation into the clustering stage. Based on such a phonetic decision tree, we can generate appropriate acoustic models. With this method, we achieved a relative word error rate reduction of about 9% on hyperarticulated speech.
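The likelihood criterion described above can be illustrated with a minimal sketch. The idea is that, during decision-tree clustering, each phone's training frames carry a hyperarticulation tag; a question "is this frame hyperarticulated?" is only answered "yes" (i.e. the phone gets separate models) if splitting the phone's statistics by that tag increases the data log-likelihood by more than a threshold. All names, statistics, and the threshold below are hypothetical illustrations, not values from the paper, and single 1-D Gaussians stand in for full acoustic models.

```python
import math

def gaussian_loglik(n, s, s2):
    # Log-likelihood of n frames under a single 1-D Gaussian fitted by
    # maximum likelihood to sufficient statistics (count, sum, sum of squares).
    mean = s / n
    var = max(s2 / n - mean * mean, 1e-6)  # floor avoids log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def split_gain(normal, hyper):
    # Likelihood gain from modeling normal and hyperarticulated frames of a
    # phone with two separate Gaussians instead of one pooled Gaussian.
    n1, s1, q1 = normal
    n2, s2, q2 = hyper
    pooled = gaussian_loglik(n1 + n2, s1 + s2, q1 + q2)
    split = gaussian_loglik(n1, s1, q1) + gaussian_loglik(n2, s2, q2)
    return split - pooled

# Hypothetical per-phone sufficient statistics: (count, sum, sum of squares)
# for normal and hyperarticulated frames of a 1-D feature.
stats = {
    "AA": ((900, 450.0, 400.0), (300, 300.0, 330.0)),  # shifted under hyperart.
    "S":  ((900, 450.0, 400.0), (300, 150.0, 135.0)),  # essentially unchanged
}

THRESHOLD = 50.0  # assumed minimum gain required to split a phone

for phone, (normal, hyper) in stats.items():
    gain = split_gain(normal, hyper)
    decision = "separate models" if gain > THRESHOLD else "shared model"
    print(f"{phone}: gain={gain:.1f} -> {decision}")
```

In this toy example the vowel's statistics shift under hyperarticulation, so the question yields a large gain and the phone is split, while the fricative's statistics barely change and it keeps a shared model, matching the paper's observation that hyperarticulation is a phone-dependent effect.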
Bibliographic reference. Soltau, Hagen / Waibel, Alex (2000): "Phone dependent modeling of hyperarticulated effects", In ICSLP-2000, vol. 4, 105-108.