ITRW on
In voice-activated teleservices, two types of speech recognition systems are commonly used: (tri)phone-based recognizers and whole-word model recognizers. The first type can readily be extended to recognize any new vocabulary word, whereas the second achieves a higher performance level when the necessary, specific training data is available. To bridge the performance gap between the two, this paper describes a new adaptation method based on a two-tier speech model: the first tier of the model architecture performs phone-based recognition, while the second tier implements the adaptation to whole-word models. Experimental results for connected word recognition are presented for two cases: (i) a hybrid NN/HMM recognition system, and (ii) the new two-tier hybrid system, which is implemented through a Multirate Neural Network (MNN) front-end. Within the described experimental settings, the adaptation method reduces the error rate by more than 80%. Furthermore, the new system is an example of modular spatiotemporal modeling of speech.
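The abstract does not state whether the reported figure is a relative or an absolute reduction. Assuming it is relative, as is usual for such claims, it can be read as follows. The symbols $E_{\text{base}}$ (error rate before adaptation) and $E_{\text{adapt}}$ (error rate after adaptation) are introduced here for illustration only and do not come from the paper:

```latex
% Assumed reading: relative error-rate reduction.
% E_base and E_adapt are hypothetical symbols, not taken from the paper.
\[
  \frac{E_{\text{base}} - E_{\text{adapt}}}{E_{\text{base}}} > 0.80
  \quad\Longleftrightarrow\quad
  E_{\text{adapt}} < 0.20 \, E_{\text{base}}
\]
```

Under this reading, the adapted system makes fewer than one fifth of the errors made before adaptation.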
Bibliographic reference. Kommer, Robert van / Hirsbrunner, Beat (2001): "Word model adaptation in voice-activated teleservices", In Adaptation-2001, 101-104.