ITRW on Adaptation Methods for Speech Recognition

August 29-30, 2001
Sophia Antipolis, France

Word Model Adaptation in Voice-Activated Teleservices

Robert van Kommer (1) and Beat Hirsbrunner (2)

(1) Swisscom AG, SGS-CT-MMS, Bern, Switzerland
(2) University of Fribourg, Switzerland

Voice-activated teleservices commonly use two types of speech recognition systems: (tri)phone-based recognizers and whole-word model recognizers. The first type is convenient to implement because it can recognize any new vocabulary word, whereas the second achieves higher performance when the necessary, specific training data is available. To bridge the performance gap between these two systems, this paper describes a new adaptation method based on a two-tier speech model. The first tier of the model architecture performs phone-based recognition, while the second tier implements the adaptation to whole-word models. Experimental results for connected word recognition are presented for two cases: (i) a hybrid NN/HMM recognition system, and (ii) the new two-tier hybrid system, which is implemented through a Multirate Neural Network (MNN) front-end. Within the described experimental settings, the adaptation method reduces the error rate by more than 80%. Furthermore, the new system is an example of modular spatiotemporal modeling of speech.

Bibliographic reference. Kommer, Robert van / Hirsbrunner, Beat (2001): "Word model adaptation in voice-activated teleservices", in Adaptation-2001, 101-104.