ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium

September 18-20, 2000
Paris, France

Tied-Posteriors: A New Hybrid Speech Recognition Technology with Generic Capabilities and High Portability

Jan Stadermann, Jörg Rottland, and Gerhard Rigoll

Faculty of Electrical Engineering, Gerhard-Mercator-University Duisburg, Germany

This paper presents a new method for estimating the emission probabilities of general hybrid connectionist/HMM recognition systems. In the traditional hybrid approach, a neural network provides posterior probabilities that model the emission probabilities of one-state HMMs. In contrast, our new tied-posterior approach uses the posterior probabilities from the neural net output to replace the Gaussian components of a standard tied-mixture system. This approach allows the use of an arbitrary HMM topology together with all the context-dependency and clustering techniques used in tied-mixture systems. As demonstrated in more detail in the paper, this speech recognition architecture is well suited as a generic technology, because it relies on simple, straightforward techniques: training standard neural nets with error backpropagation and using standard maximum-likelihood (ML) techniques for estimating the tied-posterior mixture weights. These simple-to-deploy components lead to a system that yields very good results even for context-independent models and is superior to the traditional (also easily portable) hybrid posterior approach. Experiments on the Wall Street Journal (WSJ) database have shown a significant improvement in the recognition rate compared to standard hybrid connectionist/HMM speech recognition systems on this task.
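To illustrate the idea of replacing the Gaussian components of a tied-mixture system with neural-net posteriors, the following is a minimal sketch, not the authors' implementation. It assumes that the emission score of an HMM state is a weighted sum of scaled posteriors, Pr(j|x) / Pr(j), with one mixture weight per network output class j; the abstract does not state this formula, so treat it as an assumption. The function names, the toy numbers, and the single-state re-estimation step (which ignores HMM state occupancies) are all hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence


def scaled_posteriors(posteriors: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Divide network posteriors Pr(j|x) by class priors Pr(j) (assumed scaling)."""
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [p / q for p, q in zip(posteriors, priors)]


def tied_posterior_emission(scaled: Sequence[float], weights: Sequence[float]) -> float:
    """Emission score of one HMM state: sum_j c_j * Pr(j|x) / Pr(j).

    The scaled posteriors are shared (tied) across all states; only the
    mixture weights c_j are state-specific.
    """
    if len(scaled) != len(weights):
        raise ValueError("scaled posteriors and weights must have the same length")
    return sum(c * s for c, s in zip(weights, scaled))


def reestimate_weights(frames: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """One maximum-likelihood (EM) update of the mixture weights of a single state.

    Simplification: every frame is assumed to belong to this state, so no
    forward-backward state occupancies are needed.
    """
    totals = [0.0] * len(weights)
    for scaled in frames:
        norm = tied_posterior_emission(scaled, weights)
        for j, (c, s) in enumerate(zip(weights, scaled)):
            totals[j] += c * s / norm  # responsibility of class j for this frame
    return [t / len(frames) for t in totals]


def log_likelihood(frames: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """Sum of log emission scores over all frames."""
    return sum(math.log(tied_posterior_emission(s, weights)) for s in frames)


# Toy example with three network output classes (hypothetical values).
priors = [0.5, 0.3, 0.2]
frame_a = scaled_posteriors([0.7, 0.2, 0.1], priors)
frame_b = scaled_posteriors([0.1, 0.15, 0.75], priors)

# A one-hot weight vector reduces to a single scaled posterior,
# as in the traditional hybrid approach.
one_hot_score = tied_posterior_emission(frame_a, [1.0, 0.0, 0.0])

# A state with spread-out weights mixes several network outputs.
mixed_score = tied_posterior_emission(frame_a, [0.8, 0.1, 0.1])

# One EM step starting from uniform weights.
old_weights = [1.0 / 3.0] * 3
new_weights = reestimate_weights([frame_a, frame_b], old_weights)
```

In this sketch the network is evaluated once per frame, and each state only adds a cheap weighted sum on top of it. That is one way to see why arbitrary HMM topologies and context-dependent states fit this framework: adding states adds weight vectors, not network outputs.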



Bibliographic reference.  Stadermann, Jan / Rottland, Jörg / Rigoll, Gerhard (2000): "Tied-Posteriors: A new hybrid speech recognition technology with generic capabilities and high portability", In ASR-2000, 24-28.