Earlier work has shown that Multilayer Perceptrons (MLPs) can estimate emission probabilities for a Hidden Markov Model (HMM) [1][2][3]. In those reports, we showed that these estimates improved performance over counting estimation techniques when a fairly simple HMM was used. However, current state-of-the-art continuous speech recognizers require HMMs of greater complexity, e.g., multiple densities per phone and/or context-dependent phone models. Brute-force application of our earlier techniques to triphones (the standard approach to context-dependent HMMs) would result in an output layer with many thousands of units and many millions of connections to train. In this report, we describe another approach to applying MLPs to context-dependent probability density estimation, as well as some practical aspects of implementing the method efficiently.
Cite as: Morgan, N., Bourlard, H., Wooters, C., Kohn, P., Cohen, M. (1991) Phonetic context in hybrid HMM/MLP continuous speech recognition. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 109-112, doi: 10.21437/Eurospeech.1991-23
@inproceedings{morgan91_eurospeech,
  author={Nelson Morgan and Hervé Bourlard and C. Wooters and Phil Kohn and M. Cohen},
  title={{Phonetic context in hybrid HMM/MLP continuous speech recognition}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={109--112},
  doi={10.21437/Eurospeech.1991-23}
}