Previous work has shown the ability of Artificial Neural Networks (ANNs), and Multilayer Perceptrons (MLPs) in particular, to estimate a posteriori probabilities which, after division by the a priori probabilities of the classes, can be used as emission probabilities for Hidden Markov Models (HMMs). The advantages of a speech recognition system incorporating both MLPs and HMMs are better discrimination and the ability to incorporate multiple sources of evidence (features, temporal context) without restrictive assumptions about distributions or statistical independence. While this approach has been shown to be useful for speech recognition, it remains important to understand its underlying problems and limitations and to consider its consequences for other algorithms. For example, while state-of-the-art HMM-based speech recognizers now model context-dependent phonetic units such as triphones, rather than phonemes, to improve their performance, most MLP-based approaches are restricted to phoneme models. After a short review, it is shown here how such neural network approaches can be generalized to context-dependent phoneme models. It is also discussed how previous theoretical results affect the development of other algorithms such as nonlinear autoregressive (AR) models and Radial Basis Functions (RBFs).
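To make the posterior-to-likelihood conversion mentioned above explicit, the underlying relation is Bayes' rule; the notation below, with $x_n$ denoting the acoustic vector at frame $n$ and $q_k$ the $k$-th phonetic class or HMM state, is introduced here only for illustration and is not taken verbatim from the paper:

\[
  p(x_n \mid q_k) \;=\; \frac{p(q_k \mid x_n)\, p(x_n)}{p(q_k)}
  \quad\Longrightarrow\quad
  \frac{p(x_n \mid q_k)}{p(x_n)} \;=\; \frac{p(q_k \mid x_n)}{p(q_k)} .
\]

Since $p(x_n)$ does not depend on the class, dividing the MLP output $p(q_k \mid x_n)$ by the class prior $p(q_k)$, typically estimated from relative class frequencies in the training alignment, yields a scaled likelihood that can stand in for the usual emission probability during Viterbi decoding.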