Automatic speech recognition (ASR) performance degrades sharply as the mismatch between training and test data increases. The human ability to recognise speech when a large proportion of frequencies are dominated by noise has inspired the "missing data" and "multi-band" approaches to noise-robust ASR. "Missing data" ASR identifies low-SNR spectral data in each data frame and then ignores it. Multi-band ASR trains a separate model for each position of missing data, estimates a reliability weight for each model, then combines the model outputs in a weighted sum. A problem with both approaches is that local estimation of data reliability is inherently inaccurate, and that it assumes all of the training data was clean. In this article we present a model in which adaptive multi-band expert weighting is incorporated naturally into the maximum a posteriori (MAP) decoding process.
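The weighted-sum combination used in multi-band ASR, as described above, can be sketched as follows. This is a minimal illustration of the generic combination step only, not the MAP model presented in the paper. The function name `combine_expert_posteriors`, the three experts, their posteriors, and the reliability weights are all hypothetical values chosen for the example.

```python
import numpy as np

def combine_expert_posteriors(posteriors, weights):
    """Combine per-expert class posteriors for one frame by a weighted sum.

    posteriors: shape (n_experts, n_classes), each row sums to 1
    weights:    shape (n_experts,), non-negative reliability weights
    """
    posteriors = np.asarray(posteriors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Normalise the weights so the combined output is still a distribution.
    weights = weights / weights.sum()
    return weights @ posteriors

# Hypothetical posteriors from three sub-band experts over three classes.
posteriors = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
# Hypothetical reliability weights, one per expert (assumed, not estimated).
weights = [0.5, 0.3, 0.2]

combined = combine_expert_posteriors(posteriors, weights)
print(combined)
```

In this sketch the weights are supplied by hand; in practice they would come from some reliability estimate, which is the step the abstract identifies as inherently inaccurate.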
Cite as: Morris, A., Hagen, A., Bourlard, H. (2001) MAP combination of multi-stream HMM or HMM/ANN experts. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 225-228, doi: 10.21437/Eurospeech.2001-79
@inproceedings{morris01_eurospeech,
  author={Andrew Morris and Astrid Hagen and Hervé Bourlard},
  title={{MAP combination of multi-stream HMM or HMM/ANN experts}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={225--228},
  doi={10.21437/Eurospeech.2001-79}
}