Background noise can significantly degrade the performance of speech recognition systems. A range of fast feature-space and model-based compensation schemes have been investigated to increase robustness. Model-based approaches typically achieve lower error rates, but at an increased computational load compared to feature-based approaches, making their use impractical in many situations. The uncertainty decoding framework can be considered an elegant compromise between the two: the uncertainty of the features is propagated to the recogniser in a mathematically consistent fashion, and the complexity of the model used to determine the uncertainty may be decoupled from the recognition model itself, allowing the computational load to be controlled. This paper describes a new approach within this framework, joint uncertainty decoding. The approach is compared with the uncertainty decoding version of SPLICE, standard SPLICE, and a new form of front-end CMLLR, evaluated on a medium-vocabulary speech recognition task with artificially added noise.
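The core idea of uncertainty decoding can be illustrated with a minimal sketch: the noisy observation is mapped by an affine transform estimated in the front end, and the estimated uncertainty is added as an extra variance term when scoring each recognition-model Gaussian. The sketch below is illustrative only; the function and parameter names are hypothetical and do not follow the paper's notation, and a univariate Gaussian stands in for the full acoustic model.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log density of a univariate Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def uncertainty_decoding_log_likelihood(y, a, b, var_unc, mean, var):
    """Hypothetical sketch of scoring one component under uncertainty
    decoding: the noisy observation y is compensated by an affine
    transform (a, b) supplied by the front-end noise model, and the
    uncertainty variance var_unc is added to the component variance,
    so less reliable features contribute a flatter likelihood.
    """
    # log|a| keeps the density of the transformed variable normalised
    return math.log(abs(a)) + gaussian_log_likelihood(
        a * y + b, mean, var + var_unc
    )
```

With `a = 1`, `b = 0`, and `var_unc = 0` this reduces to standard Gaussian scoring; a positive `var_unc` broadens the effective variance, which is how the recogniser is made aware of the front end's confidence in each feature.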
Cite as: Liao, H., Gales, M.J.F. (2005) Joint uncertainty decoding for noise robust speech recognition. Proc. Interspeech 2005, 3129-3132, doi: 10.21437/Interspeech.2005-265
@inproceedings{liao05_interspeech,
  author    = {H. Liao and M. J. F. Gales},
  title     = {{Joint uncertainty decoding for noise robust speech recognition}},
  year      = {2005},
  booktitle = {Proc. Interspeech 2005},
  pages     = {3129--3132},
  doi       = {10.21437/Interspeech.2005-265}
}