ISCA Archive Interspeech 2008

N-best based stochastic mapping on stereo HMM for noise robust speech recognition

Xiaodong Cui, Mohamed Afify, Yuqing Gao

In this paper we present an extension of our previously proposed feature-space stereo-based stochastic mapping (SSM). Our previous work used an auxiliary stereo Gaussian mixture model in the front-end. Here we instead use a stereo HMM in the back-end. The basic idea, as in feature-space SSM, is to form a joint space of the clean and noisy features, but here a Gaussian mixture HMM is trained in that joint space. The MMSE estimate, which is the conditional expectation of the clean speech given the sequence of noisy observations, yields clean speech predictors at the granularity of the Gaussian distributions in the HMM. Because the Gaussians are not known during decoding, N-best hypotheses are employed. The result is a clean speech predictor that is a posterior-weighted sum of the estimates from the different Gaussian distributions. In an experimental evaluation on the Aurora 2 database, the proposed method outperforms the MST model, with a relative improvement of about 10% to 20% under unseen noise conditions in particular.
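The posterior-weighted predictor described above can be sketched for a single frame as follows. This is a minimal illustration, not the authors' implementation. For a joint Gaussian k with clean/noisy means mu_x, mu_y and covariance blocks Sigma_xy, Sigma_yy, the standard conditional mean is E[x | y, k] = mu_x + Sigma_xy Sigma_yy^{-1} (y - mu_y). The clean estimate is then the sum over k of p(k | Y) E[x | y, k]. The sketch assumes diagonal covariance blocks, so the predictor acts dimension by dimension. It also assumes that the Gaussian posteriors p(k | Y) are already available, for example from alignments of the N-best hypotheses. The names `StereoGaussian`, `gaussian_predictor` and `mmse_clean_estimate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StereoGaussian:
    """One Gaussian of a stereo HMM in the joint [clean; noisy] feature space.

    Assumption (for illustration only): the clean/noisy cross-covariance and the
    noisy covariance are diagonal, so each dimension is mapped independently.
    """
    mu_x: List[float]    # clean-part mean
    mu_y: List[float]    # noisy-part mean
    var_y: List[float]   # diagonal of Sigma_yy (noisy variances)
    cov_xy: List[float]  # diagonal of Sigma_xy (clean/noisy cross-covariances)


def gaussian_predictor(g: StereoGaussian, y: Sequence[float]) -> List[float]:
    """Conditional mean E[x | y, g] = mu_x + Sigma_xy Sigma_yy^{-1} (y - mu_y), per dimension."""
    return [
        mx + (cxy / vy) * (yd - my)
        for mx, my, vy, cxy, yd in zip(g.mu_x, g.mu_y, g.var_y, g.cov_xy, y)
    ]


def mmse_clean_estimate(
    y: Sequence[float],
    gaussians: Sequence[StereoGaussian],
    posteriors: Sequence[float],
) -> List[float]:
    """Posterior-weighted sum of per-Gaussian clean predictors for one noisy frame y.

    `posteriors` are assumed to be given (e.g. accumulated from N-best alignments);
    they are renormalized here so that they sum to one.
    """
    total = sum(posteriors)
    if total <= 0.0:
        raise ValueError("posteriors must have a positive sum")
    x_hat = [0.0] * len(y)
    for g, p in zip(gaussians, posteriors):
        pred = gaussian_predictor(g, y)
        weight = p / total
        for d in range(len(y)):
            x_hat[d] += weight * pred[d]
    return x_hat


if __name__ == "__main__":
    # Two toy 1-D Gaussians and one noisy observation.
    g1 = StereoGaussian(mu_x=[1.0], mu_y=[2.0], var_y=[1.0], cov_xy=[0.5])
    g2 = StereoGaussian(mu_x=[-1.0], mu_y=[1.0], var_y=[2.0], cov_xy=[1.0])
    print(mmse_clean_estimate([3.0], [g1, g2], [0.75, 0.25]))  # [1.125]
```

In the toy example, the two Gaussians predict 1.5 and 0.0 for the noisy value 3.0. Weighting those predictions by 0.75 and 0.25 gives 1.125. A full-covariance version would replace the per-dimension ratio `cov_xy / var_y` with the matrix product Sigma_xy Sigma_yy^{-1}.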


doi: 10.21437/Interspeech.2008-303

Cite as: Cui, X., Afify, M., Gao, Y. (2008) N-best based stochastic mapping on stereo HMM for noise robust speech recognition. Proc. Interspeech 2008, 1261-1264, doi: 10.21437/Interspeech.2008-303

@inproceedings{cui08_interspeech,
  author={Xiaodong Cui and Mohamed Afify and Yuqing Gao},
  title={{N-best based stochastic mapping on stereo HMM for noise robust speech recognition}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={1261--1264},
  doi={10.21437/Interspeech.2008-303}
}