12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Incorporating Regional Information to Enhance MAP-Based Stochastic Feature Compensation for Robust Speech Recognition

Yu Tsao, Paul R. Dixon, Chiori Hori, Hisashi Kawai

NICT, Japan

In this study, we propose an environment structuring framework that facilitates the preparation of suitable prior densities for MAP-based stochastic feature matching (SFM) in robust speech recognition. The framework uses a two-stage hierarchical structure to characterize the regional information of various speakers and speaking environments. From this regional information, we derive three types of prior densities: clustered, sequential, and hierarchical. We also design an integrated prior density that combines the advantages of these three. Experimental results on the Aurora-2 task confirm that regional information yields more suitable prior densities and thus improves the performance of MAP-based SFM. Moreover, MAP-based SFM performs best with the integrated prior density, which combines the knowledge sources of the other three.


Bibliographic reference. Tsao, Yu / Dixon, Paul R. / Hori, Chiori / Kawai, Hisashi (2011): "Incorporating regional information to enhance MAP-based stochastic feature compensation for robust speech recognition", in INTERSPEECH-2011, 2585-2588.