12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Improvements of a Dual-Input DBN for Noise Robust ASR

Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves

Radboud Universiteit Nijmegen, The Netherlands

In previous work we showed that an ASR system built around a dual-input Dynamic Bayesian Network (DBN), which simultaneously observes MFCC acoustic features and an exemplar-based Sparse Classification (SC) phoneme predictor stream, can achieve better word recognition accuracy in noise than a system that observes only one input stream. This paper explores three modifications of the SC input to further improve the noise robustness of the dual-input DBN system: 1) using state likelihoods instead of phonemes, 2) integrating more contextual information, and 3) using the complete likelihood distribution. Experiments on AURORA-2 show that combining the first two approaches significantly improves the recognition results, achieving an accuracy gain of up to 29% (absolute) at an SNR of -5 dB. In the dual-input system, using the full likelihood vector does not outperform using only the best state prediction.


Bibliographic reference.  Sun, Yang / Gemmeke, Jort F. / Cranen, Bert / Bosch, Louis ten / Boves, Lou (2011): "Improvements of a dual-input DBN for noise robust ASR", In INTERSPEECH-2011, 1669-1672.