10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Integrating Codebook and Utterance Information in Cepstral Statistics Normalization Techniques for Robust Speech Recognition

Guan-min He, Jeih-weih Hung

National Chi Nan University, Taiwan

Cepstral statistics normalization techniques have been shown to be very successful at improving the noise robustness of speech features. This paper proposes a hybrid scheme that yields a more accurate estimate of the feature statistics used in these techniques. By properly integrating codebook and utterance knowledge, the resulting hybrid approach significantly outperforms conventional utterance-based, segment-based and codebook-based approaches in noisy environments. For the Aurora-2 clean-condition training task, the proposed hybrid codebook/segment-based histogram equalization (CS-HEQ) achieves an average recognition accuracy of 90.66%, which is better than utterance-based HEQ (87.62%), segment-based HEQ (85.92%) and codebook-based HEQ (85.29%). Furthermore, CS-HEQ can be implemented with a short delay and can thus be applied in real-time online systems.
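For readers unfamiliar with histogram equalization of cepstral features, the following is a minimal sketch of the utterance-based baseline only, not of the proposed CS-HEQ, whose way of combining codebook and segment statistics is not described in this abstract. It assumes a standard normal reference distribution and a rank-based estimate of the empirical CDF; both choices, and the function name `utterance_heq`, are illustrative assumptions rather than details taken from the paper.

```python
from statistics import NormalDist


def utterance_heq(frames):
    """Equalize each cepstral dimension of one utterance to N(0, 1).

    `frames` is a list of feature vectors (one list of floats per frame).
    Assumption: the reference distribution is a standard normal, and the
    empirical CDF is estimated from ranks within the whole utterance.
    """
    n = len(frames)
    if n == 0:
        return []
    dims = len(frames[0])
    ref = NormalDist(0.0, 1.0)
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Frame indices sorted by the value of dimension d (ties keep frame order).
        order = sorted(range(n), key=lambda t: frames[t][d])
        for rank, t in enumerate(order):
            # Mid-rank CDF estimate, strictly inside (0, 1) so inv_cdf is defined.
            cdf = (rank + 0.5) / n
            out[t][d] = ref.inv_cdf(cdf)
    return out
```

A segment-based variant would compute the ranks over a sliding window of neighbouring frames instead of the whole utterance, which is what allows a short processing delay.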


Bibliographic reference. He, Guan-min / Hung, Jeih-weih (2009): "Integrating codebook and utterance information in cepstral statistics normalization techniques for robust speech recognition", in INTERSPEECH-2009, 1239-1242.