Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Multi-Resolution Front-End for Noise Robust Speech Recognition

Ramalingam Hariharan, Imre Kiss, Olli Viikki, Jilei Tian

Speech and Audio Systems Laboratory, Nokia Research Center, Tampere, Finland

This paper proposes a new feature extraction approach for noise robust speech recognition. Recent work on multi-band and missing-feature-theory-based automatic speech recognition (ASR) has shown that sub-band processing of speech has certain advantages over the conventional full-band technique. In multi-band ASR, the frequency sub-bands are usually decoded independently, and the final recognition result is obtained by combining the frequency channels at some temporal level. Since it is not straightforward to determine the optimal combination level, we propose instead collecting the sub-band parameters into a single feature vector for decoding. As the full-band parameters still carry important information for classification, we suggest also including full-band features in the final feature vector. Our third observation is that using a PCA transform rather than the DCT to de-correlate log-spectral features gives better recognition performance. The experimental results show that the proposed front-end provides a 36.2% improvement in performance over the conventional full-band technique.
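The feature construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the band edges, the number of retained components, and the synthetic input are all assumptions made for illustration. The sketch de-correlates log-spectral features with PCA for the full band and for each sub-band, then concatenates the results into one feature vector per frame. In a real system the PCA bases would presumably be estimated on training data rather than on the utterance being processed.

```python
import numpy as np

def pca_basis(x, n_components):
    """Return (mean, basis); basis columns are the top principal directions of x."""
    mean = x.mean(axis=0)
    cov = np.cov(x - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def multi_resolution_features(log_spec, band_edges, n_full, n_sub):
    """Concatenate PCA-decorrelated full-band and sub-band features per frame.

    log_spec:   (frames, channels) array of log filter-bank energies
    band_edges: list of (lo, hi) channel ranges, one per sub-band (hypothetical)
    """
    # Full-band part: PCA over all channels.
    mean, basis = pca_basis(log_spec, n_full)
    parts = [(log_spec - mean) @ basis]
    # Sub-band parts: a separate PCA over each channel range.
    for lo, hi in band_edges:
        band = log_spec[:, lo:hi]
        mean, basis = pca_basis(band, n_sub)
        parts.append((band - mean) @ basis)
    # One combined feature vector per frame, decoded by a single recognizer.
    return np.hstack(parts)

# Synthetic stand-in for real log filter-bank energies (assumption).
rng = np.random.default_rng(0)
log_spec = rng.normal(size=(200, 24))
features = multi_resolution_features(
    log_spec, band_edges=[(0, 12), (12, 24)], n_full=12, n_sub=4
)
```

Here each frame yields 12 full-band coefficients followed by 4 coefficients for each of the two sub-bands. Replacing `pca_basis` with a fixed DCT matrix would give the conventional cepstral alternative that the abstract compares against.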



Bibliographic reference. Hariharan, Ramalingam / Kiss, Imre / Viikki, Olli / Tian, Jilei (2000): "Multi-resolution front-end for noise robust speech recognition", in ICSLP-2000, vol. 3, pp. 550-553.