Sixth International Conference on Spoken Language Processing
This paper proposes a new feature extraction approach for noise-robust speech recognition. Recent work on multi-band and missing-feature-theory based Automatic Speech Recognition (ASR) has shown that sub-band processing of speech has certain advantages over the conventional full-band technique. In multi-band ASR, the frequency sub-bands are usually decoded independently, and the final recognition result is obtained by combining the frequency channels at some temporal level. Since it is not straightforward to determine the optimal combination level, we propose instead that the sub-band parameters be collected into a single feature vector for decoding. Because the full-band parameters still carry important information for classification, we suggest that full-band features also be included in the final feature vector. Our third observation is that using a PCA transform to de-correlate the log-spectral features provides better recognition performance than the DCT. Experimental results show that the proposed front-end provides a 36.2% improvement in performance over the conventional full-band technique.
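The feature construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the band edges, the number of coefficients kept per band, and the choice to estimate the PCA basis from the same frames it is applied to are all assumptions made for illustration (in practice the basis would presumably be estimated on training data). The sketch takes a matrix of log-spectral (e.g. log filterbank) energies, de-correlates the full band and each sub-band separately with either PCA or a DCT, and concatenates the results into one feature vector per frame.

```python
import numpy as np


def dct_matrix(n_out, n_in):
    """Rows are the first n_out DCT-II basis vectors of length n_in (unnormalised)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))


def pca_matrix(log_spec, n_out):
    """Rows are the n_out leading principal directions of the given frames."""
    cov = np.cov(log_spec, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_out]    # keep the largest n_out
    return eigvecs[:, order].T


def multi_resolution_features(log_spec, band_edges, n_full, n_sub, transform="pca"):
    """Concatenate de-correlated full-band and sub-band features.

    log_spec   : (frames, channels) log-spectral energies
    band_edges : list of (lo, hi) channel ranges, one per sub-band (hypothetical layout)
    n_full     : coefficients kept for the full band
    n_sub      : coefficients kept for each sub-band
    """
    # The full band is treated as one more "band" spanning all channels.
    bands = [(0, log_spec.shape[1], n_full)]
    bands += [(lo, hi, n_sub) for lo, hi in band_edges]

    parts = []
    for lo, hi, n_out in bands:
        x = log_spec[:, lo:hi]
        if transform == "pca":
            basis = pca_matrix(x, n_out)
            x = x - x.mean(axis=0)               # PCA projects mean-removed data
        else:
            basis = dct_matrix(n_out, hi - lo)
        parts.append(x @ basis.T)
    return np.hstack(parts)


# Usage with synthetic data: 200 frames, 24 channels, two sub-bands.
rng = np.random.default_rng(0)
log_spec = rng.normal(size=(200, 24))
feats = multi_resolution_features(log_spec, [(0, 12), (12, 24)], n_full=8, n_sub=4)
```

Here each frame ends up with 8 full-band coefficients followed by 4 coefficients from each of the two sub-bands. Passing `transform="dct"` swaps in the DCT so the two de-correlation choices can be compared on the same input.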
Bibliographic reference. Hariharan, Ramalingam / Kiss, Imre / Viikki, Olli / Tian, Jilei (2000): "Multi-resolution front-end for noise robust speech recognition", In ICSLP-2000, vol.3, 550-553.