This paper proposes a new feature extraction approach for noise-robust speech recognition. Recent work in multi-band and missing-feature-theory based Automatic Speech Recognition (ASR) has shown that sub-band processing of speech has certain advantages over the conventional full-band technique. In multi-band ASR, the frequency sub-bands are usually decoded independently, and the final recognition result is obtained by combining the frequency channels at some temporal level. Since it is not straightforward to determine the optimal combination level, we propose instead that the parameters of the different sub-bands be collected into a single feature vector for decoding. Because the full-band parameters still carry important information for classification, we further suggest that full-band features be included in the final feature vector. Our third observation is that using a PCA transform to de-correlate the log-spectral features provides better recognition performance than the DCT. The experimental results show that the proposed front-end provides a 36.2% improvement in performance over the conventional full-band technique.
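The idea of collecting full-band and sub-band parameters into one feature vector can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of sub-bands, and the number of coefficients kept per band are all hypothetical choices made for illustration, and the sketch de-correlates with a DCT for simplicity, whereas the abstract reports that a PCA transform (a basis estimated from training data) performs better.

```python
import math

def dct2(values, num_coeffs):
    """Unnormalized DCT-II of `values`, truncated to the first `num_coeffs` terms."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]

def split_bands(log_energies, num_bands):
    """Split log filter-bank energies into contiguous, nearly equal sub-bands."""
    n = len(log_energies)
    bounds = [(b * n) // num_bands for b in range(num_bands + 1)]
    return [log_energies[bounds[b]:bounds[b + 1]] for b in range(num_bands)]

def multi_resolution_features(log_energies, num_bands=2, full_coeffs=4, band_coeffs=2):
    """Build one frame's feature vector: full-band coefficients first,
    followed by the coefficients of each sub-band (sizes are assumed values)."""
    features = dct2(log_energies, full_coeffs)
    for band in split_bands(log_energies, num_bands):
        features.extend(dct2(band, band_coeffs))
    return features

# Toy frame of 8 filter-bank energies (made-up values, not data from the paper).
frame = [math.log(e) for e in [1.0, 2.0, 4.0, 8.0, 8.0, 4.0, 2.0, 1.0]]
vector = multi_resolution_features(frame)
print(len(vector))  # 4 full-band + 2 sub-bands * 2 coefficients = 8
```

Because all parameters end up in a single vector, one decoder sees both resolutions at once and no separate rule for recombining sub-band decisions is needed, which is the motivation stated in the abstract.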
Cite as: Hariharan, R., Kiss, I., Viikki, O., Tian, J. (2000) Multi-resolution front-end for noise robust speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 550-553, doi: 10.21437/ICSLP.2000-594
@inproceedings{hariharan00b_icslp,
  author={Ramalingam Hariharan and Imre Kiss and Olli Viikki and Jilei Tian},
  title={{Multi-resolution front-end for noise robust speech recognition}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 3, 550-553},
  doi={10.21437/ICSLP.2000-594}
}