5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

On Variable Sampling Frequencies in Speech Recognition

Fu-Hua Liu, Michael Picheny

IBM Watson Research Center, USA

In this paper we describe a novel approach to handling different sampling frequencies in speech recognition. When a recognition task requires a sampling frequency different from that of the reference system, it is customary to re-train the system at the new sampling rate. To circumvent this tedious training process, we propose a new approach, termed Sampling Rate Transformation (SRT), that performs the transformation directly on the speech recognition system. By re-scaling the mel-filter design and filtering the system in the spectral domain, SRT converts the existing system to the target spectral range. New systems are obtained without using any data from the test environment. Given 11 kHz test data and a 16 kHz speaker-independent (SI) system, SRT reduces the word error rate from 29.89% to 18.17%. The matched system for 11 kHz has an error rate of 16.17%. We also examine MLLR and MAP adaptation. The best MLLR result is 17.92%, obtained with 4.5 hours of speech. Similar improvements are also observed in the speaker adaptation mode.
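The abstract does not give the filter design itself, so the following is only an illustrative sketch of why a mel filterbank depends on the sampling rate, not the authors' SRT procedure. It assumes the common mel formula 2595·log10(1 + f/700), a hypothetical bank of 24 filters, and Nyquist frequencies of 8000 Hz and 5500 Hz for the 16 kHz and 11 kHz cases. All function and variable names are invented for the illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595*log10 formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, f_low_hz, f_high_hz):
    """Center frequencies (Hz) of filters spaced equally on the mel scale
    between f_low_hz and f_high_hz (the band edges themselves are excluded)."""
    mel_low = hz_to_mel(f_low_hz)
    mel_high = hz_to_mel(f_high_hz)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(num_filters)]


NUM_FILTERS = 24          # hypothetical filter count, not taken from the paper
REF_NYQUIST_HZ = 8000.0   # Nyquist frequency of a 16 kHz reference system
TGT_NYQUIST_HZ = 5500.0   # Nyquist frequency assumed for 11 kHz test data

# Filterbank layout of the reference (16 kHz) system.
ref_centers = mel_center_frequencies(NUM_FILTERS, 0.0, REF_NYQUIST_HZ)

# Reference filters whose centers still lie inside the target band.
usable = [f for f in ref_centers if f <= TGT_NYQUIST_HZ]

# The same number of filters re-spaced over the narrower target band.
rescaled = mel_center_frequencies(NUM_FILTERS, 0.0, TGT_NYQUIST_HZ)

print(f"reference filters centered within the target band: {len(usable)} of {NUM_FILTERS}")
print(f"highest reference center: {ref_centers[-1]:.0f} Hz")
print(f"highest re-scaled center: {rescaled[-1]:.0f} Hz")
```

Under these assumptions, some of the reference system's filters are centered above the target Nyquist frequency and so see no energy in band-limited test data, which is the kind of mismatch that re-scaling the filter design is meant to address.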

Bibliographic reference.  Liu, Fu-Hua / Picheny, Michael (1998): "On variable sampling frequencies in speech recognition", In ICSLP-1998, paper 0838.