5th International Conference on Spoken Language Processing
This paper describes a novel approach to the problem of differing sampling frequencies in speech recognition. When a recognition task requires a sampling frequency different from that of the reference system, it is customary to re-train the system at the new sampling rate. To circumvent this tedious training process, we propose a new approach, termed Sampling Rate Transformation (SRT), that applies the transformation directly to the speech recognition system. By re-scaling the mel-filter design and filtering the system in the spectral domain, SRT converts the existing system to the target spectral range. New systems are obtained without using any data from the test environment. Given 11 kHz test data and a 16 kHz speaker-independent (SI) system, SRT reduces the word error rate from 29.89% to 18.17%. For comparison, the matched 11 kHz system has an error rate of 16.17%. We also examine maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation. The best MLLR result is an error rate of 17.92%, obtained with 4.5 hours of speech. Similar improvements are also observed in the speaker adaptation mode.
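The abstract does not give the filter design itself, so the following is only a minimal sketch of why a mel-filter bank designed for 16 kHz speech does not line up with 11 kHz speech. It assumes the common mel formula mel(f) = 2595 log10(1 + f/700), triangular filters spaced evenly on the mel scale from 0 Hz to the Nyquist frequency, 24 filters, and an 11 kHz rate of exactly 11000 Hz. All function names and these parameter values are hypothetical and are not taken from the paper.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (common 2595*log10 formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(sample_rate_hz, num_filters):
    """Return num_filters + 2 edge frequencies in Hz, evenly spaced in mel
    from 0 Hz up to the Nyquist frequency (half the sampling rate)."""
    top_mel = hz_to_mel(sample_rate_hz / 2.0)
    step = top_mel / (num_filters + 1)
    return [mel_to_hz(i * step) for i in range(num_filters + 2)]


def filters_within_band(edges, cutoff_hz):
    """Count triangular filters lying entirely at or below cutoff_hz.
    Filter k spans edges[k] (lower), edges[k+1] (center), edges[k+2] (upper)."""
    return sum(1 for k in range(len(edges) - 2) if edges[k + 2] <= cutoff_hz)


# Hypothetical configuration: 24 filters, 16000 Hz reference, 11000 Hz target.
NUM_FILTERS = 24
edges_16k = mel_filter_edges(16000, NUM_FILTERS)
edges_11k = mel_filter_edges(11000, NUM_FILTERS)

# Filters of the 16 kHz design that fit under the 11 kHz Nyquist limit (5500 Hz).
usable_16k_filters = filters_within_band(edges_16k, 11000 / 2.0)

if __name__ == "__main__":
    print("16 kHz design, top edge (Hz):", round(edges_16k[-1], 1))
    print("11 kHz design, top edge (Hz):", round(edges_11k[-1], 1))
    print("16 kHz filters fully below 5500 Hz:", usable_16k_filters)
```

Under these assumptions, only part of the 16 kHz filter bank lies within the band that 11 kHz data can represent, and the two designs place their filters at different frequencies. That mismatch is the kind of discrepancy a re-scaling of the mel-filter design would have to address; how SRT actually does so is described in the paper, not in this sketch.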
Bibliographic reference. Liu, Fu-Hua / Picheny, Michael (1998): "On variable sampling frequencies in speech recognition", In ICSLP-1998, paper 0838.