EUROSPEECH 2003 - INTERSPEECH 2003
The time derivatives of speech energy, such as the delta and delta-delta log energy, are known to be critical features for automatic speech recognition (ASR). However, at low signal-to-noise ratio (SNR) their discriminative ability can be limited, or they can even become harmful, because the energy contour is corrupted by noise. Taking advantage of the spectral characteristics of in-car noise, the speech energy contour is extracted from the high-pass filtered signal so as to reduce the distortion in the delta energy. Such filtering can be implemented with a pre-emphasis-like filter or by summing the energies of the higher frequency bands. The proposed method is evaluated on a Chinese name recognition task, using both real and artificially generated in-car speech as test data. The experimental results show that the method improves recognition accuracy for in-car speech at low SNR as well as for clean speech.
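The two filtering options named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the frame length, hop size, pre-emphasis coefficient `alpha`, cutoff frequency, the use of raw FFT bins in place of filter-bank bands, and the regression window of the delta are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def log_energy_contour(x, frame_len=400, hop=160, alpha=None):
    """Frame-level log energy; if alpha is given, first apply the
    pre-emphasis-like high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    x = np.asarray(x, dtype=float)
    if alpha is not None:
        x = np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))
    frames = frame_signal(x, frame_len, hop)
    # Small floor avoids log(0) on silent frames.
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def highband_log_energy(x, fs, cutoff_hz, frame_len=400, hop=160):
    """Alternative: sum the power of FFT bins at or above cutoff_hz
    (an assumed stand-in for summing higher frequency band energies)."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return np.log(power[:, freqs >= cutoff_hz].sum(axis=1) + 1e-10)

def delta(c, width=2):
    """Regression-based delta of a contour, with edge padding."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    p = np.pad(c, width, mode="edge")
    num = sum(k * (p[width + k:n + width + k] - p[width - k:n + width - k])
              for k in range(1, width + 1))
    den = 2 * sum(k * k for k in range(1, width + 1))
    return num / den
```

For example, `delta(log_energy_contour(x, alpha=0.97))` gives a delta log energy computed from the high-pass filtered signal, while `delta(log_energy_contour(x))` gives the conventional one. On a signal dominated by low-frequency noise, the filtered contour should follow the speech segments more closely, which is the effect the abstract relies on.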
Bibliographic reference. Hwang, Tai-Hwei (2003): "Energy contour extraction for in-car speech recognition", In EUROSPEECH-2003, 2181-2184.