Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Streamlining the Front End of a Speech Recognizer

Hua Yu, Alex Waibel

Interactive Systems Lab, Carnegie Mellon University, Pittsburgh, PA, USA

In this paper we seek to streamline various operations within the front end of a speech recognizer, both to reduce unnecessary computation and to simplify the conceptual framework. First, a novel view of the front end in terms of linear transformations is presented. Then we study the invariance property of recognition performance with respect to linear transformations (LT) at the front end. Analysis reveals that several LT steps can be consolidated into a single LT, which effectively eliminates the Discrete Cosine Transform (DCT) step, part of the traditional MFCC (Mel-Frequency Cepstral Coefficient) front end. Moreover, a highly simplified, data-driven front-end scheme is proposed as a direct generalization of this idea. The new setup has no Mel-scale filtering, another part of the MFCC front end. Experimental results show a 5% relative improvement on the Broadcast News task.
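The consolidation argument rests on the fact that two linear transformations applied in sequence equal a single matrix product. A minimal sketch of that identity follows; it is not the authors' implementation. The dimensions, the unnormalized DCT-II definition, and the random `later` matrix are all assumptions made for illustration. `later` stands in for any subsequent linear transform, which the abstract does not name.

```python
import math
import random

def dct_matrix(n_out, n_in):
    """Unnormalized DCT-II matrix mapping n_in log filterbank energies to n_out coefficients."""
    return [[math.cos(math.pi * k * (j + 0.5) / n_in) for j in range(n_in)]
            for k in range(n_out)]

def matmul(a, b):
    """Plain matrix product of two row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply(m, x):
    """Apply matrix m to vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]

rng = random.Random(0)
n_filters, n_cepstra, n_final = 8, 6, 4  # illustrative sizes, not from the paper

dct = dct_matrix(n_cepstra, n_filters)

# Hypothetical stand-in for a later linear transform in the front end.
later = [[rng.uniform(-1.0, 1.0) for _ in range(n_cepstra)]
         for _ in range(n_final)]

# Consolidate both steps into one matrix: combined = later * dct.
combined = matmul(later, dct)

# A made-up vector of log filterbank energies for one frame.
log_energies = [rng.uniform(0.0, 10.0) for _ in range(n_filters)]

two_step = apply(later, apply(dct, log_energies))
one_step = apply(combined, log_energies)

# The two paths agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(two_step, one_step))
```

Because `combined` maps the filterbank outputs directly to the final features, a front end that learns this single matrix from data has no need for an explicit DCT step, which is the sense in which the abstract describes the DCT as eliminated.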


Bibliographic reference. Yu, Hua / Waibel, Alex (2000): "Streamlining the front end of a speech recognizer", in ICSLP-2000, vol. 1, 353-356.