EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Analysis of the Root-Cepstrum for Acoustic Modeling and Fast Decoding in Speech Recognition

Ruhi Sarikaya, John H. L. Hansen

RSPL-CSLR, University of Colorado-Boulder, USA

Root-cepstral analysis has previously been proposed for speech recognition in car environments. In this paper, we focus on a different aspect of the Root-cepstrum: its application to discriminative acoustic modeling and fast speech recognizer decoding. We compare the Root-cepstrum with Mel-Frequency Cepstral Coefficients (MFCC) in terms of noise immunity during modeling and decoding speed. Our experiments use the SPINE corpus [HAN00], which is composed of clean and noisy data with a 5K-word vocabulary. The experiments allow pair-wise comparisons of acoustic models across different feature sets and acoustic units. We observed that for 84% of the phonemes, the average distance to all other acoustic units is larger in the Root-cepstrum domain than in the MFCC domain. This yields a sharper acoustic model set with less ambiguity in the Root-cepstrum space. Large-vocabulary noisy speech recognition experiments showed a 27.5% reduction in the real-time processing factor (RTF) compared to MFCC features, while overall recognition accuracy also improved.
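The abstract contrasts Root-cepstral features with MFCC. The sketch below assumes a standard MFCC-style front end (Hamming window, FFT power spectrum, triangular mel filterbank, DCT-II) and isolates the one step where the two feature sets differ: MFCC compresses the filterbank energies with a logarithm, whereas the root cepstrum raises them to a power γ with 0 < γ < 1. The filterbank size, FFT length, sample rate, number of coefficients, and gamma=0.1 are illustrative assumptions, not the settings used in this paper, and the helper names (`mel_filterbank`, `dct_ii`, `cepstrum`) are hypothetical.

```python
import numpy as np

# Illustrative sketch: MFCC-style front end with a switchable compression step.
# All parameter values (24 filters, 512-point FFT, 16 kHz, 13 coefficients,
# gamma=0.1) are assumptions for illustration, not this paper's settings.

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising slope of the triangle
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope of the triangle
            fb[i - 1, k] = (right - k) / (right - center)
    return fb

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II of x, keeping the first n_coeffs coefficients."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ x

def cepstrum(frame, fb, n_coeffs=13, compression="log", gamma=0.1):
    """compression="log" gives MFCC-style features; "root" gives root-cepstral features."""
    n_fft = 2 * (fb.shape[1] - 1)
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)) ** 2
    energies = fb @ power + 1e-10  # small floor keeps log() finite
    if compression == "log":
        compressed = np.log(energies)
    elif compression == "root":
        compressed = energies ** gamma  # root compression with 0 < gamma < 1
    else:
        raise ValueError(f"unknown compression: {compression}")
    return dct_ii(compressed, n_coeffs)

# Usage on a synthetic 25 ms frame (400 samples at an assumed 16 kHz).
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
fb = mel_filterbank(n_filters=24, n_fft=512, sample_rate=16000)
c_log = cepstrum(frame, fb, compression="log")
c_root = cepstrum(frame, fb, compression="root", gamma=0.1)
print(np.round(c_log[:4], 3))
print(np.round(c_root[:4], 3))
```

Swapping the logarithm for a root changes how a gain on the input enters the features: scaling the frame by a constant a adds the same constant to every log filterbank energy, which changes only c0 of the log cepstrum, whereas it multiplies every root-cepstral coefficient by a^(2γ) (up to the small energy floor).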

