EUROSPEECH '97
5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997


Acoustic Front-End Optimization for Large Vocabulary Speech Recognition

Lutz Welling, N. Haberland, Hermann Ney

Lehrstuhl für Informatik VI, RWTH Aachen, Aachen, Germany

In this paper we describe experiments with the acoustic front-end of our large vocabulary speech recognition system. In particular, two aspects are studied: 1) linear transforms for feature extraction and 2) the modelling of the emission probabilities. Experiments are reported on a 5000-word task of the ARPA Wall Street Journal database. For the linear transforms our main results are: a) Filter-bank coefficients yield a word error rate of 9.3%. b) A cepstral decorrelation reduces the error rate from 9.3% to 8.0%. c) Applying a linear discriminant analysis (LDA) yields a further reduction in the error rate from 8.0% to 7.1%. d) Recognition results are similar whether the LDA is applied to filter-bank outputs or to cepstral coefficients. The experiments with density modelling gave the following results: a) Gaussian and Laplacian densities yield similar error rates. b) A single vector of variances or absolute deviations, shared across all densities, outperforms density-specific or mixture-specific vectors.
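To make the density-modelling comparison concrete, the following is a minimal sketch, not the authors' implementation. It assumes diagonal (per-dimension) Gaussian and Laplacian emission densities whose scale parameters come from one vector shared by all densities. The function names, the toy means, and the pooled values are hypothetical and chosen only for illustration.

```python
import math

def gaussian_log_density(x, mean, variances):
    """Log of a diagonal Gaussian density; `variances` holds one value per dimension."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
        for xi, m, v in zip(x, mean, variances)
    )

def laplacian_log_density(x, mean, abs_devs):
    """Log of a diagonal Laplacian density; `abs_devs` holds the mean absolute
    deviation per dimension."""
    return sum(
        -math.log(2.0 * d) - abs(xi - m) / d
        for xi, m, d in zip(x, mean, abs_devs)
    )

def best_density(x, means, shared_scale, log_density):
    """Index of the density that scores `x` highest. Every density uses the same
    `shared_scale` vector, so only the means are density-specific."""
    scores = [log_density(x, m, shared_scale) for m in means]
    return max(range(len(means)), key=scores.__getitem__)

# Toy values (assumptions for illustration, not taken from the paper).
means = [[0.0, 0.0], [2.0, 1.0]]
pooled_var = [1.0, 0.5]   # single shared vector of variances
pooled_dev = [0.8, 0.6]   # single shared vector of absolute deviations
x = [1.8, 0.7]

print(best_density(x, means, pooled_var, gaussian_log_density))
print(best_density(x, means, pooled_dev, laplacian_log_density))
```

With a shared scale vector the normalization term is identical for every density, so the ranking depends only on the scaled distance to each mean: squared distance in the Gaussian case, absolute distance in the Laplacian case.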

Bibliographic reference.  Welling, Lutz / Haberland, N. / Ney, Hermann (1997): "Acoustic front-end optimization for large vocabulary speech recognition", In EUROSPEECH-1997, 2099-2102.