8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Extraction Methods of Voicing Feature for Robust Speech Recognition

Andras Zolnay, Ralf Schluter, Hermann Ney

RWTH Aachen, Germany

In this paper, three voicing features are studied as additional acoustic features for continuous speech recognition. The harmonic product spectrum based feature is extracted in the frequency domain, while the autocorrelation and average magnitude difference based methods operate in the time domain. Each algorithm produces a measure of voicing for every time frame. The voicing measure was combined with the standard Mel Frequency Cepstral Coefficients (MFCC), using linear discriminant analysis to choose the most relevant features. Experiments were performed on small- and large-vocabulary tasks. The three voicing measures combined with MFCCs yielded similar improvements in word error rate: up to 14% on the small-vocabulary task and up to 6% on the large-vocabulary task, relative to using MFCCs alone with the same overall number of parameters in the system.
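
To make the two time-domain ideas concrete, the following is a minimal sketch of per-frame voicing measures based on autocorrelation and on the average magnitude difference function (AMDF). It is not the definition used in the paper: the function names, the 50-400 Hz pitch search range, the biased autocorrelation estimate, and the "1 minus minimum over mean" AMDF normalization are all assumptions chosen for illustration.

```python
import math


def sine_frame(freq_hz, sample_rate, num_samples):
    """Hypothetical helper: synthesize one frame of a pure tone."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate)
            for t in range(num_samples)]


def _lag_range(num_samples, sample_rate, f_min, f_max):
    """Lags (in samples) that correspond to the assumed pitch range."""
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(num_samples - 1, int(sample_rate / f_min))
    return range(lag_min, lag_max + 1)


def autocorr_voicing(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Peak of the energy-normalized autocorrelation over pitch lags.

    Values near 1 suggest a periodic (voiced) frame, values near 0 an
    aperiodic one. The pitch range is an assumption of this sketch.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0  # silent frame: treat as unvoiced
    best = 0.0
    for lag in _lag_range(n, sample_rate, f_min, f_max):
        # Biased estimate: fewer overlapping samples at larger lags.
        r = sum(x[t] * x[t + lag] for t in range(n - lag)) / energy
        best = max(best, r)
    return best


def amdf_voicing(frame, sample_rate, f_min=50.0, f_max=400.0):
    """One minus the ratio of the AMDF minimum to the AMDF mean.

    A deep valley in the AMDF (periodic frame) gives a value near 1;
    a flat AMDF (noise-like frame) gives a value near 0.
    """
    n = len(frame)
    amdf = []
    for lag in _lag_range(n, sample_rate, f_min, f_max):
        diffs = sum(abs(frame[t] - frame[t + lag]) for t in range(n - lag))
        amdf.append(diffs / (n - lag))
    mean_amdf = sum(amdf) / len(amdf) if amdf else 0.0
    if mean_amdf == 0.0:
        return 0.0  # silent or constant frame: treat as unvoiced
    return 1.0 - min(amdf) / mean_amdf
```

Applied frame by frame, either function gives a single scalar that could be appended to the MFCC vector before a transform such as linear discriminant analysis. A strongly periodic frame scores high under both measures, while a noise-like frame scores low.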


Bibliographic reference.  Zolnay, Andras / Schluter, Ralf / Ney, Hermann (2003): "Extraction methods of voicing feature for robust speech recognition", In EUROSPEECH-2003, 497-500.