EUROSPEECH 2003 - INTERSPEECH 2003
Feature representation strongly affects the performance of speech recognition systems. In this paper we focus on a feature generation process based on a linear transformation of the original log-spectral representation. We first discuss three popular linear transformation methods: Mel-Frequency Cepstral Coefficients (MFCC), Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA). We then propose a new linear transformation method that maximizes the normalized acoustic likelihood of the most likely state sequences of the training data, a measure that is directly related to our ultimate objective of reducing the Bayesian classification error rate in speech recognition. Experimental results show that the proposed method reduces the word error rate by more than 8.8% relative to the best implementation of LDA, and by more than 25.9% relative to MFCC features.
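To make "linear transformation of the log-spectral representation" concrete, the following minimal sketch applies a fixed matrix to one frame of log-spectral values. It uses the discrete cosine transform (DCT-II), which is the standard final step of MFCC computation. PCA, LDA, and a likelihood-based method would use a matrix estimated from training data in place of the fixed one. The function names `dct_matrix` and `apply_transform` and the frame values are illustrative assumptions and do not come from the paper.

```python
import math


def dct_matrix(num_ceps, num_bands):
    """Build a num_ceps x num_bands DCT-II matrix (rows left unnormalized)."""
    return [
        [math.cos(math.pi * k * (n + 0.5) / num_bands) for n in range(num_bands)]
        for k in range(num_ceps)
    ]


def apply_transform(matrix, log_spectrum):
    """Compute y = A x for a single frame of log-spectral values."""
    return [sum(a * x for a, x in zip(row, log_spectrum)) for row in matrix]


# Hypothetical frame of 8 log mel-filterbank energies (made-up values).
frame = [1.0, 1.2, 0.9, 0.7, 0.5, 0.4, 0.3, 0.2]

# Keep 4 output coefficients: the transform also reduces dimensionality.
A = dct_matrix(num_ceps=4, num_bands=8)
features = apply_transform(A, frame)
```

Swapping `A` for a matrix learned from data changes the features without changing the rest of the pipeline, which is why the methods named in the abstract can be compared on equal footing.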
Bibliographic reference. Li, Xiang / Stern, Richard M. (2003): "Feature generation based on maximum classification probability for improved speech recognition", In EUROSPEECH-2003, 845-848.