ISCA Archive Interspeech 2005

Regularizing linear discriminant analysis for speech recognition

Hakan Erdogan

Feature extraction is an essential first step in speech recognition applications. In addition to static features extracted from each frame of speech data, it is beneficial to use dynamic features (the so-called Δ and ΔΔ coefficients) that draw on information from neighboring frames. Linear discriminant analysis (LDA) followed by a diagonalizing maximum likelihood linear transform (MLLT), applied to spliced static MFCC features, yields important performance gains over MFCC+Δ+ΔΔ features in most tasks. However, since the LDA transform is estimated from statistical averages computed on limited data, it is reasonable to regularize its computation using prior information and experience. In this paper, we regularize LDA and heteroscedastic LDA transforms using two methods: (1) imposing a statistical prior on the transform in a MAP formulation, and (2) imposing structural constraints on the transform. As the prior, we use the transform that computes static+Δ+ΔΔ coefficients. Our structural constraint takes the form of a block-structured LDA transform in which each block acts on the same cepstral parameter across frames. The second approach amounts to learning new coefficients for the static, first-difference and second-difference operators in place of the standard ones. We test the new algorithms on two tasks, namely TIMIT phone recognition and AURORA2 digit sequence recognition in noise. We obtain consistent improvements over MFCC features in our experiments, and encouraging results in some AURORA2 tests as compared to LDA+MLLT features.
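The pipeline the abstract contrasts against can be sketched in a few lines of numpy: regression-based Δ coefficients over neighboring frames, frame splicing, and a plain LDA projection of the spliced vectors via the generalized eigenproblem Sb v = λ Sw v. This is a minimal illustrative sketch, not the paper's regularized method; the function names, the ±2 delta window, and the ±4 splicing context are assumptions chosen here, not values from the paper.

```python
import numpy as np

def delta(feats, N=2):
    """Regression-based delta coefficients over a +/-N frame window.
    feats: (T, D) array of static (e.g. MFCC) features."""
    T, D = feats.shape
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")  # replicate edges
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom

def splice(feats, context=4):
    """Stack each frame with its +/-context neighbors:
    (T, D) -> (T, (2*context + 1) * D)."""
    T, D = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

def lda(X, y, p):
    """Unregularized LDA: project to p dims by solving Sb v = lambda Sw v
    (here via the eigenvectors of inv(Sw) @ Sb)."""
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)           # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)  # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-vals.real)[:p]
    return vecs[:, order].real  # columns form the LDA transform
```

Note that the static+Δ+ΔΔ operator is itself a fixed linear map of the spliced frames, which is what makes it usable both as a MAP prior for the learned transform and as the starting point for the block-structured variant described above.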

doi: 10.21437/Interspeech.2005-144

Cite as: Erdogan, H. (2005) Regularizing linear discriminant analysis for speech recognition. Proc. Interspeech 2005, 3021-3024, doi: 10.21437/Interspeech.2005-144

@inproceedings{erdogan05_interspeech,
  author={Hakan Erdogan},
  title={{Regularizing linear discriminant analysis for speech recognition}},
  booktitle={Proc. Interspeech 2005},
  year={2005},
  pages={3021--3024},
  doi={10.21437/Interspeech.2005-144}
}