Features that model temporal aspects of phonemes are important in speech recognition. One method is to use linear discriminant analysis (LDA) to find discriminative features from a spectro-temporal input formed by concatenating consecutive frames of short-time spectral features. Other approaches use, for example, neural networks to process longer-span spectral segments to improve recognition accuracy. Still, the most widely used method for including temporal cues is to augment the short-time spectral features with simple time derivatives. In this paper, a new feature estimation method based on pairwise linear discriminants is presented. We compare it and some of its variants to traditional MFCC features and to LDA-estimated features in a large vocabulary continuous speech recognition (LVCSR) task. The features obtained with the new estimation method show significant improvements in recognition accuracy over both MFCC and LDA features.
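To make the two baseline ideas in the abstract concrete, the following is a minimal sketch of (a) augmenting frames with a simple time derivative and (b) estimating a classical multi-class LDA transform from concatenated consecutive frames. It is not the pairwise linear discriminant method proposed in the paper, whose details the abstract does not give. The function names (`stack_frames`, `add_deltas`, `lda_transform`), the context width, the output dimension, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def stack_frames(feats, context):
    """Concatenate each frame with `context` neighbours on each side.
    Edge frames are repeated so the output keeps the same number of rows."""
    T = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

def add_deltas(feats):
    """Append a simple first-order time derivative (central difference)."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    delta = (padded[2:] - padded[:-2]) / 2.0
    return np.hstack([feats, delta])

def lda_transform(X, labels, n_out):
    """Classical LDA: directions maximising between-class scatter
    relative to within-class scatter. Returns a (dim, n_out) matrix."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    Sw += 1e-6 * np.eye(d)  # small ridge (an assumption) keeps Sw invertible
    # Whiten with the Cholesky factor of Sw, then solve a symmetric eigenproblem.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    eigvals, V = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(eigvals)[::-1][:n_out]
    return L_inv.T @ V[:, order]

# Synthetic stand-in for 13-dimensional short-time features with 3 classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
feats = rng.normal(size=(300, 13)) + 0.5 * labels[:, None]

with_deltas = add_deltas(feats)          # 13 static + 13 delta dimensions
stacked = stack_frames(feats, 3)         # 7 frames x 13 dimensions
W = lda_transform(stacked, labels, 2)    # at most (classes - 1) useful directions
projected = stacked @ W
```

In a real recognizer the class labels would typically come from a frame-level alignment to phoneme or HMM state classes rather than from random draws, and the context width and output dimension would be tuned on development data.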
Cite as: Pylkkönen, J. (2006) LDA based feature estimation methods for LVCSR. Proc. Interspeech 2006, paper 1213-Mon2BuP.12, doi: 10.21437/Interspeech.2006-129
@inproceedings{pylkkonen06_interspeech,
  author={Janne Pylkkönen},
  title={{LDA based feature estimation methods for LVCSR}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1213-Mon2BuP.12},
  doi={10.21437/Interspeech.2006-129}
}