One popular feature type in speech recognition is based on linear transformations of sequences of cepstral feature vectors. The transformation is typically generated in two steps: first, a transformation such as linear discriminant analysis (LDA) or heteroscedastic linear discriminant analysis (HLDA) maximizes the separation between classes and reduces the dimensionality; this is followed by a decorrelating transformation. Here we investigate the weighting of classes when estimating the LDA transformation. In particular, we are concerned with the special status of silence, for which the data can be arbitrarily long and which can be represented by more than one silence/noise model. Our acoustic models for commercial applications consist of several sub-models, one for each type of application (general English, digits, names, alphabet, etc.). This creates a conflict when a transformation like LDA is used to improve the separability of states that correspond to the same phoneme but are used within different tasks. We also evaluate replacing sample counts with error/accuracy counts, as well as cross-task estimation of the LDA transformation. The results show that it is important to take these conditions into account, and they demonstrate accuracy and speed improvements when appropriate care is taken in computing the LDA transformations.
Cite as: Ljolje, A. (2006) Optimization of class weights for LDA feature transformations. Proc. Interspeech 2006, paper 2031-Mon2BuP.11, doi: 10.21437/Interspeech.2006-128
@inproceedings{ljolje06_interspeech,
  author={Andrej Ljolje},
  title={{Optimization of class weights for LDA feature transformations}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 2031-Mon2BuP.11},
  doi={10.21437/Interspeech.2006-128}
}
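The class-weighted LDA estimation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `weighted_lda` function and its `class_weights` parameter are hypothetical names, and the idea is simply that each class contributes to the scatter matrices according to a chosen weight (for example, a per-class error count, or a reduced weight for an over-represented silence class) rather than its raw sample count.

```python
import numpy as np

def weighted_lda(X, y, class_weights=None, n_components=None):
    """Estimate an LDA projection in which each class's contribution to the
    scatter matrices is scaled by a chosen weight instead of its raw sample
    count. With counts as weights this reduces to standard LDA."""
    classes = np.unique(y)
    d = X.shape[1]
    if class_weights is None:
        # default: weight by sample count, i.e. standard LDA
        class_weights = {c: int(np.sum(y == c)) for c in classes}
    total_w = sum(class_weights[c] for c in classes)
    # weighted global mean
    mu = sum(class_weights[c] * X[y == c].mean(axis=0) for c in classes) / total_w
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        w = class_weights[c] / total_w
        Sw += w * np.cov(Xc, rowvar=False, bias=True)
        diff = (mc - mu)[:, None]
        Sb += w * (diff @ diff.T)
    # solve the generalized eigenproblem Sb v = lambda Sw v by whitening:
    # W = Sw^{-1/2}, then diagonalize W^T Sb W
    evals_w, evecs_w = np.linalg.eigh(Sw)
    W = evecs_w / np.sqrt(np.maximum(evals_w, 1e-10))
    evals, evecs = np.linalg.eigh(W.T @ Sb @ W)
    order = np.argsort(evals)[::-1]  # keep directions with largest separation
    return (W @ evecs[:, order])[:, :n_components]
```

Down-weighting silence would then amount to passing, say, a weight of 1 for the silence class regardless of how many silence frames were observed, while the other classes keep their counts (or error-based weights).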