While the temporal dynamics of speech can be represented very efficiently by Hidden Markov Models (HMMs), the classification of the individual speech units (phonemes) is usually done suboptimally with Gaussian probability distribution functions, which are not discriminative. In this paper we use the Kernel Fisher Discriminant (KFD) for classification by integrating it into an HMM-based speech recognition system. In this hybrid structure we translate the outputs of the KFD classifier into conditional probabilities and use them as production probabilities of an HMM-based decoder for speech recognition. The KFD has already shown good classification results in other fields (e.g. pattern recognition). To also achieve good performance in terms of computational complexity, the KFD is implemented iteratively with a sparse greedy approach, i.e. the sparseness of the vector we are looking for in the feature space is reduced in each iteration step until a stopping criterion is reached. We train and test the described hybrid structure on a subset of the Wall Street Journal (WSJ) corpus. An HMM-based decoder with Gaussian mixture models (GMMs) as production probabilities is used for baseline results. Modest improvements over this baseline have been achieved so far.
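The two ideas in the abstract, greedy construction of a sparse KFD expansion and the mapping of classifier outputs to HMM production probabilities, can be illustrated with a minimal two-class sketch. It is not the implementation from the paper, and it rests on several assumptions: the KFD is fitted through the regularized least-squares formulation with labels of -1 and +1, each iteration adds the best training point from a small random candidate set, the loop stops when the loss improves by less than `tol` or `max_basis` is reached, outputs are mapped to posteriors by a sigmoid with fixed illustrative parameters `a` and `b`, and posteriors are divided by the class prior to give scaled likelihoods, as is common in hybrid HMM systems. All function names, parameter values and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_on_basis(K, y, idx, reg):
    """Regularized least-squares fit using only the kernel columns in idx."""
    Phi = np.hstack([K[:, idx], np.ones((K.shape[0], 1))])  # last column: bias
    A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])  # bias is regularized too (simplification)
    w = np.linalg.solve(A, Phi.T @ y)
    loss = float(np.sum((Phi @ w - y) ** 2) + reg * np.sum(w ** 2))
    return w, loss

def sparse_greedy_kfd(X, y, gamma=0.5, reg=1e-3, max_basis=20,
                      n_candidates=30, tol=1e-4, seed=0):
    """Grow the expansion by one basis point per iteration (assumed scheme)."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(X, X, gamma)
    active, coef, best = [], None, np.inf
    while len(active) < max_basis:
        remaining = np.setdiff1d(np.arange(len(X)), np.array(active, dtype=int))
        if remaining.size == 0:
            break
        cand = rng.choice(remaining, size=min(n_candidates, remaining.size),
                          replace=False)
        # Try each candidate and keep the one giving the lowest loss.
        trials = [(fit_on_basis(K, y, active + [int(j)], reg), int(j)) for j in cand]
        (w, loss), j = min(trials, key=lambda t: t[0][1])
        if best - loss < tol:  # stopping criterion: negligible improvement
            break
        active.append(j)
        coef, best = w, loss
    return active, coef

def decision_values(X_new, X_train, active, coef, gamma):
    """Discriminant output f(x) = sum_i alpha_i k(x, x_i) + b over the active set."""
    return rbf_kernel(X_new, X_train[active], gamma) @ coef[:-1] + coef[-1]

def posterior_from_output(f, a=1.0, b=0.0):
    """Sigmoid mapping to P(class=+1 | x); a and b would be fitted on held-out data."""
    return 1.0 / (1.0 + np.exp(-(a * f + b)))

# Toy data: two well-separated Gaussian blobs standing in for two phoneme classes.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(-2, 1, (100, 2)), data_rng.normal(2, 1, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

active, coef = sparse_greedy_kfd(X, y)
f = decision_values(X, X, active, coef, gamma=0.5)
post = posterior_from_output(f)
prior = 0.5                       # class prior, equal classes in this toy set
scaled_likelihood = post / prior  # would replace the GMM score in the decoder
```

In a multi-class phoneme setting one would need one such discriminant per class (or per class pair) and a normalization across classes; the sketch leaves that step out because the abstract does not specify how it is done.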
Cite as: Andelic, E., Schafföner, M., Krüger, S.E., Katz, M., Wendemuth, A. (2004) Iterative implementation of the kernel Fisher discriminant for speech recognition. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 99-103
@inproceedings{andelic04_specom,
  author={E. Andelic and M. Schafföner and S. E. Krüger and M. Katz and Andreas Wendemuth},
  title={{Iterative implementation of the kernel Fisher discriminant for speech recognition}},
  year=2004,
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages={99--103}
}