In this article we take a step towards applying Support Vector Machines (SVMs) to continuous speech recognition. As in previous work, we use SVMs to estimate emission probabilities within an SVM/HMM system. However, training pairwise classifiers to discriminate between some of the HMM states of very close phonetic classes produces unsatisfactory results. We propose a data-driven approach for selecting the HMM states for which SVMs are trained and those that are implicitly tied.
Additionally, we introduce an algorithm, incorporated into the decoder, that dynamically selects the subset of SVMs used to estimate the emission probabilities. This algorithm dramatically reduces the number of SVMs evaluated at the frame level while preserving recognition accuracy. We present results on a very challenging corpus composed of children's speech. Our approach outperforms not only comparable GMM/HMM-based systems but also other SVM/HMM systems proposed to date.
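As a rough illustration of why evaluating only a subset of SVMs per frame matters, the sketch below counts pairwise classifiers, assuming one SVM per pair of HMM states. The state count, the set of active states, and the function names are hypothetical and chosen only for the example; the selection rule used in the paper is not reproduced here.

```python
from itertools import combinations


def pairwise_classifiers(states):
    """Return the (i, j) state pairs, one pairwise SVM per pair (assumption)."""
    return list(combinations(sorted(states), 2))


def count_evaluations(num_states, active_states):
    """Count SVM evaluations per frame for all states vs. an active subset."""
    full = len(pairwise_classifiers(range(num_states)))
    reduced = len(pairwise_classifiers(active_states))
    return full, reduced


# Hypothetical numbers: 120 HMM states, 5 of them active in the decoder
# at the current frame.
full, reduced = count_evaluations(num_states=120, active_states={3, 17, 18, 42, 77})
print(full, reduced)
```

Because the number of pairs grows quadratically with the number of states considered, even a modest reduction in the active set removes most of the per-frame evaluations.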
Bibliographic reference. Bolaños, Daniel / Ward, Wayne (2008): "Implicit state-tying for support vector machines based speech recognition", In INTERSPEECH-2008, 924-927.