Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Using Support Vector Machines for Spoken Digit Recognition

Issam Bazzi, Dina Katabi

MIT Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA

Support Vector Machines (SVMs) have recently emerged as pattern classifiers that can handle a large feature space with a small data set. In this paper, we investigate the use of SVMs for automatic speech recognition. We focus on a fairly simple speech recognition task: recognizing isolated spoken digits in English. The approach relies solely on the acoustic signal and uses no phonological rules or language constraints in the recognition process. For each digit, a 420-dimensional feature vector is extracted from the mel-frequency cepstral coefficients (MFCCs) of the speech signal. The 420 features are then reduced to a smaller number using principal component analysis (PCA). To perform N-way classification of the ten digits with standard two-class SVM classifiers, we examine scoring and voting classification schemes. The best performance is obtained with an N-way 1-versus-9 SVM classifier using a Gaussian kernel with a variance of 4 and the first 45 PCA features. This classifier achieves an accuracy of 94.9%.
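As an illustration of the 1-versus-9 setup, the following is a minimal sketch rather than the authors' implementation. It assumes the Gaussian kernel takes the common form K(x, y) = exp(-||x - y||^2 / (2 * variance)) with variance 4, and that the scoring scheme picks the digit whose two-class SVM returns the largest decision value; the abstract does not spell out either detail. All function names and the model layout are hypothetical.

```python
import math


def gaussian_kernel(x, y, variance=4.0):
    """Gaussian kernel, assuming the form exp(-||x - y||^2 / (2 * variance))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * variance))


def svm_score(x, support_vectors, coeffs, bias, variance=4.0):
    """Decision value of one two-class SVM: sum_i coeff_i * K(sv_i, x) + bias.

    Each coefficient is assumed to already include the label sign (alpha_i * y_i).
    """
    total = sum(
        c * gaussian_kernel(sv, x, variance)
        for sv, c in zip(support_vectors, coeffs)
    )
    return total + bias


def classify_by_score(x, models, variance=4.0):
    """Assumed scoring scheme: return the digit whose one-versus-rest SVM scores highest.

    `models` is a hypothetical mapping: digit -> (support_vectors, coeffs, bias).
    """
    return max(models, key=lambda d: svm_score(x, *models[d], variance))
```

In the setting described above, `x` would be the 45-component PCA projection of the 420-dimensional MFCC vector and `models` would hold ten trained classifiers, one per digit. Training the SVMs and fitting the PCA projection are outside the scope of this sketch.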


Bibliographic reference.  Bazzi, Issam / Katabi, Dina (2000): "Using support vector machines for spoken digit recognition", In ICSLP-2000, vol.1, 433-436.