Non-Linear Speech Processing (NOLISP 03)

May 20-23, 2003
Le Croisic, France

Some Experiments on Speaker-Independent Isolated Digit Recognition using SVM classifiers

Ramón Fernández-Lorenzana, Fernando Pérez-Cruz, José Miguel García-Cabellos, Carmen Peláez-Moreno, Ascensión Gallardo-Antolín, Fernando Díaz-de-María

Signal Theory and Communications Department, EPS-Universidad Carlos III de Madrid, Spain

Introduction (abridged). Hidden Markov Models (HMMs) are, undoubtedly, the most widely used core technique for Automatic Speech Recognition (ASR). Over the last decades, research on HMMs for ASR has brought about significant advances and, consequently, HMMs are now finely tuned for this application. Nevertheless, we are still far from achieving high-performance speech recognition-based interfaces. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the last decade. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid (HMM-ANN) approaches. Nowadays, however, the preponderance of HMMs is a fact.

Speech recognition is essentially a problem of pattern classification, but the high dimensionality of the sequences of speech feature vectors has prevented researchers from proposing a straightforward classification scheme for ASR. Support Vector Machines (SVMs) are state-of-the-art tools for linear and nonlinear knowledge discovery. Being based on the maximum margin classifier, which can be regarded as the common-sense solution, the SVM is able to outperform classical classifiers in the presence of high-dimensional data, even when working with nonlinear machines. The SVM “philosophy” basically states that the only information available for constructing the classifier is the training samples. Therefore, in applications where a priori knowledge or structure is available, the SVM might not be as powerful as other machine learning techniques that can benefit from this information. Some work has been done in this direction, but there are still open issues that need to be addressed. Some researchers have already proposed different approaches to speech recognition that aim to take advantage of this type of classifier.

In this paper we propose to use SVMs for speaker-independent isolated digit recognition by plain classification. For this purpose, we use a standard MFCC parameterization that has been time-adapted to the fixed input dimension required by SVMs.
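
To illustrate what adapting a variable-length MFCC sequence to a fixed input dimension can look like, here is a minimal sketch. It is an assumption made for illustration, not necessarily the time adaptation used in the paper: each utterance is linearly resampled along the time axis to a fixed number of frames, and the frames are concatenated into one vector. The function name fix_length and the parameter target_frames are hypothetical.

```python
from typing import List, Sequence


def fix_length(frames: Sequence[Sequence[float]], target_frames: int) -> List[float]:
    """Resample a T x D feature sequence to target_frames x D and flatten it.

    Hypothetical helper: linear interpolation along the time axis is one
    simple choice, assumed here for illustration only.
    """
    if not frames:
        raise ValueError("need at least one input frame")
    if target_frames < 1:
        raise ValueError("target_frames must be positive")

    n_in = len(frames)
    dim = len(frames[0])
    out: List[float] = []
    for k in range(target_frames):
        # Position of output frame k on the input time axis, in [0, n_in - 1].
        if target_frames == 1:
            pos = 0.0
        else:
            pos = k * (n_in - 1) / (target_frames - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        # Interpolate each coefficient between the two neighbouring frames.
        for d in range(dim):
            out.append((1.0 - w) * frames[lo][d] + w * frames[hi][d])
    return out


# Two toy "utterances" of different durations, two coefficients per frame.
short_utt = [[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]]
long_utt = [[float(t), 0.0] for t in range(7)]

vec_short = fix_length(short_utt, 5)
vec_long = fix_length(long_utt, 5)
```

Both vectors have 5 x 2 = 10 components regardless of the original duration, so they could be passed to any classifier that expects a fixed-size input, such as an SVM.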


Bibliographic reference. Fernández-Lorenzana, Ramón / Pérez-Cruz, Fernando / García-Cabellos, José Miguel / Peláez-Moreno, Carmen / Gallardo-Antolín, Ascensión / Díaz-de-María, Fernando (2003): "Some experiments on speaker-independent isolated digit recognition using SVM classifiers", in NOLISP-2003, paper 028.