INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Hybrid MLP/Structured-SVM Tandem Systems for Large Vocabulary and Robust ASR

Suman V. Ravuri

ICSI, USA

Tandem systems based on multi-layer perceptrons (MLPs) have improved the performance of automatic speech recognition systems on both large vocabulary and noisy tasks. One potential problem of the standard Tandem approach, however, is that the MLPs generally used do not model the temporal dynamics inherent in speech. In this work, we propose a hybrid MLP/Structured-SVM model, in which the parameters between the hidden layer and output layer and the temporal transitions between output layers are modeled by a Structured-SVM. A Structured-SVM can be thought of as an extension of the classical binary support vector machine that can naturally classify “structures” such as sequences. Using this approach, we can identify sequences of phones in an utterance.
   We evaluate this model on two corpora, Aurora2 and the large-vocabulary section of the ICSI meeting corpus, to investigate its performance in noisy conditions and on a large-vocabulary task. Compared to a strong Tandem baseline in which the MLP is trained using 2nd-order optimization methods, the MLP/Structured-SVM system decreases word error rate (WER) in noisy conditions by 7.9% relative. On the large-vocabulary corpus, the proposed system decreases WER by 1.1% absolute compared to the 2nd-order Tandem system.
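
The abstract describes a model with two kinds of parameters: weights from the MLP hidden layer to the output labels, and transition scores between consecutive output labels. The following is a minimal illustrative sketch of how such a linear sequence model could pick the best label sequence at test time using Viterbi decoding. It is not the paper's implementation and it does not cover training. The function name `viterbi_decode`, the arguments `hidden`, `W`, and `T`, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def viterbi_decode(hidden, W, T):
    """Return the highest-scoring label sequence and its score.

    All names here are hypothetical, chosen for illustration only.
    hidden: (n_frames, n_hidden) array of MLP hidden-layer activations.
    W:      (n_hidden, n_labels) hidden-to-output weights.
    T:      (n_labels, n_labels) transition scores; T[i, j] is the score
            for label i at one frame being followed by label j at the next.
    """
    emit = hidden @ W                      # per-frame label scores
    n_frames, n_labels = emit.shape
    score = emit[0].copy()                 # best score ending in each label
    back = np.zeros((n_frames, n_labels), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + T          # cand[i, j]: come from i, go to j
        back[t] = cand.argmax(axis=0)      # best predecessor for each label j
        score = cand.max(axis=0) + emit[t]
    # Trace the best path backwards from the best final label.
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(score.max())

# Toy example (assumed values): two labels, three frames.
hidden = np.array([[1.0, 0.0], [0.4, 0.6], [1.0, 0.0]])
W = np.eye(2)
T_smooth = np.array([[0.5, -0.5], [-0.5, 0.5]])   # rewards staying in a label
path_with_T, score_with_T = viterbi_decode(hidden, W, T_smooth)
path_no_T, score_no_T = viterbi_decode(hidden, W, np.zeros((2, 2)))
```

In this toy case the frame-by-frame decision (zero transition scores) flips to label 1 on the ambiguous middle frame, while the transition scores keep the sequence at label 0 throughout. This is the kind of temporal information that a frame-level classifier alone does not use.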

Bibliographic reference.  Ravuri, Suman V. (2014): "Hybrid MLP/structured-SVM tandem systems for large vocabulary and robust ASR", In INTERSPEECH-2014, 2729-2733.