10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Efficient Generation and Use of MLP Features for Arabic Speech Recognition

J. Park, F. Diehl, M. J. F. Gales, M. Tomalin, P. C. Woodland

University of Cambridge, UK

Front-end features computed using Multi-Layer Perceptrons (MLPs) have recently attracted much interest, but are challenging to scale to large networks and very large training data sets. This paper discusses methods to reduce the training time needed to generate MLP features and to use them in an ASR system. The techniques include: parallel training of a set of MLPs on different data sub-sets; methods for computing features from a combination of these networks; and rapid discriminative training of HMMs using MLP-based features. The impact of different training strategies on MLP frame-based accuracy is discussed, along with the effect on word error rates of incorporating the MLP features in various configurations into an Arabic broadcast audio transcription system.
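
As an illustration of the network-combination idea mentioned in the abstract, the following is a minimal sketch only. The abstract does not say how the networks are combined, so the averaging rules here (arithmetic or geometric mean of frame-level posteriors) are assumptions, and the names `combine_posteriors` and `frame_accuracy` are hypothetical helpers, not taken from the paper.

```python
import math


def combine_posteriors(per_net_posteriors, log_domain=False):
    """Combine frame-level class posteriors from several networks.

    per_net_posteriors: one entry per network; each entry is a list of
    frames, and each frame is a list of class posteriors summing to 1.
    ASSUMPTION: simple averaging; the paper's actual rule may differ.
    """
    n_nets = len(per_net_posteriors)
    n_frames = len(per_net_posteriors[0])
    combined = []
    for t in range(n_frames):
        frames = [net[t] for net in per_net_posteriors]
        n_classes = len(frames[0])
        if log_domain:
            # Geometric mean: average the log-posteriors (floored to avoid log(0)).
            scores = [
                math.exp(sum(math.log(max(f[k], 1e-10)) for f in frames) / n_nets)
                for k in range(n_classes)
            ]
        else:
            # Arithmetic mean of the posteriors.
            scores = [sum(f[k] for f in frames) / n_nets for k in range(n_classes)]
        total = sum(scores)
        # Renormalise so each combined frame is again a distribution.
        combined.append([s / total for s in scores])
    return combined


def frame_accuracy(posteriors, labels):
    """Fraction of frames whose highest-posterior class matches the label."""
    correct = sum(1 for p, y in zip(posteriors, labels) if p.index(max(p)) == y)
    return correct / len(labels)


if __name__ == "__main__":
    # Toy posteriors from two networks trained on different data sub-sets.
    net_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]]
    net_b = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]
    merged = combine_posteriors([net_a, net_b])
    print(frame_accuracy(merged, [0, 0]))
```

Because each network can be trained independently on its own sub-set, a combination step of this kind is what lets the training be parallelised; which averaging domain works better is an empirical question this sketch does not settle.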

Bibliographic reference.  Park, J. / Diehl, F. / Gales, M. J. F. / Tomalin, M. / Woodland, P. C. (2009): "Efficient generation and use of MLP features for Arabic speech recognition", In INTERSPEECH-2009, 236-239.