9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Transcribing Broadcast Data Using MLP Features

Petr Fousek, Lori Lamel, Jean-Luc Gauvain

LIMSI, France

This paper describes the incorporation of discriminative features from a multilayer perceptron (MLP) into a state-of-the-art Arabic broadcast data transcription system based on cepstral features. The MLP features are based on a recently proposed Bottle-Neck architecture with a long-term warped LP-TRAP speech representation at the input. It is shown that the improvements previously reported on a development Arabic transcription system carry through to a full state-of-the-art system. SAT, CMLLR and MLLR adaptation techniques are shown to be useful for both MLP and combined features, though to a lesser degree than for PLP features. Without adaptation, MLP features outperform cepstral features in all test conditions; with adaptation, the two feature sets give comparable results. Combining the features, either by feature concatenation or by combining system hypotheses, gives significant gains. Gains from MMI model training appear to be additive to the gain from the discriminative MLP features.


Bibliographic reference. Fousek, Petr / Lamel, Lori / Gauvain, Jean-Luc (2008): "Transcribing broadcast data using MLP features", in INTERSPEECH-2008, 1433-1436.