ISCA Archive Interspeech 2008

Transcribing broadcast data using MLP features

Petr Fousek, Lori Lamel, Jean-Luc Gauvain

This paper describes the incorporation of discriminative features from a multilayer perceptron (MLP) into a state-of-the-art Arabic broadcast data transcription system based on cepstral features. The MLP features are based on a recently proposed Bottle-Neck architecture with a long-term warped LP-TRAP speech representation at the input. It is shown that the improvements previously reported on a development Arabic transcription system carry over to a full state-of-the-art system. SAT, CMLLR and MLLR adaptation techniques are shown to be useful for both MLP and combined features, though to a lesser degree than for PLP features. Without adaptation, MLP features outperform cepstral features in all test conditions; with adaptation, both feature sets give comparable results. Combining the features, either by feature concatenation or by combining system hypotheses, gives significant gains. Gains from MMI model training appear to be additive to the gain from the discriminative MLP features.
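As an illustration of the feature concatenation mentioned in the abstract, the following is a minimal sketch, assuming two frame-synchronous feature streams (one cepstral, one MLP-derived) that are joined frame by frame into a single vector. The function name `concat_features`, the toy dimensions and the values are hypothetical and are not taken from the paper.

```python
from typing import List

Frame = List[float]


def concat_features(plp: List[Frame], mlp: List[Frame]) -> List[Frame]:
    """Concatenate two frame-synchronous feature streams frame by frame.

    Hypothetical helper for illustration only: each output frame is the
    cepstral vector followed by the MLP vector for the same time index.
    """
    if len(plp) != len(mlp):
        raise ValueError("feature streams must have the same number of frames")
    return [p + m for p, m in zip(plp, mlp)]


# Toy data (assumed): 3 frames, 2 cepstral values and 2 MLP values per frame.
plp_frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
mlp_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

combined = concat_features(plp_frames, mlp_frames)
```

Each combined frame has a dimension equal to the sum of the two input dimensions, so acoustic models trained on the concatenated stream see both representations at once. Combining system hypotheses, the other option named in the abstract, operates on decoder outputs instead and is not shown here.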

doi: 10.21437/Interspeech.2008-414

Cite as: Fousek, P., Lamel, L., Gauvain, J.-L. (2008) Transcribing broadcast data using MLP features. Proc. Interspeech 2008, 1433-1436, doi: 10.21437/Interspeech.2008-414

  author={Petr Fousek and Lori Lamel and Jean-Luc Gauvain},
  title={{Transcribing broadcast data using MLP features}},
  booktitle={Proc. Interspeech 2008},