This paper describes how discriminative features from a multilayer perceptron (MLP) are incorporated into a state-of-the-art Arabic broadcast data transcription system based on cepstral features. The MLP features use a recently proposed Bottle-Neck architecture, with a long-term warped LP-TRAP speech representation at the input. The improvements previously reported on a development Arabic transcription system are shown to carry over to a full state-of-the-art system. SAT, CMLLR and MLLR adaptation techniques are shown to be useful for both the MLP and the combined features, though to a lesser degree than for PLP features. Without adaptation, the MLP features outperform the cepstral features in all test conditions; with adaptation, both feature sets give comparable results. Combining the features, either by feature concatenation or by combining system hypotheses, gives significant gains. The gains from MMI model training appear to be additive to those from the discriminative MLP features.
Bibliographic reference. Fousek, Petr / Lamel, Lori / Gauvain, Jean-Luc (2008): "Transcribing broadcast data using MLP features", In INTERSPEECH-2008, 1433-1436.