We apply multilayer perceptron (MLP) based hierarchical Tandem features to large vocabulary continuous speech recognition in Mandarin. Hierarchical Tandem features are estimated using a cascade of two MLP classifiers which are trained independently. The first classifier is trained on perceptual linear predictive coefficients with a 90 ms temporal context. The second classifier is trained using the phonetic class conditional probabilities estimated by the first MLP, but with a relatively longer temporal context of about 150 ms. Experiments on the Mandarin DARPA GALE eval06 dataset show significant reduction (7.6% relative) in character error rates by using hierarchical Tandem features over conventional Tandem features.
Bibliographic reference. Pinto, Joel / Magimai-Doss, Mathew / Bourlard, Hervé (2011): "Hierarchical tandem features for ASR in Mandarin", In INTERSPEECH-2011, 1241-1244.