Recently, a multilingual multi-layer perceptron (MLP) training method was introduced that does not require explicitly mapping the phonetic units of multiple languages to a common set. This paper further investigates this method using bottleneck (BN) tandem connectionist acoustic modeling for four high-resource languages: English, French, German, and Polish. Aiming to improve already existing high-performance automatic speech recognition (ASR) systems, the multilingual training of the BN-MLP is extended from short-term to hierarchical long-term (multi-resolutional RASTA) feature extraction. Furthermore, deeper structures and context-dependent target labels are also examined. We experimentally demonstrate that a single state-of-the-art BN feature set can be trained for multiple languages; it is superior to the monolingual feature sets and yields significant gains in all four languages. Studying the scalability of the multilingual BN features, similar gains are observed in small-scale (50 hours) and larger-scale (300 hours) ASR experiments, regardless of how the data is distributed among the languages. Using deeper structures, context-dependent targets, and speaker adaptation, the multilingual BN features reduce word error rates by 3.7% relative over the target-language BN features and by 25.3% relative over the conventional MFCC system.
Bibliographic reference. Tüske, Zoltán / Schlüter, Ralf / Ney, Hermann (2013): "Multilingual hierarchical MRASTA features for ASR", In INTERSPEECH-2013, 2222-2226.
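To make the core idea of the abstract concrete, the following is a minimal, hypothetical numpy sketch of a multilingual bottleneck MLP for tandem acoustic modeling: the hidden layers and the narrow bottleneck layer are shared across languages, while each language keeps its own softmax output layer over its own phonetic targets, so no explicit mapping of phone sets to a common inventory is needed. All class names, layer sizes, and target counts below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MultilingualBnMlp:
    """Hypothetical multilingual bottleneck MLP: shared trunk, per-language heads."""

    def __init__(self, n_in, n_hidden, n_bottleneck, targets_per_lang):
        # Shared layers: input -> hidden -> bottleneck (weights shared by all languages)
        self.W_h = rng.standard_normal((n_in, n_hidden)) * 0.1
        self.W_bn = rng.standard_normal((n_hidden, n_bottleneck)) * 0.1
        # Language-specific output layers over language-specific phone target sets
        self.W_out = {
            lang: rng.standard_normal((n_bottleneck, n_t)) * 0.1
            for lang, n_t in targets_per_lang.items()
        }

    def bottleneck(self, x):
        """Shared forward pass up to the bottleneck; these activations are the
        BN features fed to the tandem GMM/HMM recognizer in any language."""
        h = sigmoid(x @ self.W_h)
        return sigmoid(h @ self.W_bn)

    def posteriors(self, x, lang):
        """Language-specific phone posteriors, used only while training the MLP."""
        return softmax(self.bottleneck(x) @ self.W_out[lang])

# Each language trains against its own target layer, but all produce the same
# shared bottleneck feature vector at recognition time (sizes are illustrative).
mlp = MultilingualBnMlp(n_in=39, n_hidden=512, n_bottleneck=42,
                        targets_per_lang={"en": 45, "fr": 38, "de": 47, "pl": 40})
frames = rng.standard_normal((5, 39))   # 5 frames of 39-dim acoustic features
bn_feats = mlp.bottleneck(frames)       # shared BN features: shape (5, 42)
p_en = mlp.posteriors(frames, "en")     # English posteriors: shape (5, 45)
```

Because only the output layers are language-specific, the bottleneck activations form a single feature set usable for every language, which is the property the abstract exploits when comparing multilingual against monolingual BN features.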