Posterior-based or bottleneck features derived from neural networks trained on out-of-domain data can improve speech recognition performance when data for the target domain or language is scarce. In this paper we combine this approach with a hierarchical deep neural network (DNN) structure, which we term a multi-level adaptive network (MLAN), and with multi-task learning. We have applied the technique to cross-lingual speech recognition experiments on recordings of TED talks and European Parliament sessions in English (source language) and German (target language). We demonstrate that the proposed method can lead to improvements over standard methods, even when the quantity of training data for the target language is relatively high. When the complete method is applied, we achieve relative WER reductions of around 13% compared to a monolingual hybrid DNN baseline.
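The forward pass of such a two-level architecture can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all layer sizes, the single hidden layer per network, and the two softmax heads are illustrative assumptions. A first network trained on out-of-domain (source-language) data provides bottleneck features; a second, adaptive network takes these features concatenated with the acoustic input and shares its hidden layer between a target-language head and a source-language head (multi-task learning).

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b, act=np.tanh):
    """One fully connected layer with a nonlinearity."""
    return act(x @ w + b)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative dimensions (assumptions, not taken from the paper):
n_acoustic = 39      # e.g. MFCCs with deltas
n_bottleneck = 26    # narrow bottleneck layer
n_hidden = 64
n_src_targets = 40   # source-language (English) targets
n_tgt_targets = 45   # target-language (German) targets

# First-level network, trained on out-of-domain data, with a narrow
# bottleneck layer whose activations serve as features.
W1 = rng.standard_normal((n_acoustic, n_hidden)) * 0.1
b1 = np.zeros(n_hidden)
Wb = rng.standard_normal((n_hidden, n_bottleneck)) * 0.1
bb = np.zeros(n_bottleneck)

def bottleneck_features(x):
    return layer(layer(x, W1, b1), Wb, bb)

# Second-level (adaptive) network: input is the acoustic features
# concatenated with the bottleneck features; two softmax heads share
# the hidden layer (multi-task learning).
W2 = rng.standard_normal((n_acoustic + n_bottleneck, n_hidden)) * 0.1
b2 = np.zeros(n_hidden)
W_tgt = rng.standard_normal((n_hidden, n_tgt_targets)) * 0.1
b_tgt = np.zeros(n_tgt_targets)
W_src = rng.standard_normal((n_hidden, n_src_targets)) * 0.1
b_src = np.zeros(n_src_targets)

def mlan_forward(x):
    z = np.concatenate([x, bottleneck_features(x)], axis=-1)
    h = layer(z, W2, b2)
    return softmax(h @ W_tgt + b_tgt), softmax(h @ W_src + b_src)

x = rng.standard_normal((8, n_acoustic))   # a batch of 8 frames
p_tgt, p_src = mlan_forward(x)
print(p_tgt.shape, p_src.shape)            # (8, 45) (8, 40)
```

In training, both heads would receive cross-entropy losses on their respective language's targets; at recognition time only the target-language head is used.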
Bibliographic reference. Bell, Peter / Driesen, Joris / Renals, Steve (2014): "Cross-lingual adaptation with multi-task adaptive networks", In INTERSPEECH-2014, 21-25.