15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Cross-Lingual Adaptation with Multi-Task Adaptive Networks

Peter Bell, Joris Driesen, Steve Renals

University of Edinburgh, UK

Posterior-based or bottleneck features derived from neural networks trained on out-of-domain data can improve speech recognition performance when data is scarce for the target domain or language. In this paper we combine this approach with a hierarchical deep neural network (DNN) structure, which we term a multi-level adaptive network (MLAN), and with multi-task learning. We apply the technique to cross-lingual speech recognition experiments on recordings of TED talks and European Parliament sessions in English (source language) and German (target language). We demonstrate that the proposed method can improve on standard methods even when the quantity of training data for the target language is relatively large. When the complete method is applied, we achieve relative word error rate (WER) reductions of around 13% compared to a monolingual hybrid DNN baseline.
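For readers unfamiliar with the metric, the following is a minimal sketch of how a relative WER reduction is conventionally computed: the absolute drop in WER divided by the baseline WER. The function name and the numbers in the usage line are hypothetical, chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER.

    Both arguments are word error rates on the same scale (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    # Absolute improvement, normalised by the baseline error rate.
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical WER values (assumptions for illustration, not from the paper).
reduction = relative_wer_reduction(baseline_wer=20.0, system_wer=17.4)
print(f"relative WER reduction: {reduction:.1%}")
```

Under this definition, a relative reduction of around 13% means the adapted system makes roughly 13% fewer word errors than the baseline, whatever the absolute WER of either system.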


Bibliographic reference. Bell, Peter / Driesen, Joris / Renals, Steve (2014): "Cross-lingual adaptation with multi-task adaptive networks", in INTERSPEECH-2014, 21-25.