In this paper, we explore multilingual feature-level data sharing via Deep Neural Network (DNN) stacked bottleneck features. Given a set of available source languages, we apply language identification to select the source language most similar to the target language, making more efficient use of multilingual resources. Our experiments with IARPA-Babel languages show that bottleneck features trained on the most similar source language perform better than those trained on all available source languages. Further analysis suggests that only data similar to the target language is useful for multilingual training.
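The selection step can be illustrated with a minimal sketch. The abstract does not say how language identification output is turned into a similarity ranking, so the sketch assumes one plausible scheme: run a language identifier trained on the source languages over target-language utterances, average the per-utterance posteriors, and pick the source language with the highest mean. The function name `pick_source_language`, the language labels, and the scores are all hypothetical placeholders, not values from the paper.

```python
from statistics import mean


def pick_source_language(posteriors, source_languages):
    """Return the source language with the highest mean LID posterior.

    posteriors: list of per-utterance score lists, one score per source
    language, in the same order as source_languages (assumed layout).
    """
    n = len(source_languages)
    # Average each source language's score over all target utterances.
    avg = [mean(utt[i] for utt in posteriors) for i in range(n)]
    # The language the identifier most often "confuses" the target with
    # is treated as the most similar one (an assumption of this sketch).
    best = max(range(n), key=lambda i: avg[i])
    return source_languages[best], avg


# Hypothetical source languages and made-up posteriors for three
# target-language utterances.
langs = ["lang_a", "lang_b", "lang_c"]
scores = [
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.8, 0.1],
]

best, avg = pick_source_language(scores, langs)
print(best, [round(a, 3) for a in avg])
```

Under this assumption, bottleneck features would then be trained on data from the selected language only, rather than on the pooled data from all source languages.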
Bibliographic reference. Zhang, Yu / Chuangsuwanich, Ekapol / Glass, James R. (2014): "Language ID-based training of multilingual stacked bottleneck features", In INTERSPEECH-2014, 1-5.