15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Investigating the Learning Effect of Multilingual Bottle-Neck Features for ASR

Ngoc Thang Vu, Jochen Weiner, Tanja Schultz

KIT, Germany

Deep neural networks (DNNs) have become the state of the art in automatic speech recognition over the last few years. They can be used at the preprocessing level (Tandem or Bottle-Neck features) or at the acoustic model level (hybrid Hidden Markov Model/DNN). Moreover, they allow multilingual data to be exploited to improve monolingual systems. This paper presents our investigation of the learning effect of neural networks in the context of multilingual Bottle-Neck features. For this, we perform a visual analysis of the output of the Bottle-Neck layer of a neural network using t-Distributed Stochastic Neighbor Embedding. Our results show that multilingual Bottle-Neck features seem to learn phoneme characteristics, such as the F1 and F2 formants, which characterize different vowels, and other articulatory features, such as fricatives and nasals, which characterize consonants. Furthermore, they seem to normalize language-dependent variations and to transfer the learned representation to unseen languages.
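
The kind of visual analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's TSNE as the embedding tool, and it replaces real Bottle-Neck layer outputs with synthetic Gaussian clusters. The feature dimension (42), the three phoneme labels, and the helper names are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def make_synthetic_bottleneck_features(n_per_class=50, dim=42, seed=0):
    """Hypothetical stand-in for Bottle-Neck layer outputs.

    Draws one Gaussian cluster per phoneme label. Real features would
    instead come from a forward pass through a trained network.
    """
    rng = np.random.default_rng(seed)
    phonemes = ["a", "i", "u"]  # assumed labels, for illustration only
    feats, labels = [], []
    for ph in phonemes:
        center = rng.normal(scale=5.0, size=dim)
        feats.append(center + rng.normal(size=(n_per_class, dim)))
        labels.extend([ph] * n_per_class)
    return np.vstack(feats), np.array(labels)


def embed_2d(features, perplexity=30.0, seed=0):
    """Project high-dimensional feature vectors to 2-D with t-SNE."""
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(features)


features, labels = make_synthetic_bottleneck_features()
embedding = embed_2d(features)  # one 2-D point per input frame
```

Plotting `embedding` as a scatter plot colored by `labels` would then show whether frames of the same phoneme group together. With real features, the same plot could be colored by language to inspect how far language-dependent variation is normalized.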

Bibliographic reference. Vu, Ngoc Thang / Weiner, Jochen / Schultz, Tanja (2014): "Investigating the learning effect of multilingual bottle-neck features for ASR", In INTERSPEECH-2014, 825-829.