14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Context-Dependent Phone Mapping for LVCSR of Under-Resourced Languages

Van Hai Do (1), Xiong Xiao (2), Eng Siong Chng (1), Haizhou Li (1)

(1) Nanyang Technological University, Singapore
(2) TL@NTU, Singapore

This paper presents a context-dependent phone mapping approach to acoustic modeling for large vocabulary continuous speech recognition (LVCSR) of under-resourced languages, leveraging well-trained models of other languages. Generally speaking, phone mapping can be viewed as a hybrid HMM/MLP (Hidden Markov Model / Multilayer Perceptron) model in which the input to the MLP consists of phone acoustic scores, e.g., likelihood or posterior scores. In this paper, we use deep neural networks trained on a large amount of Malay data to generate bottleneck and posterior features for the target English acoustic models. We extend the concept of phone mapping by using not only posteriors but also bottleneck features as the input. Experiments show that the phone mapping technique significantly outperforms the cross-lingual tandem approach. We also show that bottleneck and posterior features contain complementary information: combining the two feature streams to form the input for phone mapping yields a consistent improvement.
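The abstract describes phone mapping as a hybrid HMM/MLP in which an MLP takes source-language acoustic scores as input. The Python sketch below illustrates only the forward pass of such a mapping, with the posterior and bottleneck streams concatenated frame by frame. It is a minimal sketch under stated assumptions, not the authors' implementation: one sigmoid hidden layer, toy dimensions, random untrained weights, target classes assumed to be tied context-dependent (triphone) states, and state priors used to turn posteriors into scaled likelihoods, which is the usual hybrid HMM/MLP convention. All names (`combine_streams`, `PhoneMappingMLP`, `scaled_log_likelihoods`) are hypothetical.

```python
import math
import random


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def softmax(z):
    # Subtract the max before exponentiating to avoid overflow.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def combine_streams(posteriors, bottleneck):
    # Hypothetical helper: concatenate one frame of source-language
    # (e.g. Malay) phone posteriors with that frame's bottleneck features.
    return list(posteriors) + list(bottleneck)


class PhoneMappingMLP:
    """Hypothetical one-hidden-layer MLP mapping source-language features
    to posteriors over target context-dependent states (assumed classes)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Random, untrained weights; in practice they would be trained on
        # target-language data (training is not shown here).
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        # Hidden layer: affine transform followed by a sigmoid.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Output layer: affine transform followed by a softmax over states.
        return softmax([sum(w * hi for w, hi in zip(row, h)) + b
                        for row, b in zip(self.w2, self.b2)])


def scaled_log_likelihoods(state_posteriors, state_priors):
    # log p(s|x) - log p(s) equals log p(x|s) - log p(x); the p(x) term is
    # the same for every state, so these scores can replace HMM emission
    # log-likelihoods during decoding.
    return [math.log(p) - math.log(q) for p, q in zip(state_posteriors, state_priors)]


if __name__ == "__main__":
    frame_post = [0.7, 0.1, 0.1, 0.1]  # toy source-phone posteriors
    frame_bn = [0.5, -1.2, 0.3]        # toy bottleneck features
    x = combine_streams(frame_post, frame_bn)
    mlp = PhoneMappingMLP(n_in=len(x), n_hidden=8, n_out=5, seed=1)
    y = mlp.forward(x)
    print(scaled_log_likelihoods(y, [0.2] * 5))
```

Concatenation is the simplest way to let a single mapping network see both streams at once; with random weights the printed scores carry no meaning and only demonstrate the data flow.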


Bibliographic reference: Do, Van Hai / Xiao, Xiong / Chng, Eng Siong / Li, Haizhou (2013): "Context-dependent phone mapping for LVCSR of under-resourced languages", In INTERSPEECH-2013, 500-504.