15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Kernel Density-Based Acoustic Model with Cross-Lingual Bottleneck Features for Resource Limited LVCSR

Van Hai Do (1), Xiong Xiao (2), Eng Siong Chng (1), Haizhou Li (1)

(1) Nanyang Technological University, Singapore
(2) TL@NTU, Singapore

Conventional acoustic models, such as Gaussian mixture models (GMM) or deep neural networks (DNN), cannot be reliably estimated when there is very little speech training data, e.g. less than 1 hour. In this paper, we investigate the use of a non-parametric kernel density estimation method to predict the emission probability of HMM states. In addition, we introduce a discriminative score calibrator to improve the speech class posteriors generated by the kernel density model for the speech recognition task. Experimental results on the Wall Street Journal task show that the proposed acoustic model using cross-lingual bottleneck features significantly outperforms GMM and DNN models in the limited training data case.
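To make the idea of a kernel density-based emission probability concrete, the following is a minimal sketch and not the authors' implementation. It assumes an isotropic Gaussian kernel with a single hand-picked bandwidth, and it estimates log p(x | state) as the average kernel value between a test frame and the training frames labelled with that state. The function names, the state labels, the bandwidth value, and the two-dimensional toy vectors are all hypothetical; the cross-lingual bottleneck features and the discriminative score calibrator described in the abstract are not modelled here.

```python
import math


def log_gaussian_kernel(x, y, bandwidth):
    """Log of an isotropic Gaussian kernel between two feature vectors.

    The Gaussian kernel is an assumption of this sketch; the abstract
    does not say which kernel the paper uses.
    """
    dim = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)
    return log_norm - sq_dist / (2.0 * bandwidth ** 2)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def kde_log_likelihood(x, exemplars, bandwidth):
    """Estimate log p(x | state) as the mean kernel value over the
    training frames (exemplars) that belong to that state."""
    log_kernels = [log_gaussian_kernel(x, e, bandwidth) for e in exemplars]
    return log_sum_exp(log_kernels) - math.log(len(exemplars))


# Hypothetical toy data: a few 2-D "feature frames" per HMM state.
train = {
    "state_a": [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1]],
    "state_b": [[3.0, 3.0], [2.8, 3.1]],
}

frame = [0.1, 0.0]  # hypothetical test frame
scores = {
    state: kde_log_likelihood(frame, exemplars, bandwidth=0.5)
    for state, exemplars in train.items()
}
best_state = max(scores, key=scores.get)
```

Because the estimate is built directly from stored training frames, there are no mixture weights or network parameters to fit, which is one plausible reason such a model can still be used when only a small amount of training data is available. In a full recognizer these per-state scores would take the place of the GMM or DNN emission scores inside HMM decoding.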

Bibliographic reference. Do, Van Hai / Xiao, Xiong / Chng, Eng Siong / Li, Haizhou (2014): "Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR", In INTERSPEECH-2014, 6-10.