Conventional acoustic models, such as Gaussian mixture models (GMMs) or deep neural networks (DNNs), cannot be reliably estimated when very little speech training data is available, e.g. less than 1 hour. In this paper, we investigate the use of a non-parametric kernel density estimation method to predict the emission probabilities of HMM states. In addition, we introduce a discriminative score calibrator to improve the speech class posteriors generated by the kernel density model for the speech recognition task. Experimental results on the Wall Street Journal task show that the proposed acoustic model using cross-lingual bottleneck features significantly outperforms GMM and DNN models in the limited training data case.
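As a rough illustration of the general idea of kernel density-based emission scoring, the following minimal sketch estimates a per-state likelihood for one feature frame as the average kernel value over that state's training exemplars, then converts the likelihoods to state posteriors with Bayes' rule. It is not the paper's implementation: the Gaussian kernel, the shared bandwidth, the toy two-dimensional features, and all function and variable names are assumptions made for illustration, and the discriminative score calibrator is not modeled.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Unnormalized isotropic Gaussian kernel between two feature vectors.

    The normalizing constant is omitted; with a bandwidth shared by all
    states it cancels when posteriors are computed (an assumption here).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def kde_state_likelihoods(frame, exemplars, bandwidth=1.0):
    """Score a frame against each state as the mean kernel value
    over that state's training exemplars (proportional to p(frame | state))."""
    return {
        state: sum(gaussian_kernel(frame, e, bandwidth) for e in feats) / len(feats)
        for state, feats in exemplars.items()
    }

def state_posteriors(likelihoods, priors):
    """Combine likelihood scores with state priors via Bayes' rule."""
    joint = {s: likelihoods[s] * priors[s] for s in likelihoods}
    total = sum(joint.values())
    return {s: v / total for s, v in joint.items()}

# Hypothetical toy data: two HMM states, two 2-D exemplars each.
exemplars = {
    "s1": [[0.0, 0.0], [0.2, -0.1]],
    "s2": [[3.0, 3.0], [2.8, 3.1]],
}
priors = {"s1": 0.5, "s2": 0.5}
frame = [0.1, 0.0]

lik = kde_state_likelihoods(frame, exemplars)
post = state_posteriors(lik, priors)
```

Because the estimate is built directly from stored exemplars, it has no parametric model to fit, which is the property that makes this family of methods attractive when training data is scarce. A practical system would work in the log domain to avoid underflow for frames far from all exemplars.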
Bibliographic reference. Do, Van Hai / Xiao, Xiong / Chng, Eng Siong / Li, Haizhou (2014): "Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR", In INTERSPEECH-2014, 6-10.