Recently deep neural networks (DNNs) have become increasingly popular for acoustic modelling in automatic speech recognition (ASR) systems. As the bottleneck features they produce are inherently discriminative and contain rich hidden factors that influence the surface acoustic realization, the standard approach is to augment the conventional acoustic features with the bottleneck features in a tandem framework. In this paper, an alternative approach to incorporate bottleneck features is investigated. The complex relationship between acoustic features and DNN bottleneck features is modelled using generalized variable parameter HMMs (GVP-HMMs). The optimal GVP-HMM structural configuration and model parameters are automatically learnt. Significant error rate reductions of 48% and 8% relative were obtained over the baseline multi-style HMM and tandem HMM systems respectively on Aurora 2.
Bibliographic reference. Xie, Xurong / Su, Rongfeng / Liu, Xunying / Wang, Lan (2014): "Deep neural network bottleneck features for generalized variable parameter HMMs", In INTERSPEECH-2014, 2739-2743.