Hybrid Neural Network/Hidden Markov Model (NN/HMM) systems have been found to yield high-quality phone recognition performance. One issue in context-dependent (CD) NN/HMM modelling is the robust estimation of the NN parameters needed to reliably predict the large number of CD state posteriors. Previously, factorization based on conditional probabilities has commonly been adopted to circumvent this problem. This paper proposes two factorization schemes based on the product-of-experts framework, which differ in the choice of experts. In addition, smoothing and interpolation schemes are introduced to improve robustness. Experimental results on the WSJCAM0 corpus show that the proposed CD NN/HMM parameter estimation techniques achieve consistent improvements over context-independent (CI) hybrid systems. The best hybrid system achieves a 21.7% relative phone error rate reduction and a 17.6% word error rate reduction compared to a discriminatively trained context-dependent triphone GMM/HMM system.
Bibliographic reference. Wang, Guangsen / Sim, Khe Chai (2011): "Comparison of smoothing techniques for robust context dependent acoustic modelling in hybrid NN/HMM systems", In INTERSPEECH-2011, 457-460.