Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

High Accuracy Acoustic Modeling Based on Multi-stage Decision Tree

DongHwa Kim (1), Chaojun Liu (2), Xintian Wu (2), Yonghong Yan (2)

(1) Department of Communication and Information, Miryang National University, Miryang, KyungNam, Korea
(2) Department of Computer Science and Engineering, Oregon Graduate Institute of Science and Technology, Portland, OR, USA

In many HMM-based continuous speech recognition systems, decision tree-based state tying has been used not only to improve the robustness and accuracy of context-dependent acoustic models but also to synthesize models for unseen contexts. To construct the phonetic decision tree, the standard method clusters states using only single-Gaussian triphone models. The coarse clusters produced by single-Gaussian models can lead to less accurate acoustic models and, in turn, lower recognition performance. In this paper, a multi-stage decision tree that uses both multi-mixture Gaussian models and single-Gaussian models is proposed. A continuous speech recognition experiment on WSJ data using this approach showed a reduction in word error rate compared with the standard decision tree-based system.
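For readers unfamiliar with the single-Gaussian baseline the abstract contrasts against, here is a minimal sketch of how such clustering is commonly scored. This is NOT the authors' multi-stage method, whose details are in the full paper. It assumes diagonal covariances and per-state sufficient statistics (occupancy, mean, variance). The names StateStats, pooled_loglik and split_gain are hypothetical, chosen for illustration. A pooled single Gaussian has a closed-form maximum-likelihood log-likelihood, L = -N/2 (D(1 + log 2π) + Σ_d log σ_d²). Because of this, a candidate phonetic question can be scored from state statistics alone, without revisiting the training frames. That convenience is one plausible reason the standard method relies on single-Gaussian models.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StateStats:
    """Hypothetical per-state sufficient statistics (diagonal covariance)."""
    name: str          # triphone state label, e.g. "k-ae+t"
    occ: float         # occupation count (sum of state posteriors over frames)
    mean: List[float]  # per-dimension mean
    var: List[float]   # per-dimension variance


def pooled_loglik(states: Sequence[StateStats]) -> float:
    """Log-likelihood of all frames assigned to `states` under ONE shared
    diagonal Gaussian, computed from sufficient statistics only."""
    n = sum(s.occ for s in states)
    dim = len(states[0].mean)
    log_det = 0.0
    for d in range(dim):
        # Pooled mean and second moment, weighted by occupancy.
        mu = sum(s.occ * s.mean[d] for s in states) / n
        second = sum(s.occ * (s.var[d] + s.mean[d] ** 2) for s in states) / n
        var = max(second - mu * mu, 1e-8)  # variance floor avoids log(0)
        log_det += math.log(var)
    # With ML parameters the Mahalanobis term sums to n * dim.
    return -0.5 * n * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


def split_gain(states: Sequence[StateStats],
               question: Callable[[StateStats], bool]) -> float:
    """Likelihood increase from splitting `states` by a yes/no question.
    Returns -inf when the question leaves one side empty."""
    yes = [s for s in states if question(s)]
    no = [s for s in states if not question(s)]
    if not yes or not no:
        return float("-inf")
    return pooled_loglik(yes) + pooled_loglik(no) - pooled_loglik(states)


if __name__ == "__main__":
    cluster = [
        StateStats("k-ae+t", 100.0, [0.0], [1.0]),
        StateStats("s-ae+t", 100.0, [4.0], [1.0]),
    ]
    # Hypothetical question: "is the left context a stop?"
    is_stop = lambda s: s.name.split("-")[0] in {"p", "t", "k", "b", "d", "g"}
    print(split_gain(cluster, is_stop))
```

In a greedy tree-growing loop, the question with the largest gain would be chosen at each node, and splitting would typically stop below a gain or occupancy threshold. In this toy example the two states have well-separated means, so separating them gives a large positive gain (100 · log 5 ≈ 160.9).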



Bibliographic reference.  Kim, DongHwa / Liu, Chaojun / Wu, Xintian / Yan, Yonghong (1999): "High accuracy acoustic modeling based on multi-stage decision tree", In EUROSPEECH'99, 1335-1338.