In this paper, we apply context-dependent phonetic modeling to the task of large-vocabulary (20 thousand words) Taiwanese multi-syllabic word recognition. Considering the phonetic characteristics of Taiwanese, we use right-context-dependent (RCD) phones instead of general triphones. The RCD phones are further clustered at the sub-phone (state) level using a decision tree with a set of context-split questions designed specifically for Taiwanese speech according to acoustic and phonetic knowledge. For the speaker-dependent case, a word error rate of 7.18% is achieved. A real-time prototype system implemented on a Pentium II personal computer running MS Windows 95/NT is also shown to validate the proposed approaches.
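To illustrate what a right-context-dependent unit looks like, the following minimal sketch expands a phone sequence into RCD labels, each phone tagged with the phone that follows it. The `phone+right` label notation, the `sil` boundary symbol, the function name, and the example phone symbols are all assumptions made for illustration; they are not taken from the paper, and the sketch does not cover the decision-tree state clustering.

```python
def to_right_context_phones(phones, boundary="sil"):
    """Map a phone sequence to right-context-dependent (RCD) labels.

    Each phone is labelled with its right neighbour only. The last phone
    takes the boundary symbol (hypothetical; "sil" is an assumption) as
    its right context.
    """
    units = []
    for i, phone in enumerate(phones):
        # Look one phone ahead; fall back to the boundary symbol at the end.
        right = phones[i + 1] if i + 1 < len(phones) else boundary
        units.append(f"{phone}+{right}")
    return units


# Hypothetical phone symbols, used only to show the shape of the output.
example = ["t", "ai", "w", "an"]
rcd_units = to_right_context_phones(example)
print(rcd_units)
```

Because an RCD label depends only on the phone itself and its right neighbour, the number of distinct units is bounded by the number of phone pairs, whereas a triphone inventory is bounded by the number of phone triples.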
Cite as: Lyu, R.-y., Chiang, Y.-j., Hsieh, W.-p. (1998) A large-vocabulary Taiwanese (MIN-NAN) multi-syllabic word recognition system based upon right-context-dependent phones with state clustering by acoustic decision tree. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0080, doi: 10.21437/ICSLP.1998-296
@inproceedings{lyu98_icslp,
  author={Ren-yuan Lyu and Yuang-jin Chiang and Wen-ping Hsieh},
  title={{A large-vocabulary Taiwanese (MIN-NAN) multi-syllabic word recognition system based upon right-context-dependent phones with state clustering by acoustic decision tree}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0080},
  doi={10.21437/ICSLP.1998-296}
}