This paper presents an early study on building a Vietnamese large vocabulary continuous speech recognition system, focusing on the choice of unit type and feature set. Our experiments were carried out using the HTK toolkit and the VOV broadcast corpus. The results show that the recognizer with mixture units achieved better performance than recognizers with initial-final units or phoneme units. Among the feature sets applied, MFCC performs somewhat better than PLP, and combining MFCC with F0 features increases the accuracy of the Vietnamese recognition system.
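As an illustration of what combining MFCC and F0 features can look like at the frame level, the following is a minimal sketch rather than the authors' actual pipeline. It assumes per-frame MFCC vectors and an F0 track are already available, and the function name `append_f0`, the use of log-F0, and the placeholder value for unvoiced frames are all hypothetical choices that the abstract does not specify.

```python
import math


def append_f0(mfcc_frames, f0_hz, unvoiced_value=0.0):
    """Append a log-F0 value to each MFCC frame (illustrative only).

    mfcc_frames: list of per-frame MFCC vectors (lists of floats).
    f0_hz: per-frame F0 estimates in Hz; values <= 0 mark unvoiced frames.
    unvoiced_value: placeholder used for unvoiced frames (an assumption).
    """
    if len(mfcc_frames) != len(f0_hz):
        raise ValueError("MFCC and F0 tracks must have the same number of frames")

    combined = []
    for mfcc, f0 in zip(mfcc_frames, f0_hz):
        # Log-scale F0 for voiced frames; fall back to the placeholder otherwise.
        log_f0 = math.log(f0) if f0 > 0 else unvoiced_value
        combined.append(list(mfcc) + [log_f0])
    return combined
```

Each output vector is one dimension longer than the input MFCC vector. Because Vietnamese is a tonal language, a pitch-related dimension of this kind gives the acoustic model access to tone cues that spectral features alone represent only weakly.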
Cite as: Vu, T.T., Nguyen, D.T., Luong, M.C., Hosom, J.-P. (2005) Vietnamese large vocabulary continuous speech recognition. Proc. Interspeech 2005, 1689-1692, doi: 10.21437/Interspeech.2005-550
@inproceedings{vu05_interspeech, author={Thang Tat Vu and Dung Tien Nguyen and Mai Chi Luong and John-Paul Hosom}, title={{Vietnamese large vocabulary continuous speech recognition}}, year=2005, booktitle={Proc. Interspeech 2005}, pages={1689--1692}, doi={10.21437/Interspeech.2005-550} }