This paper proposes a new approach for integrating characteristics of the Vietnamese language into a large vocabulary continuous speech recognition (LVCSR) system that was originally built for several European languages. First, a new tone recognition module based on hidden Markov models was constructed. Second, several methods were applied to transform a text corpus of monosyllabic words into a text corpus of polysyllabic words, and a statistical language model of polysyllabic words was built from the new corpus. Finally, this knowledge was incorporated into the LVCSR system to adapt it to Vietnamese. Experiments were carried out on the VNSPEECHCORPUS. The results show that using Vietnamese language characteristics improves the accuracy of the Vietnamese recognition system, yielding a 46% relative reduction in word error rate.
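The abstract does not say which methods were used to turn the monosyllabic corpus into a polysyllabic one. As an illustration only, the following minimal sketch shows one common technique for this kind of task, greedy left-to-right longest matching against a word lexicon. The function name `merge_syllables`, the toy `LEXICON`, the `max_len` limit, and the use of underscores to join syllables are all assumptions made for this example and are not taken from the paper.

```python
def merge_syllables(syllables, lexicon, max_len=4):
    """Group syllables into polysyllabic words by greedy longest matching.

    syllables: list of syllable strings (one sentence of the monosyllabic corpus)
    lexicon:   set of known polysyllabic words, syllables separated by spaces
               (hypothetical resource, assumed for this sketch)
    max_len:   longest word, in syllables, that is tried (assumed limit)
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink down to one syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            # A single syllable is always accepted as a fallback.
            if n == 1 or candidate in lexicon:
                words.append("_".join(syllables[i:i + n]))
                i += n
                break
    return words


# Toy lexicon, assumed for illustration only.
LEXICON = {"việt nam", "nhận dạng", "tiếng nói"}

if __name__ == "__main__":
    sentence = "nhận dạng tiếng nói việt nam"
    print(merge_syllables(sentence.split(), LEXICON))
```

A corpus rewritten this way treats each joined unit as a single token, so an n-gram language model trained on it would count polysyllabic words rather than isolated syllables.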
Bibliographic reference. Nguyen, Hong Quang / Nocera, Pascal / Castelli, Eric / Trinh, Van Loan (2008): "A novel approach in continuous speech recognition for Vietnamese, an isolating tonal language", In INTERSPEECH-2008, 1149-1152.