SLTU-2008 - First International Workshop on Spoken Languages Technologies for Under-Resourced Languages

Hanoi, Vietnam
May 5-7, 2008

Large Vocabulary Continuous Speech Recognition for Vietnamese, an Under-Resourced Language

Hong Quang Nguyen (1,2), Pascal Nocera (1), Eric Castelli (2), Van Loan Trinh (2)

(1) Laboratoire Informatique d’Avignon LIA, University of Avignon, France
(2) International Research Center MICA, HUT (UMI2954/CNRS-INP Grenoble), Hanoi, Vietnam

This paper proposes a method for building a Vietnamese Large Vocabulary Continuous Speech Recognition (LVCSR) system. The differences between Vietnamese and European languages are analyzed and used to adapt an LVCSR system designed for European languages to Vietnamese. Experiments are conducted on the VNSPEECHCORPUS. The results show that exploiting Vietnamese language characteristics increases the accuracy of the Vietnamese recognition system.

Index Terms— Automatic speech recognition, Vietnamese language, under-resourced language, tone recognition, compound noun.

Bibliographic reference. Nguyen, Hong Quang / Nocera, Pascal / Castelli, Eric / Trinh, Van Loan (2008): "Large vocabulary continuous speech recognition for Vietnamese, an under-resourced language", in SLTU-2008, 23-26.