2nd Workshop on Spoken Language Technologies for Under-Resourced Languages
Universiti Sains, Penang, Malaysia
In this paper, we discuss our recent progress in developing and evaluating a Malay Large Vocabulary Continuous Speech Recognizer (LVCSR) that takes linguistic information into account. The best baseline system achieves a word error rate (WER) of 15.8%. To identify methods for improving accuracy further, we performed additional experiments using linguistic information such as part-of-speech and stem. We also tested our system with a language model built from a small amount of text, and suggest that linguistic knowledge can be used to improve the accuracy of a Malay automatic speech recognition system.
Index Terms: Speech Recognition, Agglutinative Language, Language Modeling, Part-Of-Speech, Stem
Bibliographic reference. Sze, Hong Kai / Ping, Tan Tien / Kong, Tang Enya / Yu-N, Cheah (2010): "Malay language modeling in large vocabulary continuous speech recognition with linguistic information", In SLTU-2010, 56-61.