In this paper, we present several techniques to address the problems of Turkish Large Vocabulary Continuous Speech Recognition (LVCSR). The agglutinative nature of Turkish makes it a challenging language for speech recognition, since it is impossible to include all possible words in the recognition lexicon. Therefore, data-driven sub-word recognition units are used in addition to words in a newspaper content transcription task. We obtain Word Error Rates (WER) of 38.8% for the baseline word model and 33.9% for the baseline sub-word model. In addition, several new methods are investigated. The baseline lattice outputs of each model are rescored with root and root-class language models for words, and with a first-sub-word language model for sub-words. Word-root interpolation achieves a 0.5% decrease in WER. The other two approaches fail to improve over the baseline models because of non-robust estimates. Moreover, we apply dynamic vocabulary extension techniques to handle the Out-of-Vocabulary (OOV) problem in the word model and to remove non-word items in the sub-word model. Applying this method to the 50K baseline word model, we obtain an error rate of 36.2% in the best case. On average, the lexicon size with this method is around 188K; however, the error rate is approximately the same as that of the 120K-lexicon recognizer. For sub-words, the vocabulary extension technique achieves a 1.1% absolute improvement, giving our best result.
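The idea behind word-root interpolation rescoring can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from the paper. It rescores an N-best list rather than a full lattice. It uses toy dictionary-based bigram models. It uses a hypothetical `root_of` helper that treats the text before the first "+" as the root. Finally, it takes a per-token linear mix of a word bigram probability and a root bigram probability. This mix is a heuristic score, not a normalized distribution over words. The names `interp_logprob`, `rescore`, `lam`, `lm_weight`, and `floor` are all illustrative.

```python
import math

def root_of(word):
    # Hypothetical convention: morphemes are joined with "+", so the root is
    # the first segment ("ev+de" -> "ev"). A real system would use a Turkish
    # morphological analyzer instead.
    return word.split("+")[0]

def interp_logprob(words, word_lm, root_lm, lam, floor=1e-4):
    """Log score of a hypothesis under a per-token linear mix of a word bigram
    probability and a root bigram probability (toy dict-based models).
    Unseen bigrams receive the probability `floor`."""
    roots = [root_of(w) for w in words]
    total = 0.0
    prev_w, prev_r = "<s>", "<s>"
    for w, r in zip(words + ["</s>"], roots + ["</s>"]):
        p_word = word_lm.get((prev_w, w), floor)
        p_root = root_lm.get((prev_r, r), floor)
        total += math.log(lam * p_word + (1.0 - lam) * p_root)
        prev_w, prev_r = w, r
    return total

def rescore(nbest, word_lm, root_lm, lam=0.5, lm_weight=1.0):
    """Re-rank (words, acoustic_logprob) pairs; best hypothesis first."""
    scored = [
        (ac + lm_weight * interp_logprob(words, word_lm, root_lm, lam), words)
        for words, ac in nbest
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [words for _, words in scored]

# Toy usage: both hypotheses share the root "ev", but only "ev+de" is seen
# by the word model, so it wins when acoustic scores are tied.
word_lm = {("<s>", "ev+de"): 0.5, ("ev+de", "</s>"): 0.5}
root_lm = {("<s>", "ev"): 0.6, ("ev", "</s>"): 0.6}
nbest = [(["ev+den"], -5.0), (["ev+de"], -5.0)]
print(rescore(nbest, word_lm, root_lm))  # [['ev+de'], ['ev+den']]
```

The root model lets a word that the word model has not seen still receive a reasonable score through its root. The interpolation weight `lam` controls how much the word model dominates that backoff.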
Cite as: Arsoy, E., Saraclar, M. (2006) Lattice extension and rescoring based approaches for LVCSR of Turkish. Proc. Interspeech 2006, paper 1622-Tue2A2O.2, doi: 10.21437/Interspeech.2006-331
@inproceedings{arsoy06_interspeech, author={Ebru Arsoy and Murat Saraclar}, title={{Lattice extension and rescoring based approaches for LVCSR of Turkish}}, year=2006, booktitle={Proc. Interspeech 2006}, pages={paper 1622-Tue2A2O.2}, doi={10.21437/Interspeech.2006-331} }