ISCA Archive Interspeech 2006

Lattice extension and rescoring based approaches for LVCSR of Turkish

Ebru Arsoy, Murat Saraclar

In this paper, we present techniques for addressing the problems of Turkish Large Vocabulary Continuous Speech Recognition (LVCSR). The agglutinative nature of Turkish makes it a challenging language for speech recognition, since it is impossible to include all possible words in the recognition lexicon. Therefore, data-driven sub-word recognition units, in addition to words, are used in a newspaper content transcription task. We obtain Word Error Rates (WER) of 38.8% for the baseline word model and 33.9% for the baseline sub-word model. In addition, some new methods are investigated. Baseline lattice outputs of each model are rescored with root and root-class language models for words and with a first-sub-word language model for sub-words. The word-root interpolation achieves a 0.5% decrease in the WER. The other two approaches fail to improve over the baseline models due to non-robust estimates. Moreover, we try dynamic vocabulary extension techniques to handle the Out-of-Vocabulary (OOV) problem in the word model and to remove non-word items in the sub-word model. Applying this method to the 50K baseline word model, we obtain an error rate of 36.2% in the best case. On average, the lexicon size with this method is around 188K. However, the error rate is approximately the same as that of the 120K lexicon recognizer. For sub-words, a 1.1% absolute improvement is achieved with the vocabulary extension technique, giving us our best result.
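For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. It illustrates the standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example and are not taken from the paper, whose scoring setup is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption for this sketch: words are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, used only to exercise the function.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, insertions can push the value above 1.0, which is why WER is reported as a rate rather than a bounded accuracy.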


doi: 10.21437/Interspeech.2006-331

Cite as: Arsoy, E., Saraclar, M. (2006) Lattice extension and rescoring based approaches for LVCSR of Turkish. Proc. Interspeech 2006, paper 1622-Tue2A2O.2, doi: 10.21437/Interspeech.2006-331

@inproceedings{arsoy06_interspeech,
  author={Ebru Arsoy and Murat Saraclar},
  title={{Lattice extension and rescoring based approaches for LVCSR of Turkish}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1622-Tue2A2O.2},
  doi={10.21437/Interspeech.2006-331}
}