ISCA Archive Interspeech 2008

Discriminative n-gram language modeling for Turkish

Ebru Arısoy, Brian Roark, Izhak Shafran, Murat Saraçlar

In this paper, Discriminative Language Models (DLMs) are applied to the Turkish Broadcast News transcription task. Turkish presents a challenge to Automatic Speech Recognition (ASR) systems due to its rich morphology. Therefore, in addition to word n-gram features, morphology-based features such as root n-grams and inflectional group n-grams are incorporated into DLMs in order to improve the language models. Various feature sets provide reductions in the word error rate (WER). Our best result is obtained with the inflectional group n-gram features: a 1.0% absolute improvement is achieved over the baseline model, and this improvement is statistically significant at p<0.001 as measured by the NIST MAPSSWE significance test.
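As a rough illustration of what word and root n-gram features might look like, the following minimal sketch counts n-grams over a token sequence. It is not the authors' implementation: the function name `ngram_features`, the sentence-boundary padding symbols, and the hand-segmented example words are assumptions made for illustration. Inflectional group n-grams would be counted the same way over the output of a morphological analyzer, which is not shown here.

```python
from collections import Counter


def ngram_features(tokens, max_order=2):
    """Count all n-grams of order 1..max_order in a token sequence.

    Hypothetical helper: the padding symbols and the default order
    are illustrative choices, not taken from the paper.
    """
    padded = ["<s>"] + list(tokens) + ["</s>"]
    feats = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(padded) - n + 1):
            feats[tuple(padded[i:i + n])] += 1
    return feats


# Illustrative hypothesis, segmented by hand (assumption): each word
# is paired with its root, e.g. "evlerde" -> "ev".
words = ["evlerde", "oturdular"]
roots = ["ev", "otur"]

word_feats = ngram_features(words)  # word n-gram features
root_feats = ngram_features(roots)  # root n-gram features
```

Counting over roots rather than full words lets different inflected forms of the same root share features, which is one plausible reason such features help with a morphologically rich language.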

doi: 10.21437/Interspeech.2008-251

Cite as: Arısoy, E., Roark, B., Shafran, I., Saraçlar, M. (2008) Discriminative n-gram language modeling for Turkish. Proc. Interspeech 2008, 825-828, doi: 10.21437/Interspeech.2008-251

  author={Ebru Arısoy and Brian Roark and Izhak Shafran and Murat Saraçlar},
  title={{Discriminative n-gram language modeling for Turkish}},
  booktitle={Proc. Interspeech 2008},