9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Discriminative N-Gram Language Modeling for Turkish

Ebru Arısoy (1), Brian Roark (2), Izhak Shafran (2), Murat Saraçlar (1)

(1) Boğaziçi University, Turkey; (2) Oregon Health & Science University, USA

In this paper, Discriminative Language Models (DLMs) are applied to the Turkish Broadcast News transcription task. Turkish presents a challenge to Automatic Speech Recognition (ASR) systems due to its rich morphology. Therefore, in addition to word n-gram features, morphology-based features such as root n-grams and inflectional group n-grams are incorporated into the DLMs to improve the language models. Various feature sets provide reductions in the word error rate (WER). Our best result is obtained with the inflectional group n-gram features: an absolute improvement of 1.0% is achieved over the baseline model, and this improvement is statistically significant at p<0.001 as measured by the NIST MAPSSWE significance test.
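
For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. An absolute improvement of 1.0% therefore means the WER drops by one percentage point. The following is a minimal Python sketch of that conventional definition, not the scoring tool used in the paper; the function name `word_error_rate` and the toy strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokens are split on whitespace, with no text
    normalization of the kind a real scoring tool would apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)


# Toy example (hypothetical strings): one substitution and one deletion
# against a four-word reference.
toy_wer = word_error_rate("a b c d", "a x c")
```

Because the denominator is the reference length, insertions can push the WER above 100% for very poor hypotheses.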

Bibliographic reference. Arısoy, Ebru / Roark, Brian / Shafran, Izhak / Saraçlar, Murat (2008): "Discriminative n-gram language modeling for Turkish", in INTERSPEECH-2008, 825-828.