ISCA Archive Interspeech 2013

Transducer-based speech recognition with dynamic language models

Munir Georges, Stephan Kanthak, Dietrich Klakow

This paper proposes a method that embeds regular grammars into an N-gram Markov language model. This allows accurate speech recognition even for N-gram models estimated on sparse grammatical word sequences. Moreover, it allows explicit user-dependent modelling of word sequences, such as phone numbers, email addresses or US ZIP codes, separately from the Markov model. The method is described theoretically, along with an overview of a feasible implementation. More precisely, a language model preprocessing step generalizes the enclosed grammatical word sequences during language model learning. These grammars are then embedded during speech decoding using a novel transducer nesting technique. The proposed method was evaluated on the Wall Street Journal corpus and achieved a word error rate reduction of 31.1%. The experiments were run in a computational environment typical of car head units or mobile devices.
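The preprocessing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grammar, the class token `$ZIP`, and the function names `generalize` and `bigram_counts` are all assumptions made for illustration. The sketch replaces word sequences accepted by a small regular grammar (here, a spoken five-digit US ZIP code) with a single class token before N-gram counts are collected, so that sparse digit sequences share statistics.

```python
import re
from collections import Counter

# Hypothetical grammar (assumption): a US ZIP code spoken as five digit words.
DIGIT = r"(?:zero|oh|one|two|three|four|five|six|seven|eight|nine)"
ZIP_GRAMMAR = re.compile(rf"\b{DIGIT}(?: {DIGIT}){{4}}\b")

def generalize(sentence: str) -> str:
    """Replace every span accepted by the grammar with the class token $ZIP.

    Simplification: a run of more than five digit words is split greedily
    into five-word groups; a real grammar would need to handle this.
    """
    return ZIP_GRAMMAR.sub("$ZIP", sentence)

def bigram_counts(sentences):
    """Collect bigram counts over the generalized sentences."""
    counts = Counter()
    for s in sentences:
        tokens = ["<s>"] + generalize(s).split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts

corpus = [
    "send it to nine zero two one zero please",
    "send it to one zero zero zero one please",
]
counts = bigram_counts(corpus)
```

In this toy corpus both ZIP codes collapse onto the same `("to", "$ZIP")` bigram, which is the kind of generalization the abstract describes. At decoding time the class token would be expanded again by the nested grammar transducer; that step is not shown here.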


doi: 10.21437/Interspeech.2013-185

Cite as: Georges, M., Kanthak, S., Klakow, D. (2013) Transducer-based speech recognition with dynamic language models. Proc. Interspeech 2013, 642-646, doi: 10.21437/Interspeech.2013-185

@inproceedings{georges13_interspeech,
  author={Munir Georges and Stephan Kanthak and Dietrich Klakow},
  title={{Transducer-based speech recognition with dynamic language models}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={642--646},
  doi={10.21437/Interspeech.2013-185}
}