14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Transducer-Based Speech Recognition with Dynamic Language Models

Munir Georges (1), Stephan Kanthak (1), Dietrich Klakow (2)

(1) Nuance Communications, Germany
(2) Universität des Saarlandes, Germany

In this paper, we propose a method that embeds regular grammars into an N-gram Markov language model. This allows accurate speech recognition even for N-gram models estimated on sparse grammatical word sequences. Moreover, it allows explicit user-dependent modelling of word sequences, such as phone numbers, email addresses or US ZIP codes, separately from the Markov model. We describe the method theoretically and give an overview of a feasible implementation. More precisely, a preprocessing step generalizes the enclosed grammatical word sequences during language model training, and the grammars are then embedded during speech decoding by a novel transducer nesting technique. We evaluated the proposed method on the Wall Street Journal corpus, in a computational environment typical of car head units and mobile devices, and achieved a word error rate reduction of 31.1%.
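The abstract does not spell out the nesting algorithm, so the following is only a rough illustration of the general idea of class-token substitution, not the authors' transducer nesting technique. It assumes an N-gram model that contains a placeholder token (here the hypothetical `<zip>`) and a separate regular grammar (here a hypothetical five-digit ZIP code acceptor) that the decoder enters on the fly whenever the N-gram model emits that token. The bigram table, its probabilities, the token names and the Viterbi-style `score` function are all invented for illustration, and plain acceptors stand in for weighted transducers.

```python
import math

NEG_INF = float("-inf")
CLASS = "<zip>"  # hypothetical class token standing in for the embedded grammar

# Hypothetical toy bigram LM: natural-log probabilities invented for illustration.
BIGRAM = {
    ("<s>", "my"): math.log(1.0),
    ("my", "zip"): math.log(1.0),
    ("zip", "is"): math.log(1.0),
    ("is", CLASS): math.log(0.8),   # LM predicts the class token, not concrete digits
    ("is", "fine"): math.log(0.2),
    (CLASS, "</s>"): math.log(1.0),
    ("fine", "</s>"): math.log(1.0),
}

# Hypothetical embedded regular grammar: exactly five spoken digits (a US ZIP code).
# States 0..5, start state 0, final state 5, uniform digit probability.
DIGITS = {"zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"}
ZIP_FINAL = 5


def zip_arc(state, word):
    """Return (next_state, logprob) for a grammar arc, or None if no arc exists."""
    if state < ZIP_FINAL and word in DIGITS:
        return state + 1, math.log(0.1)
    return None


def _relax(hyps, key, lp):
    """Keep only the best (Viterbi) score per hypothesis state."""
    if lp > hyps.get(key, NEG_INF):
        hyps[key] = lp


def score(words):
    """Best log-probability of `words` under the LM with the grammar nested in."""
    # A hypothesis is (lm_history, grammar_state); grammar_state None = "in the LM".
    hyps = {("<s>", None): 0.0}
    for w in words + ["</s>"]:
        # Epsilon exit: a hypothesis in the grammar's final state returns to the
        # LM, with the class token as its LM history.
        for (hist, g), lp in list(hyps.items()):
            if g == ZIP_FINAL:
                _relax(hyps, (CLASS, None), lp)
        new = {}
        for (hist, g), lp in hyps.items():
            if g is None:
                # Ordinary N-gram word arc.
                if (hist, w) in BIGRAM:
                    _relax(new, (w, None), lp + BIGRAM[(hist, w)])
                # Enter the nested grammar: pay the LM cost of the class token,
                # then consume w on the grammar's first arc.
                arc = zip_arc(0, w)
                if (hist, CLASS) in BIGRAM and arc is not None:
                    _relax(new, (CLASS, arc[0]), lp + BIGRAM[(hist, CLASS)] + arc[1])
            else:
                # Continue inside the nested grammar.
                arc = zip_arc(g, w)
                if arc is not None:
                    _relax(new, (CLASS, arc[0]), lp + arc[1])
        hyps = new
    return hyps.get(("</s>", None), NEG_INF)


print(score("my zip is one two three four five".split()))
```

In this toy setup, the five-digit utterance scores `math.log(0.8) + 5 * math.log(0.1)` (up to floating-point rounding), while a four- or six-digit "ZIP code" scores `-inf` because the nested grammar rejects it. Keeping the grammar outside the bigram table is what, in principle, lets the class content be swapped per user without re-estimating the N-gram model; a production decoder would presumably do this with weighted transducers and on-the-fly composition rather than Python dictionaries.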


Bibliographic reference.  Georges, Munir / Kanthak, Stephan / Klakow, Dietrich (2013): "Transducer-based speech recognition with dynamic language models", In INTERSPEECH-2013, 642-646.