10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

A Sequential Minimization Algorithm for Finite-State Pronunciation Lexicon Models

Simon Dobrišek, Boštjan Vesnicer, France Mihelič

University of Ljubljana, Slovenia

The paper first presents a large-vocabulary automatic speech recognition system that is being developed for the Slovenian language. It then discusses a single-pass token-passing algorithm for fast speech decoding that can be used with the system's multi-level structure. From an algorithmic point of view, the main component of the system is a finite-state pronunciation lexicon model. Because this component has a crucial impact on the overall performance of the system, we developed a sequential minimization algorithm that efficiently reduces the size and algorithmic complexity of the lexicon model. Our finite-state lexicon model is represented as a state-emitting finite-state transducer. The presented experiments show that the sequential minimization algorithm outperforms, by up to 60%, the conventional algorithms developed for the static global optimization of transition-emitting finite-state transducers. These conventional algorithms are provided as part of the AT&T FSM library and the OpenFst library.
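
To make the idea of a state-emitting lexicon model concrete, the following is a minimal sketch and not the sequential minimization algorithm of the paper, whose details are not given in this abstract. It assumes a simplified acceptor in which every state emits one phone, builds a prefix tree from a few invented toy pronunciations, and then merges states that emit the same phone and have identical successors (generic suffix sharing). The function names `build_trie` and `merge_equivalent` and the phone strings are hypothetical, and word labels are ignored for brevity.

```python
from itertools import count


def build_trie(pronunciations):
    """Build a prefix tree in which each state emits one phone."""
    ids = count(1)
    # node id -> [emitted phone, is_final, {phone: child id}]
    nodes = {0: [None, False, {}]}  # state 0 is a non-emitting start state
    for phones in pronunciations:
        cur = 0
        for p in phones:
            children = nodes[cur][2]
            if p not in children:
                nid = next(ids)
                nodes[nid] = [p, False, {}]
                children[p] = nid
            cur = children[p]
        nodes[cur][1] = True  # mark the end of a pronunciation
    return nodes


def merge_equivalent(nodes):
    """Return the ids of the states that remain after suffix merging.

    Two states are merged when they emit the same phone, agree on
    finality, and lead to the same set of (already merged) successors.
    """
    registry = {}  # signature -> representative state id
    canon = {}     # state id -> representative state id

    def visit(nid):
        phone, final, children = nodes[nid]
        succ = tuple(sorted(visit(c) for c in children.values()))
        signature = (phone, final, succ)
        canon[nid] = registry.setdefault(signature, nid)
        return canon[nid]

    visit(0)
    return set(canon.values())


# Invented toy pronunciations that share a common suffix.
lexicon = [("h", "i", "S", "a"), ("m", "i", "S", "a"), ("k", "a", "S", "a")]
trie = build_trie(lexicon)
kept = merge_equivalent(trie)
print(len(trie), "states before merging,", len(kept), "after")
```

The recursive traversal is adequate for a toy lexicon; a large-vocabulary model would need an iterative traversal and would also have to carry word identities, which this sketch omits.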


Bibliographic reference. Dobrišek, Simon / Vesnicer, Boštjan / Mihelič, France (2009): "A sequential minimization algorithm for finite-state pronunciation lexicon models", in INTERSPEECH-2009, 720-723.