September 22-25, 1997
Context-dependent models for language units are essential in high-accuracy speech recognition. However, standard speech recognition frameworks are based on the substitution of level models for higher-level units. Since substitution cannot express context-dependency constraints, actual recognizers use restrictive model-structure assumptions and specialized code for context-dependent models, leading to decreased flexibility and lost opportunities for automatic model optimization. Instead, we propose a recognition framework that builds in the possibility of context dependency from the start by using weighted finite-state transduction rather than substitution. The framework is mented with a general demand-driven transducer composition algorithm that allows great flexibility in model structure, form of context dependency and network expansion method, while achieving competitive recognition performance.
Bibliographic reference. Riley, Michael / Pereira, Fernando / Mohri, Mehryar (1997): "Transducer composition for context-dependent network expansion", In EUROSPEECH-1997, 1427-1430.