Interspeech'2005 - Eurospeech
We present a multi-pass approach to real-time, large-vocabulary speech recognition in which we dynamically manipulate the vocabulary between passes. For recognition tasks where subsets of the vocabulary can be triggered by the occurrences of other words or phrases, a combination of unknown word modelling and vocabulary refinement can be utilized to attack large-vocabulary tasks with relatively small active vocabularies. We evaluate this approach within the JUPITER weather information domain by enabling recognition of all 30,000 city-state pairs in the USA. By maximally precompiling the static and dynamic portions of our search space using finite-state transducers (FSTs), we splice dynamic-vocabulary components on demand during decoding with negligible speed impact while enforcing cross-word context-dependent constraints. We find that a dynamic-vocabulary system can compete quite favorably with a single-pass, large-vocabulary system. For even larger vocabularies (e.g., street addresses), static compilation may be infeasible, making a dynamic-vocabulary approach necessary.
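The control flow of the multi-pass idea can be illustrated with a toy sketch. This is not the paper's FST-based decoder: text tokens stand in for acoustic input, and every name here (`CITIES_BY_STATE`, `BASE_VOCAB`, `first_pass`, `refine_vocabulary`, `second_pass`, the `<unk_city>` placeholder) is a hypothetical chosen for illustration. It only shows how a recognized state name might trigger a small active city vocabulary for a later pass.

```python
from typing import Dict, List, Set

# Hypothetical toy data: each state name triggers its own set of cities.
CITIES_BY_STATE: Dict[str, Set[str]] = {
    "massachusetts": {"boston", "cambridge", "springfield"},
    "illinois": {"chicago", "springfield"},
}

UNK_CITY = "<unk_city>"  # placeholder standing in for an unknown-word model

# Small first-pass vocabulary: carrier words, state names, and the placeholder.
BASE_VOCAB: Set[str] = {"weather", "in", "for", UNK_CITY} | set(CITIES_BY_STATE)


def first_pass(tokens: List[str]) -> List[str]:
    """Replace every out-of-vocabulary token with the unknown-city placeholder."""
    return [t if t in BASE_VOCAB else UNK_CITY for t in tokens]


def refine_vocabulary(hypothesis: List[str]) -> Set[str]:
    """Collect the cities triggered by state names found in the first pass."""
    active: Set[str] = set()
    for word in hypothesis:
        active |= CITIES_BY_STATE.get(word, set())
    return active


def second_pass(tokens: List[str], hypothesis: List[str], active: Set[str]) -> List[str]:
    """Resolve each placeholder only against the small active vocabulary."""
    return [
        tok if (hyp == UNK_CITY and tok in active) else hyp
        for tok, hyp in zip(tokens, hypothesis)
    ]


def recognize(utterance: str) -> List[str]:
    """Run both passes over a whitespace-tokenized utterance."""
    tokens = utterance.lower().split()
    hypothesis = first_pass(tokens)
    active = refine_vocabulary(hypothesis)
    return second_pass(tokens, hypothesis, active)
```

In this sketch, `recognize("weather in springfield illinois")` resolves the placeholder because "illinois" activates its city list, while a city that the recognized state does not trigger stays as `<unk_city>`.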
Bibliographic reference. Hetherington, I. Lee (2005): "A multi-pass, dynamic-vocabulary approach to real-time, large-vocabulary speech recognition", In INTERSPEECH-2005, 545-548.