7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper proposes a layered Finite State Transducer (FST) framework integrating hierarchical supra-lexical linguistic knowledge into speech recognition based on shallow parsing. The shallow parsing grammar is derived directly from the full-fledged grammar for natural language understanding, and augmented with top-level n-gram probabilities and phrase-level context-dependent probabilities, which goes beyond the standard context-free grammar (CFG) formalism. Such a shallow parsing approach helps balance sufficient grammar coverage against tight structural constraints. The context-dependent probabilistic shallow parsing model is represented by layered FSTs, which can be integrated with speech recognition seamlessly to impose early phrase-level structural constraints consistent with natural language understanding. It is shown that in the JUPITER weather information domain, the shallow parsing model achieves lower recognition word error rates than a regular class n-gram model of the same order. However, we find that, with a higher-order top-level n-gram model, pre-composition and optimization of the FSTs are highly restricted by the computational resources available. Given the potential of such models, it may be worth pursuing an incremental approximation strategy, which includes part of the linguistic model FST in early optimization, while introducing the complete model through dynamic composition.
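The layered-FST idea in the abstract can be illustrated with a toy product construction: a lexical transducer maps words to word classes, a phrase-level transducer maps classes to phrase labels, and composing the two yields a single transducer imposing phrase-level constraints directly on word sequences. This is a hedged sketch only, not the authors' implementation; the state layout, symbol names, and deterministic-dict encoding are hypothetical simplifications (real systems would use a weighted FST toolkit and handle nondeterminism and weights).

```python
# Toy layered-FST composition sketch (hypothetical, unweighted, deterministic).
# A transducer is a dict {(state, in_sym): (out_sym, next_state)}, start state 0.

def compose(t1, t2):
    """Product construction: feed t1's output symbols as t2's input symbols."""
    trans = {}
    stack, seen = [(0, 0)], {(0, 0)}
    while stack:
        s1, s2 = stack.pop()
        for (q1, a), (b, r1) in t1.items():
            if q1 != s1 or (s2, b) not in t2:
                continue
            c, r2 = t2[(s2, b)]
            trans[((s1, s2), a)] = (c, (r1, r2))
            if (r1, r2) not in seen:
                seen.add((r1, r2))
                stack.append((r1, r2))
    return trans

def run(t, start, syms):
    """Apply a deterministic transducer to an input symbol sequence."""
    state, out = start, []
    for a in syms:
        b, state = t[(state, a)]
        out.append(b)
    return out

# Hypothetical layers: words -> classes, classes -> phrase labels.
t_lex = {(0, "boston"): ("CITY", 0), (0, "weather"): ("TOPIC", 0)}
t_phrase = {(0, "CITY"): ("LOCATION_PHRASE", 0), (0, "TOPIC"): ("QUERY_PHRASE", 0)}

layered = compose(t_lex, t_phrase)
print(run(layered, (0, 0), ["weather", "boston"]))
# -> ['QUERY_PHRASE', 'LOCATION_PHRASE']
```

In the paper's setting the analogous composition is weighted and much larger, which is why pre-composition with a higher-order top-level n-gram becomes resource-limited, motivating the incremental approximation with dynamic composition mentioned above.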
Bibliographic reference. Mou, Xiaolong / Seneff, Stephanie / Zue, Victor (2002): "Integration of supra-lexical linguistic models with speech recognition using shallow parsing and finite state transducers", In ICSLP-2002, 1289-1292.