This paper describes a novel architecture and algorithms for combining stochastic modeling and Natural Language Understanding techniques to help speech recognition and understanding. In this system, an utterance is initially processed by a speech recognizer using a standard class bigram language model to produce a single best-scoring word string. This word string is then parsed by the Phoenix parser, which produces a semantic frame. The parser uses Recursive Transition Networks (RTNs) to represent semantic fragments, that is, word strings which are meaningful to the system. Semantic fragments of the utterance are assigned to slots in frames. Semantic, pragmatic and discourse knowledge is then applied to the parsed frame to identify misrecognized substrings and develop content predictions for the misrecognized regions. For this, we compute within-utterance semantic constraints, constraints arising from speech repair acts (e.g., on-line edits and corrections), and dialog-based constraints arising from different types of sub-dialogs (or what have traditionally been called discourse and domain plans) and from the content of prior inputs and system responses. The predictions correspond to a small subset of the semantic networks known to the system. The region boundaries of the input, along with the set of predicted semantic networks, are passed to an RTN speech decoder, which uses them in re-recognizing the specified region of the utterance. The networks used by the RTN decoder are the same ones used by the parser, but only the predicted subset of nets is used in the re-recognition. We describe our algorithms for detecting misrecognitions and generating predictions as well as the operation of our RTN-based recognizer. The system was trained on data from the ARPA Air Travel Information Service (ATIS) task, and tested on an independent test set of 1000 utterances.
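The control flow described above (parse the first-pass word string, locate a suspect region, predict a subset of semantic nets, re-recognize only that region) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `NETS` table stands in for the Recursive Transition Networks, "words not covered by any net" stands in for the semantic, pragmatic and discourse tests, "nets whose slots are still empty" stands in for the content predictions, and a scored candidate list stands in for the acoustic search of the RTN decoder.

```python
from dataclasses import dataclass

# Hypothetical toy "semantic nets": net name -> word strings it accepts.
# A real RTN is a recursive grammar network, not a flat phrase list.
NETS = {
    "list_request": {("show", "flights")},
    "depart_city": {("from", "boston"), ("from", "denver")},
    "arrive_city": {("to", "boston"), ("to", "denver")},
}

@dataclass
class Region:
    start: int  # index of the first word in the suspect region
    end: int    # index one past the last word in the region

def parse(words):
    """Greedy left-to-right match of word spans against the nets.
    Returns the filled slots and a per-word 'covered' flag list."""
    slots, covered, i = {}, [False] * len(words), 0
    while i < len(words):
        match = None
        for name, phrases in NETS.items():
            for p in phrases:
                if tuple(words[i:i + len(p)]) == p:
                    if match is None or len(p) > len(match[1]):
                        match = (name, p)
        if match:
            slots[match[0]] = match[1]
            for j in range(i, i + len(match[1])):
                covered[j] = True
            i += len(match[1])
        else:
            i += 1
    return slots, covered

def find_suspect_region(covered):
    """Assumed detector: the first maximal run of words no net covered."""
    start = None
    for i, c in enumerate(covered):
        if not c and start is None:
            start = i
        elif c and start is not None:
            return Region(start, i)
    return Region(start, len(covered)) if start is not None else None

def predict_nets(slots):
    """Assumed predictor: nets whose slots are still unfilled."""
    return [name for name in NETS if name not in slots]

def rerecognize(predicted, candidates):
    """Stand-in for the RTN decoder: keep the best-scoring candidate
    word string that is accepted by one of the predicted nets."""
    best = None
    for phrase, score in candidates:
        if any(tuple(phrase) in NETS[n] for n in predicted):
            if best is None or score > best[1]:
                best = (list(phrase), score)
    return best

def correct(words, candidates):
    """Run the parse / detect / predict / re-recognize loop once."""
    slots, covered = parse(words)
    region = find_suspect_region(covered)
    if region is None:
        return words
    best = rerecognize(predict_nets(slots), candidates)
    if best is None:
        return words
    return words[:region.start] + best[0] + words[region.end:]

first_pass = "show flights from boston two denver".split()
# Hypothetical (word string, log score) hypotheses for the suspect region.
hypotheses = [(["two", "denver"], -10.0), (["from", "denver"], -11.0),
              (["to", "denver"], -11.5), (["to", "boston"], -20.0)]
print(" ".join(correct(first_pass, hypotheses)))
```

In this toy run the words "two denver" are left unparsed, only the `arrive_city` net is predicted, and the higher-scoring hypotheses "two denver" and "from denver" are rejected because no predicted net accepts them, so the region is replaced by "to denver". The point of the sketch is the design choice stated in the abstract: restricting the second pass to the predicted nets narrows the search for the misrecognized region.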
Bibliographic reference. Young, Sheryl R. / Ward, Wayne (1993): "Semantic and pragmatically based re-recognition of spontaneous speech", In EUROSPEECH'93, 2243-2246.