5th International Conference on Spoken Language Processing
In VERBMOBIL we previously augmented the output of a word recognizer with prosodic information. Here we present a new approach that interleaves word recognition and prosodic processing. While we still use the output of a word recognizer to determine phrase boundaries, we no longer wait until the end of the utterance before we start processing. Instead, we intercept chunks of word hypotheses during the forward search of the recognizer. Neural networks and language models are used to predict phrase boundaries. These boundary hypotheses, in turn, are used by the recognizer to cut the stream of incoming speech into syntactic-prosodic phrases, which makes incremental processing possible. We investigate which features are suited for incremental prosodic processing and compare them with respect to classification performance and efficiency. We show that a set of features that can be computed efficiently achieves classification results almost as good as those obtained with the previously used, computationally more expensive features.
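The incremental control flow described in the abstract can be illustrated with a minimal sketch. It is not the VERBMOBIL implementation: the names `incremental_phrases` and `toy_boundary_prob`, the `threshold` parameter, and the example words are all assumptions made for illustration, and the toy scorer merely stands in for the neural network and language model combination mentioned above.

```python
from typing import Callable, Iterable, Iterator, List


def incremental_phrases(
    chunks: Iterable[List[str]],
    boundary_prob: Callable[[List[str], str], float],
    threshold: float = 0.5,
) -> Iterator[List[str]]:
    """Yield a phrase as soon as a boundary is predicted after a word."""
    phrase: List[str] = []
    for chunk in chunks:  # chunks arrive while recognition is still running
        for word in chunk:
            phrase.append(word)
            # Hypothetical scorer; a real system would use prosodic
            # features and a language model here.
            if boundary_prob(phrase, word) >= threshold:
                yield phrase
                phrase = []
    if phrase:  # flush the remainder at the end of the utterance
        yield phrase


def toy_boundary_prob(phrase: List[str], word: str) -> float:
    # Toy stand-in: treat a few words as likely phrase-final.
    return 0.9 if word in {"okay", "tuesday"} else 0.1


chunks = [["okay", "how"], ["about", "tuesday", "at"], ["ten"]]
phrases = list(incremental_phrases(chunks, toy_boundary_prob))
```

Because the function is a generator, each phrase becomes available to downstream processing before the remaining chunks have been consumed, which is the property the abstract refers to as incremental processing.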
Bibliographic reference. Buckow, Jan / Batliner, Anton / Huber, Richard / Nöth, Elmar / Warnke, Volker / Niemann, Heinrich (1998): "Dovetailing of acoustics and prosody in spontaneous speech recognition", In ICSLP-1998, paper 0336.