INTERSPEECH 2004 - ICSLP
Miscommunication in human-computer interaction remains unavoidable, even as speech recognition accuracy continues to improve. While prior research has emphasized identifying the corrective status of an utterance, this paper focuses on identifying the point of local correction. Users of spoken language systems often do not use specific syntactic structures or cue phrases to signal corrective intent or to mark the corrected content; most commonly, a valid utterance is simply repeated, possibly slightly reworded. However, users do exploit prosodic cues to signal both the presence and the location of a correction. Using utterances from the 2000 and 2001 Communicator evaluation data collections, we build a boosted classifier to automatically identify the point of local correction in a corrective utterance. Exploiting the within-sentence rank of prosodic cues, including pitch maximum, pitch range, and intensity maximum, we distinguish locally corrected elements from other elements with 85.5% accuracy, a nearly 50% reduction in error rate relative to a naive majority-class assignment.
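As an illustration of what a within-sentence rank feature might look like, the following minimal Python sketch ranks each word of one utterance by pitch maximum, pitch range, and intensity maximum. It is not the paper's implementation: the function name, the dictionary keys, the units, and the numeric values are all hypothetical, and the boosted classifier itself is not shown.

```python
from typing import Any, Dict, List

# Hypothetical feature names; the abstract names these three cues but not
# how they are stored or extracted.
FEATURES = ("pitch_max", "pitch_range", "intensity_max")


def within_sentence_ranks(words: List[Dict[str, Any]]) -> List[Dict[str, int]]:
    """Rank each word within its utterance, per feature.

    Rank 1 is the largest value in the utterance. Ties are broken by
    word position, because Python's sort is stable.
    """
    ranks: List[Dict[str, int]] = [{} for _ in words]
    for feat in FEATURES:
        # Word indices ordered from largest to smallest value of this feature.
        order = sorted(range(len(words)), key=lambda i: words[i][feat], reverse=True)
        for rank, idx in enumerate(order, start=1):
            ranks[idx][feat + "_rank"] = rank
    return ranks


# Invented values for illustration only (assumed units: Hz for pitch, dB for intensity).
utterance = [
    {"word": "to", "pitch_max": 180.0, "pitch_range": 30.0, "intensity_max": 62.0},
    {"word": "boston", "pitch_max": 260.0, "pitch_range": 95.0, "intensity_max": 71.0},
    {"word": "please", "pitch_max": 170.0, "pitch_range": 25.0, "intensity_max": 60.0},
]

for word, word_ranks in zip(utterance, within_sentence_ranks(utterance)):
    print(word["word"], word_ranks)
```

Ranks, unlike raw values, are comparable across speakers with different pitch and loudness baselines, which is one plausible motivation for using them as classifier inputs.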
Bibliographic reference. Levow, Gina-Anne (2004): "Identifying local corrections in human-computer dialogue", In INTERSPEECH-2004, 313-316.