11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Comparing Measures of Synchrony and Alignment in Dialogue Speech Timing with Respect to Turn-Taking Activity

Nick Campbell (1), Stefan Scherer (2)

(1) Trinity College Dublin, Ireland
(2) Universität Ulm, Germany

This paper describes a system for predicting discourse-role features from voice-activity detection. It takes as input a vector of values extracted from conversational speech and predicts turn-taking activity and active-listening patterns using an echo-state network (ESN). We observed evidence of frame-attunement using a measure of speech density, defined as the ratio of speech to non-speech behaviour per utterance. We also observed synchrony in utterance timing and modelled it with the ESN. The system was trained on a subset of data from 100 telephone conversations drawn from the 1,500-hour JST Expressive Speech Processing corpus. It predicts the interlocutor's timing behaviour with an error rate of less than 15% based on one partner's speech activity alone. An integrated system with access to content information would be expected to achieve higher accuracy.
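The speech-density measure described above can be illustrated with a short sketch. It assumes per-frame binary voice-activity decisions (1 = speech, 0 = non-speech) and utterance boundaries given as frame spans. The function names `speech_density` and `densities_per_utterance` are hypothetical. The abstract does not state the frame rate, the segmentation method, or how the original work handles utterances with no silence, so those choices here are assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def speech_density(vad: Sequence[int]) -> float:
    """Ratio of speech frames to non-speech frames in one utterance window.

    Assumption: `vad` holds per-frame voice-activity decisions
    (1 = speech, 0 = non-speech); framing is not taken from the paper.
    """
    speech = sum(1 for v in vad if v)
    non_speech = len(vad) - speech
    if non_speech == 0:
        # Window contains no silence, so the ratio is unbounded.
        return float("inf")
    return speech / non_speech


def densities_per_utterance(
    vad: Sequence[int], boundaries: Sequence[Tuple[int, int]]
) -> List[float]:
    """Compute speech density for each (start, end) frame span, end exclusive.

    Hypothetical helper: utterance boundaries are assumed to be given.
    """
    return [speech_density(vad[start:end]) for start, end in boundaries]


if __name__ == "__main__":
    frames = [1, 1, 1, 0, 1, 1, 0, 0, 1, 1]
    spans = [(0, 5), (5, 10)]
    print(densities_per_utterance(frames, spans))  # [4.0, 1.5]
```

Returning infinity for a silence-free window keeps the literal speech/non-speech ratio; a bounded alternative would be speech frames divided by total frames, which may be preferable if the densities are later fed to a model.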

Bibliographic reference.  Campbell, Nick / Scherer, Stefan (2010): "Comparing measures of synchrony and alignment in dialogue speech timing with respect to turn-taking activity", In INTERSPEECH-2010, 2546-2549.