If a dialog system could respond to a user as naturally as a human does, interaction would be smoother. Imitating human prosodic behavior is therefore important for natural human-computer conversation. In this paper, with the aim of developing a cooperative, friendly spoken dialog system, we analyzed the correlations between two features, F0 synchrony tendency and overlap frequency, and three subjective measures in human-human dialogs: "liveliness," "familiarity," and "informality." We also modeled the properties of these features and implemented the model in our dialog system, which generates aizuchi (back-channel) and turn-taking response timing in real time using a decision tree, together with dynamic F0 changes, to realize chat-like conversations.
Cite as: Nishimura, R., Kitaoka, N., Nakagawa, S. (2007) Prosody change and response timing analysis in spontaneously spoken dialogs and their modeling in a spoken dialog system. Proc. Interspeech 2007, 2565-2568, doi: 10.21437/Interspeech.2007-681
@inproceedings{nishimura07_interspeech,
  author={Ryota Nishimura and Norihide Kitaoka and Seiichi Nakagawa},
  title={{Prosody change and response timing analysis in spontaneously spoken dialogs and their modeling in a spoken dialog system}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2565--2568},
  doi={10.21437/Interspeech.2007-681}
}