We have developed a recognition system that understands multi-party conversation by combining prosody and gaze information. Multi-party conversation is complex because side participants generate many overlaps and interruptions, which makes it difficult to keep track of the main thread of the conversation. Gaze is a strong cue for both clarifying and perceiving "who is talking to whom" and "who is listening to whom", and can be used to improve the understanding of the conversational situation. We analyzed gaze behavior in recordings of actual human-to-human conversation and created a computational model that recognizes the main thread of the conversation. Performance improved by up to 20 points compared with the condition that used prosody alone.
Cite as: Matsusaka, Y. (2005) Recognition of (3) party conversation using prosody and gaze. Proc. Interspeech 2005, 1205-1208, doi: 10.21437/Interspeech.2005-369
@inproceedings{matsusaka05_interspeech,
  author={Yosuke Matsusaka},
  title={{Recognition of (3) party conversation using prosody and gaze}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1205--1208},
  doi={10.21437/Interspeech.2005-369}
}