Analysis of Effect and Timing of Fillers in Natural Turn-Taking

Divesh Lala, Shizuka Nakamura, Tatsuya Kawahara

Turn-taking in spoken dialogue systems is still slower than in real human conversation because of latency in speech and natural language processing, but a system can use fillers to take the turn more quickly without sacrificing naturalness. In this work we analyze fillers that are used at the start of turns in conversation and determine a window of appropriate times to use them. We analyze a human-robot conversation corpus to obtain the average response time of fillers, and find that it differs according to the filler's form. We then conduct a subjective experiment in which participants dynamically adjust the timing of responses with and without fillers to designate a window of acceptable response timings. Our results show that the most suitable response time is around 200–500 ms after the previous speaker has finished their turn. We also find that the timing window differs depending on whether a filler is used to begin the turn and on the filler's particular form. The implications of these results for the design of conversational systems are also discussed.

DOI: 10.21437/Interspeech.2019-1527

Cite as: Lala, D., Nakamura, S., Kawahara, T. (2019) Analysis of Effect and Timing of Fillers in Natural Turn-Taking. Proc. Interspeech 2019, 4175–4179, DOI: 10.21437/Interspeech.2019-1527.

