Real-Time Reactive Speech Synthesis: Incorporating Interruptions

Mirjam Wester, David A. Braude, Blaise Potard, Matthew P. Aylett, Francesca Shaw


The ability to be interrupted and react in a realistic manner is a key requirement for interactive speech interfaces. While previous systems have long implemented techniques such as ‘barge in’, where speech output can be halted at word or phrase boundaries, less work has explored how to mimic the way human speakers respond to real-time events such as interruptions, which require a reaction from the system. Unlike previous work, which has focused on incremental production, here we explore a novel re-planning approach. The proposed system is versatile and offers a large range of possible ways to react. The approach was evaluated in a focus group in which participants interacted with a system reading out a text. The system responded to audio interruptions with no reaction, a passive reaction, or an active negative reaction (i.e. getting increasingly irritated). Participants preferred a reactive system.


DOI: 10.21437/Interspeech.2017-1250

Cite as: Wester, M., Braude, D.A., Potard, B., Aylett, M.P., Shaw, F. (2017) Real-Time Reactive Speech Synthesis: Incorporating Interruptions. Proc. Interspeech 2017, 3996-4000, DOI: 10.21437/Interspeech.2017-1250.


@inproceedings{Wester2017,
  author={Mirjam Wester and David A. Braude and Blaise Potard and Matthew P. Aylett and Francesca Shaw},
  title={Real-Time Reactive Speech Synthesis: Incorporating Interruptions},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3996--4000},
  doi={10.21437/Interspeech.2017-1250},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1250}
}