Duplex Conversation in Outbound Agent System

Chunxiang Jin, Minghui Yang, Zujie Wen

Intelligent outbound calling is a popular way to contact customers. Traditional outbound agents communicate with users in a simplex manner: the user and the agent cannot speak at the same time, and the user cannot actively interrupt the conversation while the agent is playing TTS-generated audio. The traditional solution is based on the output of the voice activity detection (VAD) module: as soon as the user's voice is detected, the agent immediately stops talking. However, users sometimes give a short answer casually, with no intention of interrupting, and this causes the agent to be interrupted frequently. In addition, when users say named entities (numbers, locations, company names, etc.), they speak slowly and pause longer between words, so the agent may cut in on them unreasonably. We propose a method that identifies the user's interruption requests and discontinuous expressions by analyzing the semantic information of the user's utterance. As a result, the fluency of the dialogue is improved.
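
The contrast between the VAD-only rule and a semantic decision can be illustrated with a minimal sketch. This is not the authors' implementation: the word lists, the function names, and the token-matching heuristic are all hypothetical stand-ins for whatever semantic model the system actually uses, and the input is assumed to be an ASR transcript of the user's speech.

```python
# Hypothetical word lists; a real system would presumably use a trained
# semantic classifier rather than fixed vocabularies.
BACKCHANNELS = {"ok", "okay", "yes", "yeah", "right", "sure"}
DIGIT_WORDS = {"zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"}


def _tokens(transcript: str) -> list[str]:
    """Lowercase the transcript and strip basic punctuation from each token."""
    stripped = (tok.strip(".,!?") for tok in transcript.lower().split())
    return [tok for tok in stripped if tok]


def vad_only_should_stop(voice_detected: bool) -> bool:
    """Baseline rule: the agent stops as soon as any user voice is detected."""
    return voice_detected


def semantic_should_stop(transcript: str) -> bool:
    """Stop the agent only if the utterance is more than a short acknowledgement."""
    tokens = _tokens(transcript)
    if not tokens:
        return False
    return not all(tok in BACKCHANNELS for tok in tokens)


def likely_unfinished(transcript: str) -> bool:
    """Guess that the user is mid-entity (e.g. dictating a number) and may continue."""
    tokens = _tokens(transcript)
    if not tokens:
        return False
    last = tokens[-1]
    return last in DIGIT_WORDS or last.isdigit()
```

Under these assumptions, a casual "Okay." leaves the agent talking, "Wait, I have a question" stops it, and an utterance ending in a digit word signals that the agent should wait longer before taking its turn.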

Cite as: Jin, C., Yang, M., Wen, Z. (2021) Duplex Conversation in Outbound Agent System. Proc. Interspeech 2021, 4866-4867

