In human speech, speaking style and prosody are often a matter of context, and for Alexa's interactions with customers to be as natural as possible, the same should be true for her. This talk presents recent work at Amazon to generate diverse and appropriated spoken responses. One of the models generates alternative phrasings in a context-aware way, so that Alexa does not keep asking the same question repeatedly. Then, the speech generator adapts the speaking style depending on the dialogue state.
Bonafonte, A (2021) Diverse Conversational Spoken Language Generation. Proc. IberSPEECH 2021.