Expressiveness Influences Human Vocal Alignment Toward voice-AI

Michelle Cohn, Georgia Zellou


This study explores whether people align to expressive speech produced by a voice-activated artificially intelligent device (voice-AI), specifically Amazon's Alexa. Participants shadowed words produced by the Alexa voice in two acoustically distinct conditions: "regular" and "expressive", the latter containing more exaggerated pitch contours and longer word durations. Another group of participants rated the shadowed items in an AXB perceptual similarity task as an assessment of overall degree of vocal alignment. Results show greater vocal alignment toward expressive speech produced by the Alexa voice and, furthermore, systematic variation based on speaker gender. Overall, these findings have applications to the field of affective computing in understanding human responses to synthesized emotional expressiveness.


DOI: 10.21437/Interspeech.2019-1368

Cite as: Cohn, M., Zellou, G. (2019) Expressiveness Influences Human Vocal Alignment Toward voice-AI. Proc. Interspeech 2019, 41–45. DOI: 10.21437/Interspeech.2019-1368.


@inproceedings{Cohn2019,
  author={Michelle Cohn and Georgia Zellou},
  title={{Expressiveness Influences Human Vocal Alignment Toward voice-AI}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={41--45},
  doi={10.21437/Interspeech.2019-1368},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1368}
}