Three’s a Crowd? Effects of a Second Human on Vocal Accommodation with a Voice Assistant

Eran Raveh, Ingo Siegert, Ingmar Steiner, Iona Gessinger, Bernd Möbius


This study examines how the presence of other speakers affects the interaction with a spoken dialogue system. We analyze participants’ speech with respect to several phonetic features, viz., fundamental frequency, intensity, and articulation rate, in two conditions: with and without additional speech input from a human confederate as a third interlocutor. The comparison is based on tasks that participants performed with a commercial voice assistant under both conditions in alternation. We compare the distributions of the features across the two conditions to investigate whether speakers behave differently when a confederate is involved. A temporal analysis exposes continuous changes in the production of these features; in particular, we measure overall accommodation between the participants and the system throughout the interactions. Results show significant differences in a majority of cases for two of the three features, and these differences are more pronounced in cases where the user first interacted with the device alone. We also analyze factors such as the task performed, participant gender, and task order, providing additional insight into the participants’ behavior.


DOI: 10.21437/Interspeech.2019-1825

Cite as: Raveh, E., Siegert, I., Steiner, I., Gessinger, I., Möbius, B. (2019) Three’s a Crowd? Effects of a Second Human on Vocal Accommodation with a Voice Assistant. Proc. Interspeech 2019, 4005-4009, DOI: 10.21437/Interspeech.2019-1825.


@inproceedings{Raveh2019,
  author={Eran Raveh and Ingo Siegert and Ingmar Steiner and Iona Gessinger and Bernd Möbius},
  title={{Three’s a Crowd? Effects of a Second Human on Vocal Accommodation with a Voice Assistant}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4005--4009},
  doi={10.21437/Interspeech.2019-1825},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1825}
}