Conversation partners tend to adapt their vocal intensity to each other and to other social and environmental factors. A socially adequate vocal intensity level produced by a speech synthesiser, one that goes beyond mere volume adjustment, is highly desirable for rewarding and successful human-machine or machine-mediated human-human interaction. This paper examines the interaction of the Lombard effect and speaker entrainment in a controlled experiment conducted with a confederate interlocutor. The interlocutor was asked to maintain either a soft, a modal or a loud voice level during the dialogues. In half of the trials, subjects were exposed to cocktail party noise through headphones. The analytical results suggest that both the background noise and the interlocutor's voice level affect the dynamics of speaker entrainment. Speakers appear to still entrain to the voice level of their interlocutor in noisy conditions, though to a lesser extent, as strategies for ensuring intelligibility affect voice levels as well. These findings could be leveraged in spoken dialogue systems and speech-generating devices to help choose a vocal effort level for the synthetic voice that is both intelligible and socially suited to a specific interaction.
Bibliographic reference. Székely, Éva / Keane, Mark T. / Carson-Berndsen, Julie (2015): "The effect of soft, modal and loud voice levels on entrainment in noisy conditions", In INTERSPEECH-2015, 150-154.