Auditory-Visual Speech Processing (AVSP) 2010

Hakone, Kanagawa, Japan
September 30-October 3, 2010

“D-o-e-s-Not-C-o-m-p-u-t-e”: Vowel Hyperarticulation in Speech to an Auditory-Visual Avatar

Denis Burnham, Sebastian Joeffry, Lauren Rice

MARCS Auditory Laboratories, University of Western Sydney, Sydney, Australia

Humans use speech to convey information, attract attention, express affect, and so on. Speech register research shows that humans are adept at fine-tuning components of their speech to accommodate the needs of their audience, suggesting that they have a model of others’ communication needs. However, when that audience is a computer rather than another human, such a model may be invalid, and the resulting speech adaptations (Computer-Directed Speech) may be inappropriate. Here we examine humans’ speech to other humans or to an auditory-visual avatar, before and after the computer makes a listening “error”. Vowel durations are found to be longer in Computer- than Human-Directed Speech (especially in speech repairs after computer errors), and there is greater vowel hyperarticulation in Computer- than Human-Directed Speech both before and after error correction. The results are discussed in terms of human-computer interaction (HCI), talking head applications, and ASR systems.

Index Terms: computer-directed speech, speech repairs, vowel hyperarticulation, human-computer interaction.


Bibliographic reference.  Burnham, Denis / Joeffry, Sebastian / Rice, Lauren (2010): “‘D-o-e-s-not-c-o-m-p-u-t-e’: vowel hyperarticulation in speech to an auditory-visual avatar”, in AVSP-2010, paper P18.