Interspeech'2005 - Eurospeech
In this contribution we examine large speech corpora of prepared broadcast speech and spontaneous telephone speech in American English and in French. Starting with the question of whether ASR systems behave differently on male and female speech, we then look for evidence at the acoustic-phonetic, lexical and idiomatic levels to explain the observed differences. Recognition results were analysed on 3 to 7 hours of speech for each combination of language and speech type (20 hours in total). Results consistently show a lower word error rate on female speech, with the difference ranging from 0.7 to 7% depending on the condition. An analysis of automatically produced pronunciations in the speech training corpora (4000 hours of speech in total) revealed that female speakers tend to adhere more consistently to standard pronunciations than male speakers. Concerning speech disfluencies, male speakers produce larger proportions of filled pauses and repetitions than female speakers.
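The comparison above rests on word error rate (WER) computed separately for each speaker group. The following is a minimal sketch of that computation, assuming the standard definition of WER as word-level edit distance divided by the number of reference words. The function names, the group labels and the sample utterances are hypothetical illustrations, not the authors' scoring setup or data.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(utterances):
    """Return {group: WER} for (group, reference, hypothesis) triples.

    Errors and reference words are pooled per group before dividing,
    so long utterances weigh more than short ones.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in utterances:
        ref_words = reference.split()
        errors[group] += edit_distance(ref_words, hypothesis.split())
        words[group] += len(ref_words)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical toy data, invented for illustration only.
SAMPLE = [
    ("female", "the cat sat on the mat", "the cat sat on the mat"),
    ("male", "the cat sat on the mat", "the cat sat on mat"),
    ("male", "uh i i think so", "i think so"),
]

if __name__ == "__main__":
    for group, wer in sorted(wer_by_group(SAMPLE).items()):
        print(f"{group}: {wer:.1%}")
```

Pooling errors over all words of a group, rather than averaging per-utterance rates, is the usual convention for corpus-level WER; whether the reported 0.7 to 7% differences are absolute or relative is not stated in the abstract.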
Bibliographic reference. Adda-Decker, Martine / Lamel, Lori (2005): "Do speech recognizers prefer female speakers?", In INTERSPEECH-2005, 2205-2208.