Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Multiword Expressions in Spontaneous Speech: Do We Really Speak Like That?

Helmer Strik, Diana Binnenpoorte, Catia Cucchiarini

Radboud Universiteit Nijmegen, The Netherlands

In this study, we examined the pronunciation characteristics of multiword expressions (MWEs). We first drew up an inventory of frequently occurring N-grams extracted from orthographic transcriptions of spontaneous speech in a large corpus of spoken Dutch. Phonetic transcriptions were available for about 10% of these N-grams, and we examined those. Our results show that the pronunciation of these N-grams differed to a large extent from the canonical form. To determine whether this is a general characteristic of spontaneous speech or rather an effect of the specific status of these N-grams, we analyzed the pronunciations of the individual words composing the N-grams in two context conditions: 1) in the N-gram context and 2) in any other context. We found that words in N-grams do indeed have peculiar pronunciation patterns. This suggests that these N-grams may be considered MWEs and should therefore be treated as lexical entries with their own specific pronunciation variants in the pronunciation lexicons used for, e.g., automatic speech recognition (ASR) and automatic phonetic transcription (APT).
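The inventory step described above, collecting frequently occurring N-grams from orthographic transcriptions, can be illustrated with a minimal sketch. This is not the procedure used in the study: the function name `frequent_ngrams`, the whitespace tokenization, the `min_count` threshold, and the toy Dutch utterances are all assumptions made for illustration.

```python
from collections import Counter


def frequent_ngrams(utterances, n, min_count):
    """Count word N-grams within each utterance and keep the frequent ones.

    Hypothetical helper: tokenization is plain whitespace splitting, and the
    frequency threshold is an illustrative choice, not a value from the paper.
    """
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        # Slide a window of n tokens; N-grams never cross utterance boundaries.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy orthographic transcriptions, invented for illustration (not corpus data).
toy_utterances = [
    "ik weet het niet",
    "ja ik weet het niet precies",
    "dat weet ik niet",
]
inventory = frequent_ngrams(toy_utterances, n=3, min_count=2)
```

On the toy data, only the trigrams that occur in at least two utterances survive the threshold. The later comparison of word pronunciations inside and outside the N-gram context would additionally require phonetic transcriptions aligned to the words, which this sketch does not model.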

