8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Segment Deletion in Spontaneous Speech: A Corpus Study Using Mixed Effects Models with Crossed Random Effects

Christophe Van Bael, Harald Baayen, Helmer Strik

Radboud University Nijmegen, The Netherlands

We studied the frequencies of phone and syllable deletions in spontaneous Dutch, and the extent to which such deletions are influenced by the linguistic and sociolinguistic factors represented in the transcriptions, word segmentations and metadata of the Spoken Dutch Corpus. In addition to providing insight into these frequencies and the factors influencing them, our study illustrates the new opportunities for analysing rich, and therefore complex, corpus data offered by a recently developed statistical modelling technique: generalised linear mixed effects models in which the effects of random factors are modelled as crossed instead of nested.

We observed average phone and syllable deletion rates of 7.57% and 5.46%, respectively. Of the words, 20.32% had at least one phone missing and 6.89% had at least one syllable deleted. The mixed effects models for phone and syllable deletion had several effects in common, which implies that both types of deletion are to a large extent influenced by the same factors. The strongest factors across both models were lexical stress, word duration and the segmental context of the syllable onset of the following word.
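
The two kinds of figures reported above measure different things: a deletion rate is computed over all canonical phones, whereas the proportion of affected words is computed over words. The following is a minimal Python sketch of that distinction. The toy words, the alignment format (a deleted phone marked as None) and the function names are assumptions made for illustration; they are not the authors' procedure and not data from the Spoken Dutch Corpus.

```python
# Hypothetical toy data: each word is a pair of
# (canonical phones, realised phones), already aligned so that a
# deleted phone appears as None in the realised sequence.
words = [
    (["d", "A", "t"], ["d", "A", None]),                        # 1 of 3 phones deleted
    (["I", "s"], ["I", "s"]),                                   # no deletion
    (["n", "a", "t", "y", "r"], ["n", None, "t", "y", None]),   # 2 of 5 phones deleted
    (["j", "a"], ["j", "a"]),                                   # no deletion
]


def phone_deletion_rate(words):
    """Deleted phones divided by the total number of canonical phones."""
    total = sum(len(canonical) for canonical, _ in words)
    deleted = sum(phone is None for _, realised in words for phone in realised)
    return deleted / total


def share_of_words_with_deletion(words):
    """Words with at least one deleted phone divided by the number of words."""
    affected = sum(any(phone is None for phone in realised) for _, realised in words)
    return affected / len(words)


print(phone_deletion_rate(words))           # 3 deleted out of 12 phones
print(share_of_words_with_deletion(words))  # 2 affected out of 4 words
```

Because deletions cluster in some words and are absent in others, the two measures need not agree, which is why the phone deletion rate and the proportion of words with a missing phone are reported separately.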

Bibliographic reference. Van Bael, Christophe / Baayen, Harald / Strik, Helmer (2007): "Segment deletion in spontaneous speech: a corpus study using mixed effects models with crossed random effects", in INTERSPEECH-2007, 2741-2744.