9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Preparing a Corpus of Dutch Spontaneous Dialogues for Automatic Phonetic Analysis

Barbara Schuppler, Mirjam Ernestus, Odette Scharenborg, Lou Boves

Radboud Universiteit Nijmegen, The Netherlands

This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Because the corpus was not created with automatic processing in mind, it had to be reshaped. The first part of the paper describes this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose, a phonemic transcription of the corpus was created by means of forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. The pronunciation variants were generated by applying to the canonical pronunciations of the words a large set of phonetic processes that have been implicated in reduction. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
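
The variant-generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two rules (schwa deletion between two segments, deletion of word-final /n/ after schwa), the SAMPA-like phone symbols, and the names `delete_medial_schwa`, `delete_final_n`, `RULES` and `generate_variants` are all assumptions chosen for illustration, and the rule set used in the paper is much larger. Each rule is applied optionally at every position where it matches, and the procedure is repeated until no new variants appear.

```python
# Illustrative sketch only: rules and phone symbols are assumptions,
# not the rule set or lexicon used in the paper.

def delete_medial_schwa(phones, i):
    """Delete a schwa ("@") that has a segment on both sides."""
    if phones[i] == "@" and 0 < i < len(phones) - 1:
        return phones[:i] + phones[i + 1:]
    return None


def delete_final_n(phones, i):
    """Delete a word-final /n/ that directly follows a schwa."""
    is_final = i == len(phones) - 1
    if is_final and i > 0 and phones[i] == "n" and phones[i - 1] == "@":
        return phones[:i]
    return None


RULES = [delete_medial_schwa, delete_final_n]


def generate_variants(canonical, rules):
    """Return the canonical form plus every form reachable by optional rules.

    `canonical` is a string of space-separated phone symbols.
    """
    start = tuple(canonical.split())
    seen = {start}
    pending = [start]
    while pending:
        phones = pending.pop()
        for rule in rules:
            for i in range(len(phones)):
                reduced = rule(phones, i)
                if reduced is not None and reduced not in seen:
                    seen.add(reduced)
                    pending.append(reduced)
    return sorted(" ".join(p) for p in seen)


if __name__ == "__main__":
    # Hypothetical canonical entry for Dutch "lopen" (to walk).
    for variant in generate_variants("l o: p @ n", RULES):
        print(variant)
```

In a setup like the one described, the resulting list would be written to the pronunciation lexicon so that the forced aligner can choose, for each word token, the variant that best matches the signal.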

Bibliographic reference. Schuppler, Barbara / Ernestus, Mirjam / Scharenborg, Odette / Boves, Lou (2008): "Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis", in INTERSPEECH-2008, 1638-1641.