8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Investigating Speech Style Specific Pronunciation Variation in Large Spoken Language Corpora

Christophe Van Bael, Henk van den Heuvel, Helmer Strik

Radboud University Nijmegen, Netherlands

In the past, linguistic research was typically conducted on relatively small datasets designed specifically for the research at hand. Although many large spoken language corpora have since become available, their usefulness for linguistic research is still not fully established. The research reported in this paper was conducted to illustrate the potential of large multi-purpose spoken language corpora for linguistic research. We investigated whether phonetic regularities can be identified in different speech styles. To this end, a data-driven study was conducted on a large multi-purpose spoken language corpus that includes a manually corrected broad phonetic transcription of the data. Our results show that speech style specific pronunciation processes can indeed be found in such a large corpus. This indicates that large multi-purpose spoken language corpora can contribute to linguistic research, if only for the purpose of hypothesis generation and verification.

