12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Comparing Syllable Frequencies in Corpora of Written and Spoken Language

Barbara Samlowski (1), Bernd Möbius (2), Petra Wagner (3)

(1) Universität Bonn, Germany
(2) Universität des Saarlandes, Germany
(3) Universität Bielefeld, Germany

In this study, several German-language corpora were compared to determine the extent to which syllable frequencies remain stable across different contexts and modalities. Although considerable differences in relative frequency were found among the more common syllables, frequency ranks proved to be more robust. Variation across corpora was mostly due to the vocabulary characteristics of particular corpus domains rather than to systematic differences between spoken and written language. The results indicate that syllable frequencies in written corpora can be taken as a rough estimate of syllable frequencies in spoken language.
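
The distinction between relative frequency and rank can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the two syllable lists are invented toy data, the function names are hypothetical, and Spearman's rank correlation is assumed here only as one plausible way to quantify rank stability over the syllables that two corpora share.

```python
from collections import Counter


def relative_frequencies(syllables):
    """Map each syllable to its share of all syllable tokens."""
    counts = Counter(syllables)
    total = sum(counts.values())
    return {syl: n / total for syl, n in counts.items()}


def ranks(freqs):
    """Rank 1 is the most frequent syllable; ties are broken alphabetically."""
    ordered = sorted(freqs, key=lambda syl: (-freqs[syl], syl))
    return {syl: i + 1 for i, syl in enumerate(ordered)}


def rank_correlation(freqs_a, freqs_b):
    """Spearman's rho over the syllables present in both corpora.

    Ranks are recomputed on the shared syllables only, so they are
    distinct integers 1..n and the simple d-squared formula applies.
    """
    shared = set(freqs_a) & set(freqs_b)
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared syllables")
    ranks_a = ranks({syl: freqs_a[syl] for syl in shared})
    ranks_b = ranks({syl: freqs_b[syl] for syl in shared})
    d_squared = sum((ranks_a[syl] - ranks_b[syl]) ** 2 for syl in shared)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Invented toy data, not taken from any of the corpora in the study.
written = ["der", "die", "der", "und", "die", "der", "ge", "und"]
spoken = ["ja", "der", "die", "ja", "der", "ja", "und", "der", "ja", "ge"]

freq_written = relative_frequencies(written)
freq_spoken = relative_frequencies(spoken)
rho = rank_correlation(freq_written, freq_spoken)

print(freq_written["der"], freq_spoken["der"])
print(rho)
```

In this toy case the relative frequency of "der" differs between the two lists, while its rank among the shared syllables is the same in both, which is the kind of contrast the abstract describes.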

