EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Improved Data-Driven Generation of Pronunciation Dictionaries Using an Adapted Word List

Matthias Wolff, Matthias Eichner, Rüdiger Hoffmann

Dresden University of Technology, Germany

Data-driven approaches to learning pronunciation variants for phonetic dictionaries must cope with the difficulty of acquiring sufficient training data. The cause is not the size of the databases but the unfavorable distribution of word frequencies in natural speech described by Zipf's law: a few words occur very often, while most words occur too rarely to train on. In this paper we propose a method that reorganizes a phonetic dictionary according to a given speech database in order to maximize the number of word models for which pronunciation variants can be learned from this corpus. Reorganization takes place automatically by analyzing the orthographic and phonetic transcriptions of the corpus. The method produces an alternative word list consisting of units ranging from partial words to multi-words. The efficiency and the limits of the approach are discussed on the basis of experiments carried out on the German VERBMOBIL corpus.
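
The sparsity problem named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it only counts how many distinct words in a transcript occur often enough to be used for training. The function name `trainable_fraction`, the threshold `min_count`, and the toy token list are assumptions made for this illustration and do not come from the paper or from the VERBMOBIL corpus.

```python
from collections import Counter


def trainable_fraction(tokens, min_count):
    """Return the share of distinct words seen at least min_count times.

    Hypothetical helper for illustration only; the paper does not
    define this function or this threshold.
    """
    counts = Counter(tokens)
    if not counts:
        return 0.0
    trainable = sum(1 for c in counts.values() if c >= min_count)
    return trainable / len(counts)


# Toy transcript invented for this example (not VERBMOBIL data).
tokens = "ja ja ja ja das das das ist ist gut termin montag".split()

# Share of word types with at least three occurrences.
print(trainable_fraction(tokens, min_count=3))
```

In this toy list only two of the six distinct words reach the threshold of three occurrences. A word list adapted to the corpus, as the abstract proposes, aims to raise that share by choosing units (partial words up to multi-words) that occur often enough.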


Bibliographic reference.  Wolff, Matthias / Eichner, Matthias / Hoffmann, Rüdiger (2001): "Improved data-driven generation of pronunciation dictionaries using an adapted word list", In EUROSPEECH-2001, 1433-1436.