ISCA Archive Eurospeech 2001

Improved data-driven generation of pronunciation dictionaries using an adapted word list

Matthias Wolff, Matthias Eichner, Rüdiger Hoffmann

Data-driven approaches to learning pronunciation variants for phonetic dictionaries must cope with the difficulty of acquiring sufficient training data. The cause is not the size of the databases but the unfavorable distribution of word frequencies in natural speech, described by Zipf’s law. In this paper we suggest a method that reorganizes a phonetic dictionary according to a given speech database in order to maximize the number of word models for which pronunciation variants can be learned from this corpus. The reorganization is performed automatically by analyzing the orthographic and phonetic transcriptions of the corpus. The method produces an alternative word list consisting of units ranging from partial words to multi-words. The efficiency and the limits of the approach are discussed on the basis of experiments carried out on the German VERBMOBIL corpus.
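The data sparsity problem named in the abstract can be illustrated with a minimal sketch. It is not the authors' method. It only counts how many word types in a corpus occur often enough to learn variants from. The function name `trainable_types`, the threshold `min_count`, and the toy token list are all assumptions made for illustration.

```python
from collections import Counter


def trainable_types(tokens, min_count):
    """Return the word types that occur at least min_count times.

    min_count is a hypothetical threshold for "enough training data";
    the abstract does not state one.
    """
    counts = Counter(tokens)
    return {word for word, count in counts.items() if count >= min_count}


# Toy corpus invented for illustration (not VERBMOBIL data).
tokens = "ja ja ja ja wir wir wir können können Termin".split()

covered = trainable_types(tokens, min_count=3)
coverage = len(covered) / len(set(tokens))  # share of word types that are covered
```

With a Zipf-like frequency distribution, most word types fall below any such threshold. This is why regrouping the word list into units that occur more often, as the abstract proposes, can raise the number of trainable models.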


doi: 10.21437/Eurospeech.2001-22

Cite as: Wolff, M., Eichner, M., Hoffmann, R. (2001) Improved data-driven generation of pronunciation dictionaries using an adapted word list. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1433-1436, doi: 10.21437/Eurospeech.2001-22

@inproceedings{wolff01_eurospeech,
  author={Matthias Wolff and Matthias Eichner and Rüdiger Hoffmann},
  title={{Improved data-driven generation of pronunciation dictionaries using an adapted word list}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={1433--1436},
  doi={10.21437/Eurospeech.2001-22}
}