In this paper, we analyze whether dictionaries on the World Wide Web that contain phonetic notations can support the rapid creation of pronunciation dictionaries when building speech recognition and speech synthesis systems. As a representative dictionary, we selected Wiktionary, since it is available in multiple languages and, in addition to word definitions, provides many phonetic notations in the International Phonetic Alphabet (IPA). Given word lists in four languages (English, French, German, and Spanish), we calculated the percentage of words that have phonetic notations in Wiktionary. Furthermore, we performed two quality checks. First, we compared the pronunciations from Wiktionary with pronunciations from dictionaries based on the GlobalPhone project, which had been created in a rule-based fashion and manually cross-checked. Second, we analyzed the impact of Wiktionary pronunciations on automatic speech recognition (ASR) systems. The French Wiktionary achieved the best pronunciation coverage: it contains phonetic notations for 92.58% of the French GlobalPhone word list, as well as for 76.12% of the country names and 30.16% of the international city names. In our ASR evaluation, the Spanish system gained the most from Wiktionary pronunciations, with a relative word error rate reduction of 7.22%.
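The two headline numbers in the abstract, pronunciation coverage and relative word error rate (WER) reduction, are simple ratios. The sketch below illustrates both under stated assumptions: it presumes the Wiktionary IPA notations have already been extracted into a plain dictionary that maps each word to a list of IPA strings, and it uses exact (case-sensitive) word matching. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def pronunciation_coverage(word_list, ipa_lookup):
    """Percentage of words in word_list with at least one IPA notation.

    ipa_lookup is a hypothetical pre-extracted mapping word -> list of IPA
    strings; a missing key or an empty list counts as "not covered".
    Matching is exact and case-sensitive.
    """
    if not word_list:
        return 0.0
    covered = sum(1 for word in word_list if ipa_lookup.get(word))
    return 100.0 * covered / len(word_list)


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: (baseline - new) / baseline * 100."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Toy data for illustration only (not results from the paper).
ipa_lookup = {"haus": ["haʊs"], "baum": ["baʊm"], "tisch": []}
words = ["haus", "baum", "tisch", "stuhl"]

print(pronunciation_coverage(words, ipa_lookup))  # 50.0
print(relative_wer_reduction(20.0, 18.0))         # 10.0
```

Note that a relative reduction is measured against the baseline WER, so a drop from 20.0% to 18.0% WER is a 10% relative reduction even though the absolute difference is only 2 points.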
Bibliographic reference. Schlippe, Tim / Ochs, Sebastian / Schultz, Tanja (2010): "Wiktionary as a source for automatic pronunciation extraction", In INTERSPEECH-2010, 2290-2293.