INTERSPEECH 2010
11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Wiktionary as a Source for Automatic Pronunciation Extraction

Tim Schlippe, Sebastian Ochs, Tanja Schultz

Cognitive Systems Lab, Karlsruhe Institute of Technology (KIT), Germany

In this paper, we analyze whether dictionaries from the World Wide Web that contain phonetic notations can support the rapid creation of pronunciation dictionaries when building speech recognition and speech synthesis systems. As a representative dictionary, we selected Wiktionary [1], since it is available in multiple languages and, in addition to word definitions, provides many phonetic notations in the International Phonetic Alphabet (IPA). Given word lists in four languages (English, French, German, and Spanish), we calculated the percentage of words with phonetic notations in Wiktionary. Furthermore, we performed two quality checks. First, we compared pronunciations from Wiktionary to pronunciations from dictionaries based on the GlobalPhone project, which had been created in a rule-based fashion and manually cross-checked [2]. Second, we analyzed the impact of Wiktionary pronunciations on automatic speech recognition (ASR) systems. French Wiktionary achieved the best pronunciation coverage, with phonetic notations for 92.58% of the French GlobalPhone word list, 76.12% of country names, and 30.16% of international city names. In our ASR evaluation, the Spanish system benefited most from Wiktionary pronunciations, with a relative word error rate reduction of 7.22%.
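
The two figures reported above, pronunciation coverage and relative word error rate reduction, are simple ratios. The following Python sketch shows one way they might be computed. The function names, the dictionary layout (word mapped to a list of IPA strings), and the toy data are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
def coverage(word_list, pronunciations):
    """Percentage of words in word_list that have at least one pronunciation.

    `pronunciations` is assumed to map a word to a list of IPA strings.
    """
    words = list(word_list)
    if not words:
        return 0.0
    covered = sum(1 for w in words if pronunciations.get(w))
    return 100.0 * covered / len(words)


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Toy data, invented for illustration only (not from the paper).
toy_words = ["bonjour", "merci", "chat", "xyzzy"]
toy_wiktionary = {"bonjour": ["bɔ̃ʒuʁ"], "merci": ["mɛʁsi"], "chat": ["ʃa"]}

toy_coverage = coverage(toy_words, toy_wiktionary)   # 3 of 4 words covered
toy_reduction = relative_wer_reduction(20.0, 18.0)   # hypothetical WER values
```

A relative reduction is measured against the baseline error rate, so a drop from a hypothetical 20% WER to 18% counts as a 10% relative reduction, not 2%.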


References
  1. “Wiktionary - a wiki-based open content dictionary.” [Online]. Available: http://www.wiktionary.org
  2. T. Schultz, “GlobalPhone: A Multilingual Speech and Text Database developed at Karlsruhe University,” in Proc. ICSLP, Denver, CO, 2002.

Bibliographic reference.  Schlippe, Tim / Ochs, Sebastian / Schultz, Tanja (2010): "Wiktionary as a source for automatic pronunciation extraction", In INTERSPEECH-2010, 2290-2293.