Fourth International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2014)

St. Petersburg, Russia
May 14-16, 2014

Combining Grapheme-to-Phoneme Converter Outputs for Enhanced Pronunciation Generation in Low-Resource Scenarios

Tim Schlippe, Wolf Quaschningk, Tanja Schultz

Cognitive Systems Lab, Karlsruhe Institute of Technology (KIT), Germany

For pronunciation dictionary creation, we propose combining the outputs of grapheme-to-phoneme (G2P) converters when only limited resources are available to train the individual converters. Our experiments with German, English, French, and Spanish show that in most cases the phoneme-level combination comes closer to validated reference pronunciations than the single converters do. When only little training data is available, the impact of the fusion is high, which shows its importance for under-resourced languages. We found that the output of G2P converters built with web-derived word-pronunciation pairs can further improve pronunciation quality. We report the largest improvement, 23.1% relative in terms of phoneme error rate against the reference dictionary, for the scenario where only 200 French word-pronunciation pairs and web data are given as training data. In additional automatic speech recognition experiments, we show that the resulting dictionaries can lead to performance improvements.
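The phoneme error rate and the relative improvement quoted in the abstract can be illustrated with a minimal sketch. It assumes the common definition of phoneme error rate as the Levenshtein (edit) distance between hypothesis and reference phoneme sequences, summed over all words and divided by the total number of reference phonemes; the paper's exact scoring procedure may differ. The function names and the toy phoneme strings are hypothetical and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting the first i reference phonemes
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """PER over (reference, hypothesis) pairs, assuming the definition above."""
    total_ref = sum(len(ref) for ref, _ in pairs)
    if total_ref == 0:
        raise ValueError("reference pronunciations contain no phonemes")
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    return errors / total_ref


def relative_improvement(baseline, new):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - new) / baseline


# Toy example with made-up phoneme strings (illustrative only).
pairs = [("k a t".split(), "k e t".split()),
         ("d o g".split(), "d o g".split())]
per = phoneme_error_rate(pairs)
```

Under this reading, a relative improvement of 23.1% means that the combined output's error rate is 23.1% lower than the baseline error rate, not 23.1 percentage points lower.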

Index Terms: pronunciation dictionary, pronunciation modeling, low-resource scenarios, multilingual speech recognition, rapid language adaptation


Bibliographic reference. Schlippe, Tim / Quaschningk, Wolf / Schultz, Tanja (2014): "Combining grapheme-to-phoneme converter outputs for enhanced pronunciation generation in low-resource scenarios", in SLTU-2014, 139-145.