Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme Prediction

Antoine Bruguier, Anton Bakhtin, Dravyansh Sharma


Both automatic speech recognition and text-to-speech systems need accurate pronunciations, typically obtained by using both a lexicon dictionary and a grapheme to phoneme (G2P) model. G2P models typically struggle to predict pronunciations for tail words, and we hypothesized that one reason is that they try to discover general pronunciation rules without using prior knowledge of the pronunciation of related words. Our new approach expands a sequence-to-sequence G2P model by injecting prior knowledge. In addition, our model can be updated without having to retrain a system. We show that our new model has significantly better performance for German, both on a tightly controlled task and on our real-world system. Finally, the simplification of the system allows for faster and easier scaling to other languages.


DOI: 10.21437/Interspeech.2018-2061

Cite as: Bruguier, A., Bakhtin, A., Sharma, D. (2018) Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme Prediction. Proc. Interspeech 2018, 3733-3737, DOI: 10.21437/Interspeech.2018-2061.


@inproceedings{Bruguier2018,
  author={Antoine Bruguier and Anton Bakhtin and Dravyansh Sharma},
  title={Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme Prediction},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3733--3737},
  doi={10.21437/Interspeech.2018-2061},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2061}
}