Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili

Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King


When a text-to-speech (TTS) system is required to speak world news, a large fraction of the words to be spoken will be proper names originating in a wide variety of languages. Phonetization of these names based on target-language letter-to-sound rules will typically be inadequate. This is detrimental not only during synthesis, when inappropriate phone sequences are produced, but also during training, if the system is trained on data from the same domain: poor phonetization during forced alignment based on hidden Markov models can pollute the whole model set, resulting in degraded alignment even of normal target-language words. This paper presents four techniques designed to address this issue in the context of a Swahili TTS system: automatic transcription of proper names based on a lexicon from a better-resourced language; the addition of a parallel phone set and a special part-of-speech tag exclusively dedicated to proper names; a manually crafted phone mapping which allows potentially more accurate phones to be substituted in proper names during forced alignment; and the addition, in proper names, of a grapheme-derived frame-level feature supplementing the standard phonetic inputs to the acoustic model. We present results from objective and subjective evaluations of systems built using these four techniques.
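
As a rough illustration of the second technique (a parallel phone set dedicated to proper names), the sketch below phonetizes a word with a toy letter-to-sound table and, when the word is flagged as a proper name, maps every phone to a separately named counterpart. The digraph table, the `_pn` suffix, and the function names are hypothetical choices made for this example; they are not taken from the paper, and the actual system may represent the parallel phone set differently.

```python
# Minimal sketch, not the authors' implementation.
# All names and symbols below are assumptions made for illustration.

PN_SUFFIX = "_pn"  # hypothetical marker for the proper-name phone set

# Toy digraph table (illustrative only, not a complete Swahili rule set).
DIGRAPHS = {"ch": "tS", "sh": "S", "th": "T", "dh": "D"}


def letter_to_sound(word):
    """Greedy letter-to-sound: try a digraph first, else one letter = one phone."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phones.append(DIGRAPHS[pair])
            i += 2
        elif word[i].isalpha():
            phones.append(word[i])
            i += 1
        else:
            i += 1  # skip punctuation and digits
    return phones


def phonetize(word, is_proper_name):
    """Return phones; proper names use the parallel (suffixed) phone set."""
    phones = letter_to_sound(word)
    if is_proper_name:
        return [p + PN_SUFFIX for p in phones]
    return phones


if __name__ == "__main__":
    print(phonetize("habari", is_proper_name=False))
    print(phonetize("Chirac", is_proper_name=True))
```

Keeping the proper-name phones under separate labels means that, in a setup like this one, models for ordinary target-language phones are never trained on frames aligned to poorly phonetized foreign names.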


DOI: 10.21437/Interspeech.2017-1398

Cite as: Mendelson, J., Oplustil, P., Watts, O., King, S. (2017) Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili. Proc. Interspeech 2017, 2188-2192, DOI: 10.21437/Interspeech.2017-1398.


@inproceedings{Mendelson2017,
  author={Joseph Mendelson and Pilar Oplustil and Oliver Watts and Simon King},
  title={Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2188--2192},
  doi={10.21437/Interspeech.2017-1398},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1398}
}