Speech Synthesis for Mixed-Language Navigation Instructions

Khyathi Raghavi Chandu, SaiKrishna Rallabandi, Sunayana Sitaram, Alan W. Black


Text-to-Speech (TTS) systems that can read navigation instructions are among the most widely used speech interfaces today. Text in the navigation domain may contain named entities, such as location names, that are not in the language the TTS database is recorded in. Moreover, named entities can be compound words whose individual lexical items belong to different languages. These named entities may be transliterated into the script that the TTS system is trained on, which may result in incorrect pronunciation rules being applied to such words. We describe experiments that extend our previous work on generating code-mixed speech to the synthesis of navigation instructions with a mixed-lingual TTS system. We conduct subjective listening tests with two sets of users: students who are native speakers of an Indian language and highly proficient in English, and drivers with low English literacy but familiarity with location names. We find that both sets of users show a significant preference for our proposed system over a baseline system that synthesizes instructions in English.


DOI: 10.21437/Interspeech.2017-1259

Cite as: Chandu, K.R., Rallabandi, S., Sitaram, S., Black, A.W. (2017) Speech Synthesis for Mixed-Language Navigation Instructions. Proc. Interspeech 2017, 57-61. DOI: 10.21437/Interspeech.2017-1259.


@inproceedings{Chandu2017,
  author={Khyathi Raghavi Chandu and SaiKrishna Rallabandi and Sunayana Sitaram and Alan W. Black},
  title={Speech Synthesis for Mixed-Language Navigation Instructions},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={57--61},
  doi={10.21437/Interspeech.2017-1259},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1259}
}