Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
Many spoken languages do not have a standardized writing system. Building text-to-speech voices for them without accurate transcripts of the speech data is difficult. Our language-independent method for bootstrapping synthetic voices from speech data alone relies on cross-lingual phonetic decoding of speech. In this paper, we describe novel additions to our bootstrapping method. We present results on eight languages from different language families (English, Dari, Pashto, Iraqi, Thai, Konkani, Inupiaq and Ojibwe) and show that our phonetic voices can be made understandable with as little as an hour of speech data that never had transcriptions, and with few resources available in the target language. We also present purely acoustic techniques that help induce syllable- and word-level information, which can further improve the intelligibility of these voices.
Index Terms: speech synthesis, synthesis without text, languages without an orthography
Bibliographic reference. Sitaram, Sunayana / Anumanchipalli, Gopala Krishna / Chiu, Justin / Parlikar, Alok / Black, Alan W. (2013): "Text to speech in new languages without a standardized orthography", In SSW8, 95-100.