ISCA Archive SLaTE 2019
Teaching American English pronunciation using a TTS service

Jorge Proença, Ganna Raboshchuk, Ângela Costa, Paula Lopez-Otero, Xavier Anguera

In computer-assisted language learning (CALL) applications, students learn or improve a language using automated tools. To teach pronunciation, CALL applications benefit from spoken examples recorded by native speakers. In practice, such recordings are limited to the pre-defined curriculum that the application teaches. In this work we allow the learner to practice pronunciation on freely input text, where the reference audio is generated by a text-to-speech (TTS) system. Instead of building a TTS system from scratch, we use a high-quality external service (Amazon Polly TTS). To use Amazon Polly successfully as a reference for teaching pronunciation, we carefully control the input text normalization and expansion steps, and we use the viseme information returned by Polly to select the best phonetic transcription among all the possible transcriptions computed from the text. We show the usefulness of the approach by comparing the pronunciation scores obtained by a native speaker reading a set of test sentences with the scores obtained from the TTS audio for the same sentences. The results show that the TTS audio reaches a pronunciation score similar to that of real audio, and we therefore conclude that it can be used as a reference for pronunciation learning. We also discuss and address mismatches between transcription and audio.
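As an illustration of the viseme-based selection step mentioned above, the following is a minimal sketch, not the implementation used in this work. It assumes that each candidate transcription is a list of ARPAbet-style phone symbols, that the TTS visemes are already available as a list of symbols, and that the best candidate is the one whose projected viseme sequence has the smallest edit distance to the TTS visemes. The table `PHONE_TO_VISEME` and the function names are hypothetical and cover only the phones needed for the example.

```python
from typing import Dict, List, Sequence

# Illustrative phone-to-viseme table (an assumption for this sketch,
# covering only a handful of phones; not the mapping used in the paper).
PHONE_TO_VISEME: Dict[str, str] = {
    "R": "r",
    "IY": "i",
    "EH": "E",
    "D": "t",
    "T": "t",
    "L": "t",
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (x != y),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def to_visemes(phones: Sequence[str]) -> List[str]:
    """Project a phone sequence onto viseme symbols."""
    return [PHONE_TO_VISEME[p] for p in phones]


def select_transcription(
    candidates: Sequence[List[str]], tts_visemes: Sequence[str]
) -> List[str]:
    """Return the candidate whose visemes are closest to the TTS visemes."""
    return min(
        candidates,
        key=lambda c: edit_distance(to_visemes(c), tts_visemes),
    )


# "read" has two pronunciations; the visemes decide which one was spoken.
candidates = [["R", "IY", "D"], ["R", "EH", "D"]]
best = select_transcription(candidates, ["r", "E", "t"])
```

Because several phones share one viseme, this comparison can only separate candidates that differ in viseme class; candidates that tie would need another criterion.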

