Effect of TTS Generated Audio on OOV Detection and Word Error Rate in ASR for Low-resource Languages

Savitha Murthy, Dinkar Sitaram, Sunayana Sitaram


Out-of-Vocabulary (OOV) detection and recovery is an important aspect of reducing Word Error Rate (WER) in Automatic Speech Recognition (ASR). In this paper, we evaluate the effect of OOV detection and recovery on the WER of an ASR system for a low-resource language. We use a small seed corpus of continuous speech and improve the vocabulary by incorporating the detected OOV words. We use a syllable model to detect and learn OOV words, and augment the word model with these words, leading to improved recognition. Our research investigates how OOV detection and recovery are affected by adding missing syllable sounds to the syllable model using a Text-to-Speech (TTS) system. Our experiments are conducted on a 5-hour corpus of continuous Kannada speech. We use an already available Festival TTS system for Hindi to generate Kannada speech. Our initial experiments show an improvement in OOV detection from adding the missing syllable sounds using a cross-lingual TTS system.
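
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of how WER and an OOV rate are commonly computed. It is not the authors' evaluation code: the function names `word_error_rate` and `oov_rate` and the toy strings are hypothetical, and WER is assumed here to be the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length (assumed WER definition)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def oov_rate(test_words: list[str], vocabulary: set[str]) -> float:
    """Fraction of test tokens that are absent from the recognizer vocabulary."""
    if not test_words:
        raise ValueError("test_words must not be empty")
    missing = sum(1 for word in test_words if word not in vocabulary)
    return missing / len(test_words)


if __name__ == "__main__":
    # Toy placeholder tokens, not data from the paper.
    print(word_error_rate("a b c d", "a x c"))
    print(oov_rate(["a", "b", "c", "d"], {"a", "b"}))
```

Under this definition, adding recovered OOV words to the word model's vocabulary lowers the OOV rate directly, while its effect on WER still has to be measured by decoding.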


DOI: 10.21437/Interspeech.2018-1555

Cite as: Murthy, S., Sitaram, D., Sitaram, S. (2018) Effect of TTS Generated Audio on OOV Detection and Word Error Rate in ASR for Low-resource Languages. Proc. Interspeech 2018, 1026-1030, DOI: 10.21437/Interspeech.2018-1555.


@inproceedings{Murthy2018,
  author={Savitha Murthy and Dinkar Sitaram and Sunayana Sitaram},
  title={Effect of TTS Generated Audio on OOV Detection and Word Error Rate in ASR for Low-resource Languages},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1026--1030},
  doi={10.21437/Interspeech.2018-1555},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1555}
}