This paper introduces a method for lightly supervised discriminative training using maximum mutual information (MMI) to improve the alignment of speech and text data used to train HMM-based text-to-speech (TTS) systems for low-resource languages. Because TTS applications use long-span contexts, it is important to select training utterances whose transcriptions are wholly correct. In a low-resource setting with poorly trained grapheme models, we show that MMI discriminative training at the grapheme level increases the amount of correctly aligned data by 40%, while maintaining a 7% sentence error rate and a 0.8% word error rate. We present the procedure for lightly supervised discriminative training with respect to the objective of minimising sentence error rate.
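The two figures quoted above measure different things: sentence error rate counts utterances containing at least one error, while word error rate counts word-level edits against the reference. The sketch below illustrates the distinction; it is not the scoring procedure used in the paper. The function names `edit_distance` and `error_rates` are hypothetical, and whitespace tokenisation is an assumption.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rates(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (sentence_error_rate, word_error_rate) for (reference, hypothesis) pairs.

    Assumption: words are separated by whitespace.
    """
    if not pairs:
        raise ValueError("at least one (reference, hypothesis) pair is required")
    wrong_sentences = 0
    word_errors = 0
    ref_words = 0
    for ref, hyp in pairs:
        r, h = ref.split(), hyp.split()
        errors = edit_distance(r, h)
        word_errors += errors
        ref_words += len(r)
        if errors > 0:
            wrong_sentences += 1  # a single wrong word makes the whole sentence wrong
    if ref_words == 0:
        raise ValueError("references contain no words")
    return wrong_sentences / len(pairs), word_errors / ref_words
```

Because one wrong word marks the whole utterance as incorrect, the sentence error rate is normally much higher than the word error rate, which is why selecting wholly correct utterances is the stricter criterion.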
Bibliographic reference. Stan, Adriana / Bell, Peter / Yamagishi, Junichi / King, Simon (2013): "Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data", In INTERSPEECH-2013, 1525-1529.