14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Lightly Supervised Discriminative Training of Grapheme Models for Improved Sentence-Level Alignment of Speech and Text Data

Adriana Stan (1), Peter Bell (2), Junichi Yamagishi (2), Simon King (2)

(1) Universitatea Tehnică din Cluj-Napoca, Romania
(2) University of Edinburgh, UK

This paper introduces a method for lightly supervised discriminative training, using the maximum mutual information (MMI) criterion, to improve the alignment of speech and text data used to train HMM-based text-to-speech (TTS) systems for low-resource languages. Because TTS systems rely on long-span contexts, it is important to select training utterances whose transcriptions are wholly correct. We show that, in a low-resource setting with poorly trained grapheme models, MMI discriminative training at the grapheme level increases the amount of correctly aligned data by 40%, while maintaining a 7% sentence error rate and a 0.8% word error rate. We present the procedure for lightly supervised discriminative training with regard to the objective of minimising sentence error rate.
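
The two figures quoted above can be illustrated with a minimal sketch, assuming the usual definitions: word error rate as word-level edit distance divided by the number of reference words, and sentence error rate as the fraction of sentences containing at least one word error. The function names `edit_distance` and `error_rates` are hypothetical, and this is not the scoring tool used by the authors.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rates(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (sentence error rate, word error rate) for (reference, hypothesis) pairs."""
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    word_errors = 0
    ref_words = 0
    wrong_sentences = 0
    for ref, hyp in pairs:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors = edit_distance(ref_tokens, hyp_tokens)
        word_errors += errors
        ref_words += len(ref_tokens)
        # A sentence counts as wrong if it contains any word error at all.
        wrong_sentences += 1 if errors > 0 else 0
    if ref_words == 0:
        raise ValueError("references contain no words")
    return wrong_sentences / len(pairs), word_errors / ref_words


if __name__ == "__main__":
    # Toy data, assumed for illustration only.
    toy = [("a b c d", "a b c d"), ("a b c d", "a x c d")]
    ser, wer = error_rates(toy)
    print(f"SER = {ser:.3f}, WER = {wer:.3f}")
```

A single wrong word makes the whole sentence count as an error, which is why the sentence error rate can be much higher than the word error rate on the same data.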

Bibliographic reference.  Stan, Adriana / Bell, Peter / Yamagishi, Junichi / King, Simon (2013): "Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data", In INTERSPEECH-2013, 1525-1529.