ISCA Archive Interspeech 2013

Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data

Adriana Stan, Peter Bell, Junichi Yamagishi, Simon King

This paper introduces a method for lightly supervised discriminative training using maximum mutual information (MMI) to improve the alignment of speech and text data for use in training HMM-based TTS systems for low-resource languages. In TTS applications, because long-span contexts are used, it is important to select training utterances whose transcriptions are wholly correct. In a low-resource setting with poorly trained grapheme models, we show that MMI discriminative training at the grapheme level increases the amount of correctly aligned data by 40%, while maintaining a 7% sentence error rate and a 0.8% word error rate. We present the procedure for lightly supervised discriminative training with regard to the objective of minimising sentence error rate.
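
The abstract reports both a sentence error rate and a word error rate. As a rough illustration of how the two differ, the sketch below computes them under their standard definitions: a sentence counts as an error if its transcription differs from the reference in any word, and word errors are the token-level edit distance divided by the number of reference words. The function names and the toy data are assumptions made for illustration; the paper's own scoring setup is not described on this page.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(pairs):
    """Return (sentence_error_rate, word_error_rate) for (reference, hypothesis) string pairs."""
    word_errors = 0
    ref_words = 0
    bad_sentences = 0
    for ref, hyp in pairs:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        dist = edit_distance(ref_tokens, hyp_tokens)
        word_errors += dist
        ref_words += len(ref_tokens)
        bad_sentences += dist > 0  # a single word error makes the whole sentence wrong
    return bad_sentences / len(pairs), word_errors / ref_words


# Hypothetical toy data: one of three sentences contains a single substituted word.
pairs = [
    ("a b c", "a b c"),
    ("a b c d", "a x c d"),
    ("e f g", "e f g"),
]
ser, wer = error_rates(pairs)
```

Because one wrong word invalidates an entire sentence, the sentence error rate is typically much higher than the word error rate, which is consistent with the 7% and 0.8% figures quoted above.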


doi: 10.21437/Interspeech.2013-308

Cite as: Stan, A., Bell, P., Yamagishi, J., King, S. (2013) Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data. Proc. Interspeech 2013, 1525-1529, doi: 10.21437/Interspeech.2013-308

@inproceedings{stan13_interspeech,
  author={Adriana Stan and Peter Bell and Junichi Yamagishi and Simon King},
  title={{Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1525--1529},
  doi={10.21437/Interspeech.2013-308}
}