ISCA Archive Interspeech 2008

Extracting word-pronunciation pairs from comparable set of text and speech

Tetsuro Sasada, Shinsuke Mori, Tatsuya Kawahara

One of the problems in text-to-speech (TTS) systems and speech-to-text (STT) systems is pronunciation estimation of unknown words. In this paper, we propose a method for extracting unknown words and their pronunciations from comparable sets of Japanese text data and speech data. Out-of-vocabulary words are extracted from the text with a stochastic model, and pronunciation hypotheses are generated for them. These candidate entries are then verified by running automatic speech recognition on the audio data. In this work, we use news articles and broadcast TV news covering similar topics. Most extracted pairs turned out to be correct according to human judges. We also tested the TTS front-end enhanced with these entries on other web news articles, and observed a relative improvement of 9.2% in pronunciation estimation accuracy. The proposed method can be used to realize a spoken language processing system that acquires and updates its lexicon automatically.
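
The verification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `verify_pairs`, the `min_count` threshold, the romanized pronunciation strings, and the substring-matching rule are all assumptions made for illustration. The paper uses a stochastic model and a real speech recognizer, both of which are replaced here by hand-written toy inputs.

```python
from typing import Dict, List, Tuple


def verify_pairs(
    hypotheses: Dict[str, List[str]],
    recognized: List[str],
    min_count: int = 1,
) -> List[Tuple[str, str]]:
    """Keep (word, pronunciation) pairs supported by the recognized speech.

    hypotheses: candidate word -> list of hypothesized pronunciations
                (hypothetical input; stands in for the hypothesis generator).
    recognized: ASR output as pronunciation strings, one per utterance
                (hypothetical input; stands in for the recognizer).
    min_count:  assumed threshold on how often a pronunciation must occur.
    """
    accepted = []
    for word, prons in hypotheses.items():
        # Count occurrences of each hypothesized pronunciation in the ASR output.
        counts = {p: sum(utt.count(p) for utt in recognized) for p in prons}
        if not counts:
            continue
        # Pick the most frequently observed pronunciation for this word.
        best = max(counts, key=counts.get)
        if counts[best] >= min_count:
            accepted.append((word, best))
    return accepted


# Toy data: two candidate words, one of which is spoken in the audio.
hypotheses = {
    "羽生": ["hanyuu", "habu"],
    "東雲": ["shinonome", "touun"],
}
recognized = ["kyou habu meijin ga katta"]
pairs = verify_pairs(hypotheses, recognized)
```

In this toy run only the first word is accepted, because none of the hypothesized pronunciations of the second word occur in the recognized utterance. A realistic system would score hypotheses with the recognizer's acoustic and language models rather than by substring counts.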


doi: 10.21437/Interspeech.2008-500

Cite as: Sasada, T., Mori, S., Kawahara, T. (2008) Extracting word-pronunciation pairs from comparable set of text and speech. Proc. Interspeech 2008, 1821-1824, doi: 10.21437/Interspeech.2008-500

@inproceedings{sasada08_interspeech,
  author={Tetsuro Sasada and Shinsuke Mori and Tatsuya Kawahara},
  title={{Extracting word-pronunciation pairs from comparable set of text and speech}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={1821--1824},
  doi={10.21437/Interspeech.2008-500}
}