In this contribution we show how to exploit text data to support word discovery from audio input in an underresourced target language. Given audio, of which a certain amount is transcribed at the word level, and additional unrelated text data, the approach learns a probabilistic mapping from acoustic units to characters and uses it to segment the audio data into words without the need for a pronunciation dictionary. This is achieved by three components: an unsupervised acoustic unit discovery system, an acoustic unit-to-grapheme converter trained in a supervised manner, and a word discovery system, which is initialized with a language model trained on the text data. Experiments for multiple setups show that initializing the language model with text data improves the word segmentation performance by a large margin.
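To give a rough intuition for the last component, the following toy sketch segments an unsegmented character string using a language model estimated from separate text. It is only an illustration under strong simplifying assumptions and not the system described in the paper: it uses a plain unigram word model with Viterbi decoding, operates on clean characters rather than on the output of an acoustic unit-to-grapheme converter, and all names (`train_unigram`, `segment`, `max_len`, `unk_logp`) are hypothetical.

```python
import math
from collections import Counter


def train_unigram(text_words):
    """Estimate unigram word probabilities from a list of words (the 'text data')."""
    counts = Counter(text_words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def segment(chars, word_probs, max_len=10, unk_logp=-20.0):
    """Find the most probable split of `chars` into words under a unigram model.

    Pieces not seen in the text data are penalised with `unk_logp` per character
    (an arbitrary assumption made for this sketch).
    """
    n = len(chars)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability of chars[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word in chars[:i]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = chars[start:end]
            if piece in word_probs:
                logp = math.log(word_probs[piece])
            else:
                logp = unk_logp * len(piece)
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        words.append(chars[back[end]:end])
        end = back[end]
    return words[::-1]
```

For example, with `probs = train_unigram("the cat sat on the mat".split())`, the call `segment("thecatsat", probs)` returns `["the", "cat", "sat"]`. Without any text data the model would have no word inventory to start from, which is the intuition behind initializing the word discovery system with a language model.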
Cite as: Glarner, T., Boenninghoff, B., Walter, O., Haeb-Umbach, R. (2017) Leveraging Text Data for Word Segmentation for Underresourced Languages. Proc. Interspeech 2017, 2143-2147, doi: 10.21437/Interspeech.2017-1262
@inproceedings{glarner17_interspeech,
  author={Thomas Glarner and Benedikt Boenninghoff and Oliver Walter and Reinhold Haeb-Umbach},
  title={{Leveraging Text Data for Word Segmentation for Underresourced Languages}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2143--2147},
  doi={10.21437/Interspeech.2017-1262}
}