Open Source Speech and Language Resources for Frisian

Emre Yılmaz, Henk van den Heuvel, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, David Van Leeuwen


In this paper, we present several open source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslân, which is located in the north of the Netherlands. Native speakers of Frisian are Frisian-Dutch bilinguals and often code-switch in daily conversations. The resources presented in this paper include a code-switching speech database containing radio broadcasts, a phonetic lexicon with more than 70k words, and a language model trained on a text corpus with more than 38M words. With this contribution, we aim to share the Frisian resources we have collected in the scope of the FAME! project, in which a spoken document retrieval system is built for the disclosure of the regional broadcaster’s radio archives. These resources enable research on code-switching and on longitudinal speech and language change. Moreover, a sample automatic speech recognition (ASR) recipe for the Kaldi toolkit will be provided online to facilitate Frisian ASR research.


DOI: 10.21437/Interspeech.2016-48

Cite as

Yılmaz, E., Heuvel, H.v.d., Dijkstra, J., Velde, H.V.d., Kampstra, F., Algra, J., Leeuwen, D.V. (2016) Open Source Speech and Language Resources for Frisian. Proc. Interspeech 2016, 1536-1540.

Bibtex
@inproceedings{Yılmaz+2016,
author={Emre Yılmaz and Henk van den Heuvel and Jelske Dijkstra and Hans Van de Velde and Frederik Kampstra and Jouke Algra and David Van Leeuwen},
title={Open Source Speech and Language Resources for Frisian},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-48},
url={http://dx.doi.org/10.21437/Interspeech.2016-48},
pages={1536--1540}
}