ISCA Archive Interspeech 2006

Unsupervised segmentation of words into morphemes - morpho challenge 2005 application to automatic speech recognition

Mikko Kurimo, Mathias Creutz, Matti Varjokallio, Ebru Arsoy, Murat Saraclar

Within the EU Network of Excellence PASCAL, a challenge was organized to design a statistical machine learning algorithm that segments words into morphemes, the smallest meaning-bearing units of language. Ideally, these are basic vocabulary units suitable for different tasks, such as speech and text understanding, machine translation, information retrieval, and statistical language modeling. Twelve research groups participated in the challenge and submitted segmentation results obtained by their algorithms. In this paper, we evaluate the application of these segmentation algorithms to large-vocabulary speech recognition, using statistical n-gram language models built on the proposed word segments instead of entire words. Experiments were conducted for two agglutinative, morphologically rich languages: Finnish and Turkish. We also investigate combining various segmentations to improve the performance of the recognizer.


doi: 10.21437/Interspeech.2006-330

Cite as: Kurimo, M., Creutz, M., Varjokallio, M., Arsoy, E., Saraclar, M. (2006) Unsupervised segmentation of words into morphemes - morpho challenge 2005 application to automatic speech recognition. Proc. Interspeech 2006, paper 1512-Tue2A2O.1, doi: 10.21437/Interspeech.2006-330

@inproceedings{kurimo06_interspeech,
  author={Mikko Kurimo and Mathias Creutz and Matti Varjokallio and Ebru Arsoy and Murat Saraclar},
  title={{Unsupervised segmentation of words into morphemes - morpho challenge 2005 application to automatic speech recognition}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1512-Tue2A2O.1},
  doi={10.21437/Interspeech.2006-330}
}