ISCA Archive Interspeech 2007

Morfessor and variKN machine learning tools for speech and language technology

Vesa Siivola, Mathias Creutz, Mikko Kurimo

This paper introduces two recent open source software packages developed for unsupervised natural language modeling. The Morfessor program automatically segments words into morpheme-like units without relying on rule-based morphological analyzers. The VariKN toolkit trains language models that consist of a compact set of high-order n-grams, using state-of-the-art Kneser-Ney smoothing. As an example, the paper shows how to construct a language model for speech recognition in multiple languages using only a minimal amount of linguistic resources. Morfessor and VariKN also have other applications in text understanding, information retrieval and machine translation. Unsupervised machine learning techniques are particularly well suited to developing systems for less-resourced languages, because they do not depend on manually designed morphological or syntactic analyzers or on annotated data.


doi: 10.21437/Interspeech.2007-446

Cite as: Siivola, V., Creutz, M., Kurimo, M. (2007) Morfessor and variKN machine learning tools for speech and language technology. Proc. Interspeech 2007, 1549-1552, doi: 10.21437/Interspeech.2007-446

@inproceedings{siivola07_interspeech,
  author={Vesa Siivola and Mathias Creutz and Mikko Kurimo},
  title={{Morfessor and variKN machine learning tools for speech and language technology}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={1549--1552},
  doi={10.21437/Interspeech.2007-446}
}