INTERSPEECH 2006 - ICSLP
Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Using Latent Semantic Indexing for Morph-Based Spoken Document Retrieval

Ville T. Turunen, Mikko Kurimo

Helsinki University of Technology, Finland

Previously, phone-based and word-based approaches have been used for spoken document retrieval. The former suffer from high error rates and the latter from the limited vocabulary of the recognizer. Our method relies on an unlimited-vocabulary continuous speech recognizer that uses morpheme-like units discovered in an unsupervised manner. These morpheme-like units, or "morphs" for short, have also been used successfully as index terms. One problem with using morphs as index terms is that the segmentation does not always produce the same stem for different inflected forms of the same word. This resembles the problem of synonyms. In this paper, we apply latent semantic indexing to morph-based retrieval. The idea is to project morphs that correspond to the same word, as well as other semantically related terms, to the same dimension. The results show clear improvements in Finnish spoken document retrieval performance.
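
The projection idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes NumPy, a plain count-based morph-by-document matrix, and a truncated SVD of rank 2. The toy documents and the morphs "talo"/"taloi" and "auto"/"autoi" are hypothetical stand-ins for two segmentations of the same stem, and the function names are chosen for illustration only.

```python
import numpy as np

# Hypothetical toy data: each document is a bag of morphs. "talo"/"taloi" and
# "auto"/"autoi" stand in for two different segmentations of the same stem.
docs = [
    ["talo", "ssa", "koti"],
    ["taloi", "ssa", "koti"],
    ["auto", "lla", "tie"],
    ["autoi", "lla", "tie"],
]
vocab = sorted({m for d in docs for m in d})
index = {m: i for i, m in enumerate(vocab)}

# Morph-by-document count matrix A (rows: morphs, columns: documents).
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for m in d:
        A[index[m], j] += 1.0


def query_vector(query):
    """Bag-of-morphs count vector for a query given as a list of morphs."""
    q = np.zeros(len(vocab))
    for m in query:
        q[index[m]] += 1.0
    return q


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def raw_scores(query):
    """Cosine similarity between the query and each document in morph space."""
    q = query_vector(query)
    return [cosine(q, A[:, j]) for j in range(A.shape[1])]


def lsi_scores(query, k=2):
    """Cosine similarity in a rank-k latent space obtained by truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T  # Vk rows: documents in latent space
    # Fold the query into the latent space: q_hat = S_k^{-1} U_k^T q.
    q_hat = (Uk.T @ query_vector(query)) / sk
    return [cosine(q_hat, Vk[j]) for j in range(Vk.shape[0])]


if __name__ == "__main__":
    print("raw:", raw_scores(["talo"]))
    print("lsi:", lsi_scores(["talo"]))
```

In this toy setup, a query for "talo" gets a raw score of zero against the second document, which contains only "taloi". After the rank-2 projection the two documents score almost identically, because both morphs co-occur with the same context morphs and so end up on the same latent dimension.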

Bibliographic reference. Turunen, Ville T. / Kurimo, Mikko (2006): "Using latent semantic indexing for morph-based spoken document retrieval", in INTERSPEECH-2006, paper 1220-Mon2WeO.6.