Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Using Partial Morphological Analysis in Language Modeling Estimation for Large Vocabulary Portuguese Speech Recognition

Ciro Martins, Joao P. Neto, Luís B. Almeida

INESC-IST, Lisboa, Portugal

To achieve an acceptable degree of generalization, current speech recognition systems work with large vocabularies, which, among other effects, result in larger search spaces and consequently lower system performance. Highly inflectional languages, such as Portuguese, require a much larger vocabulary for the same task coverage and a much larger text corpus to extract word-based statistics with the same reliability. In this paper we present a new approach that uses basic morphological analysis, based on the decomposition of regular verbs into their morphemes (roots and suffixes), applied to a Portuguese large vocabulary continuous speech recognition system. This approach reduces not only the vocabulary size, and therefore the language model perplexity, but also the out-of-vocabulary (OOV) word rate and the memory requirements. Preliminary results show an improvement of about 20% in recognition speed with a slight degradation in word error rate (WER).
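
The decomposition idea can be illustrated with a minimal sketch. The abstract does not give the actual suffix inventory or decomposition rules, so everything below is an assumption made for illustration: the suffix list (a subset of present-indicative endings of regular -ar verbs), the "+" marker on suffix units, the toy root set, and the function name decompose are all hypothetical.

```python
# Illustrative sketch only: the suffix list, the "+" marker and the root set
# are assumptions, not the inventory used in the paper.
AR_PRESENT_SUFFIXES = ["amos", "ais", "am", "as", "o", "a"]


def decompose(word, roots, suffixes=AR_PRESENT_SUFFIXES):
    """Split a word into [root, "+suffix"] when the root is a known regular verb root.

    Words whose stem is not in `roots` are returned unchanged as a single unit.
    """
    # Try longer suffixes first so that "falamos" matches "amos" rather than "o".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            root = word[: -len(suffix)]
            if root in roots:
                return [root, "+" + suffix]
    return [word]


# Toy data: two regular verbs (falar, cantar) and one noun (casa).
roots = {"fal", "cant"}
words = [
    "falo", "falas", "fala", "falamos", "falam",
    "canto", "cantas", "canta", "cantamos", "cantam",
    "casa",
]

word_vocab = set(words)
unit_vocab = {unit for w in words for unit in decompose(w, roots)}

print(len(word_vocab))  # 11 full word forms
print(len(unit_vocab))  # 8 units: 2 roots + 5 suffixes + "casa"
```

In this toy example each additional regular verb adds a single root to the unit vocabulary instead of one entry per inflected form, which is the kind of vocabulary reduction the abstract refers to. A language model estimated over such units would see sequences like "fal +amos" in place of "falamos".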



Bibliographic reference.  Martins, Ciro / Neto, Joao P. / Almeida, Luís B. (1999): "Using partial morphological analysis in language modeling estimation for large vocabulary Portuguese speech recognition", In EUROSPEECH'99, 1603-1606.