ISCA Archive Interspeech 2006

Redundancy and productivity in the speech technology lexicon - can we do better?

Susan Fitt, Korin Richmond

Current lexica for speech technology typically contain much redundancy, while omitting useful information. A comparison with lexica in other media and for other purposes is instructive, as it highlights some features we may borrow for text-to-speech and speech recognition lexica.

We describe some aspects of the new lexicon we are producing, Combilex, whose structure and implementation are specifically designed to reduce redundancy and improve the representation of productive elements of English. Most importantly, many English words are compounds or predictable derivations of baseforms. Storing the lexicon as a combination of baseforms and derivational rules speeds up lexicon development and improves coverage and maintainability.
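As a rough illustration of the baseform-plus-rules idea, the sketch below stores a handful of baseforms and two suffix rules, then expands them into full entries on demand. Everything in it is an assumption made for illustration: the names `BASEFORMS`, `RULES`, `derive` and `expand`, the phone symbols, and the simple concatenative rules are hypothetical and do not reflect Combilex's actual format or rule set.

```python
# Hypothetical sketch only: not Combilex's data format, rules, or phone set.

# Baseforms: spelling -> space-separated phone string (symbols are made up).
BASEFORMS = {
    "walk": "w ao k",
    "jump": "jh ah m p",
}

# Derivational rules: rule name -> (spelling suffix, phone suffix).
# Real English morphology needs spelling and phonological adjustments;
# plain concatenation is assumed here to keep the example short.
RULES = {
    "ing": ("ing", "ih ng"),
    "er": ("er", "er"),
}


def derive(base, rule):
    """Return (spelling, pronunciation) for a baseform with one rule applied."""
    spelling_suffix, phone_suffix = RULES[rule]
    return base + spelling_suffix, BASEFORMS[base] + " " + phone_suffix


def expand():
    """Build the full lexicon: every baseform plus every derived form."""
    lexicon = dict(BASEFORMS)
    for base in BASEFORMS:
        for rule in RULES:
            word, pron = derive(base, rule)
            lexicon[word] = pron
    return lexicon


if __name__ == "__main__":
    for word, pron in sorted(expand().items()):
        print(word, "->", pron)
```

In this toy setup two stored baseforms and two rules yield six entries, and a correction to a baseform pronunciation propagates to every derived form, which is the kind of maintainability gain the abstract describes.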


doi: 10.21437/Interspeech.2006-42

Cite as: Fitt, S., Richmond, K. (2006) Redundancy and productivity in the speech technology lexicon - can we do better? Proc. Interspeech 2006, paper 1202-1CaP.4, doi: 10.21437/Interspeech.2006-42

@inproceedings{fitt06_interspeech,
  author={Susan Fitt and Korin Richmond},
  title={{Redundancy and productivity in the speech technology lexicon - can we do better?}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1202-1CaP.4},
  doi={10.21437/Interspeech.2006-42}
}