This paper describes the development of a full-form lexicon, combined with an algorithm for quasi-morphological decomposition, aimed at improving grapheme-to-phoneme conversion, word stress assignment, syllabification and word class assignment in a Text-to-Speech system. We explain how the optimal size of the lexicon was determined. We also describe a deterministic algorithm that decomposes words not found in the lexicon into a sequence of lexicon entries, prefixes, suffixes and infixes. The performance of the combined lexicon and decomposition system is evaluated on a newspaper corpus of approximately 100,000 words. The results indicate that the system handles more than 95% of the regular words in the test corpus correctly. The system will have to be extended with a module that handles proper names.
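The abstract does not spell out the decomposition algorithm, so the following is only a minimal sketch of how a deterministic decomposition into lexicon entries and affixes might look. Everything in it is an assumption made for illustration: the toy word lists, the ordering constraints between piece types (`ALLOWED`, `FINAL`), the longest-match-first search order, and the names `decompose` and `_parse` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# Toy data invented for illustration; NOT the lexicon or affix lists of the paper.
TABLES = {
    "entry": {"work", "book", "shop", "state", "man"},  # full-form lexicon
    "prefix": {"un", "re"},
    "suffix": {"ing", "s", "er"},
    "infix": {"s"},  # linking element between two entries
}

# Assumed ordering constraints: which piece kinds may follow which.
ALLOWED = {
    "start": ("prefix", "entry"),
    "prefix": ("prefix", "entry"),
    "entry": ("entry", "infix", "suffix"),
    "infix": ("entry",),
    "suffix": ("suffix",),
}
FINAL = {"entry", "suffix"}  # a word may only end on one of these kinds

Piece = Tuple[str, str]  # (kind, text)


def _parse(word: str, pos: int, prev: str) -> Optional[List[Piece]]:
    """Cover word[pos:] with pieces, trying the longest candidate first."""
    if pos == len(word):
        return [] if prev in FINAL else None
    for end in range(len(word), pos, -1):
        piece = word[pos:end]
        for kind in ALLOWED[prev]:
            if piece in TABLES[kind]:
                tail = _parse(word, end, kind)
                if tail is not None:
                    return [(kind, piece)] + tail
    return None  # no decomposition under these constraints


def decompose(word: str) -> Optional[List[Piece]]:
    """Look the word up in the lexicon first; decompose only if it is missing."""
    if word in TABLES["entry"]:
        return [("entry", word)]
    return _parse(word, 0, "start")


if __name__ == "__main__":
    for w in ("book", "bookshops", "reworking", "statesman", "xyz"):
        print(w, "->", decompose(w))
```

Because the candidate pieces and piece kinds are always tried in the same fixed order, the same input always yields the same analysis, which is one simple way to make such a procedure deterministic. A word that cannot be covered, such as `xyz` here, returns `None` and would have to be passed on to a fallback such as rule-based conversion.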
Cite as: Gulikers, L., Willemse, R. (1992) A lexicon for a text-to-speech system. Proc. 2nd International Conference on Spoken Language Processing (ICSLP 1992), 101-104, doi: 10.21437/ICSLP.1992-28
@inproceedings{gulikers92_icslp,
  author={Leon Gulikers and Rijk Willemse},
  title={{A lexicon for a text-to-speech system}},
  year=1992,
  booktitle={Proc. 2nd International Conference on Spoken Language Processing (ICSLP 1992)},
  pages={101--104},
  doi={10.21437/ICSLP.1992-28}
}