EUROSPEECH 2001 Scandinavia
Large external language resources used for multilingual text processing in TTS systems pose a serious problem because of the storage space they require and their slow look-up times. Representing large lexica as finite-state transducers is motivated mainly by space and time efficiency. In this paper we present a method for compiling large German phonetic and morphology lexica (CISLEX), each with about 300,000 words, into corresponding finite-state transducers (FSTs), together with the results obtained. For both lexica a substantial reduction in size and optimal access time were achieved. The initial size was 12.526 MB for the German phonetic lexicon and 18.49 MB for the morphology lexicon. The corresponding FSTs occupy only 2.78 MB for the phonetic lexicon and 6.33 MB for the morphology lexicon, roughly 4.5 and 2.9 times smaller, respectively. At the same time the look-up time is optimal, since it depends only on the length of the input word and not on the size of the lexicon.
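The claim that look-up time depends only on the length of the input word can be illustrated with a minimal sketch. This is not the authors' compilation method: it is a plain prefix-tree transducer with no state minimization, and the names `State`, `build`, `lookup`, and `TOY_LEXICON`, as well as the toy entries and their transcriptions, are assumptions made purely for illustration (they are not taken from CISLEX).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class State:
    # Outgoing arcs, keyed by the input character they consume.
    arcs: Dict[str, "State"] = field(default_factory=dict)
    # Output string emitted if a word ends in this state (None = not final).
    output: Optional[str] = None


def build(lexicon: Dict[str, str]) -> State:
    """Build a deterministic prefix-tree transducer from word -> output pairs."""
    root = State()
    for word, out in lexicon.items():
        state = root
        for ch in word:
            state = state.arcs.setdefault(ch, State())
        state.output = out
    return root


def lookup(root: State, word: str) -> Tuple[Optional[str], int]:
    """Return (output, number of arcs followed) for the given word.

    At most one arc is followed per input character, so the work is
    bounded by len(word) regardless of how many entries were compiled.
    """
    state = root
    steps = 0
    for ch in word:
        nxt = state.arcs.get(ch)
        if nxt is None:  # no matching arc: the word is not in the lexicon
            return None, steps
        state = nxt
        steps += 1
    return state.output, steps


# Hypothetical toy entries with made-up transcriptions (assumption, not CISLEX data).
TOY_LEXICON = {
    "Haus": "h aU s",
    "Hause": "h aU z @",
    "Maus": "m aU s",
}

if __name__ == "__main__":
    fst = build(TOY_LEXICON)
    for query in ("Haus", "Maus", "Hund"):
        print(query, lookup(fst, query))
```

In this sketch, adding further entries to the lexicon does not change the number of arcs followed for a given word. The size reduction reported above would additionally require sharing common suffixes and minimizing the automaton, which this sketch deliberately omits.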
Bibliographic reference. Rojc, Matej / Kacic, Zdravko (2001): "Representation of large lexica using finite-state transducers for the multilingual text-to-speech synthesis systems", In EUROSPEECH-2001, 2251-2254.