One of the problems of Automatic Speech Recognition (ASR) systems dealing with large lexica (more than 10,000 words) is the time needed to extract candidate words from a given input signal. Ideally, one should execute a Viterbi-like algorithm for each word model and pick the best word(s) or, in case of continuous speech, the best sequence(s) of words. However, the practical time complexity of such an operation is too high for large lexica. To overcome this problem, we propose to compress the lexicon into a graph by merging common suffixes and prefixes. For French and English lexica respectively, this yields a 90% and 70% reduction in memory space. A reduction in time complexity was also achieved which allowed us to execute in real time an exact A* search algorithm on the whole graph for isolated words as well as for whole sentences with no need for any fastmatch or other heuristics. Keywords: Large lexicon access, Graph, Lexicon Compression, Lexicon Representation, A* search
Cite as: Lacouture, R., Mori, R.D. (1991) Lexical tree compression. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 581-584, doi: 10.21437/Eurospeech.1991-149
@inproceedings{lacouture91_eurospeech, author={Roxane Lacouture and Renato De Mori}, title={{Lexical tree compression}}, year=1991, booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)}, pages={581--584}, doi={10.21437/Eurospeech.1991-149} }