Given time functions that express the plausibility of observing each phoneme symbol, continuous speech recognition can be formulated as finding the sentence that maximizes the sum of the plausibilities of the individual symbols that make up the sentence. By exploiting the peaks of the plausibility time functions, the maximization can be solved by two embedded search processes, each using dynamic programming: finding the best path connecting peaks in the plausibility functions of two successive symbols, and finding the best transition time-slot index between two given peaks. Because the search does not proceed slot by slot, the resulting algorithm is highly efficient. Based on this principle, the VINICS continuous speech recognition system (1200 words, 1500 grammar rules) has achieved a 95% word recognition rate in a speaker-dependent test, using 17 machine-labeled training utterances per speaker and a non-linear vectorial interpolation technique for phoneme plausibility estimation. The search algorithms, system configuration and experimental results are described.

Keywords: continuous speech recognition, phoneme plausibility, search algorithm
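The two embedded searches named in the abstract can be illustrated with a small sketch. It is not the VINICS implementation: the abstract does not give the scoring details, so the sketch assumes that every time slot is assigned to exactly one symbol and that the sentence score is the sum of the assigned plausibilities. The names `find_peaks`, `best_transition` and `align`, the toy plausibility values, and the direct scan used for the inner search are all illustrative assumptions.

```python
import numpy as np

def find_peaks(p):
    """Indices of local maxima of a 1-D plausibility curve (a plateau counts once)."""
    last = len(p) - 1
    return [t for t in range(len(p))
            if (t == 0 or p[t] > p[t - 1]) and (t == last or p[t] >= p[t + 1])]

def best_transition(pa, pb, ta, tb):
    """Inner search: best slot tau in (ta, tb] where symbol a hands over to symbol b.

    Assumed score: sum(pa[ta:tau]) + sum(pb[tau:tb]). Slot tb itself is left
    for the next segment so that chained segments do not count it twice.
    """
    ca = np.concatenate(([0.0], np.cumsum(pa)))  # ca[k] = sum(pa[:k])
    cb = np.concatenate(([0.0], np.cumsum(pb)))
    taus = np.arange(ta + 1, tb + 1)
    scores = (ca[taus] - ca[ta]) + (cb[tb] - cb[taus])
    k = int(np.argmax(scores))
    return int(taus[k]), float(scores[k])

def align(plaus):
    """Outer search: dynamic programming over peaks of successive symbols.

    plaus has shape (n_symbols, n_slots), one row per symbol of a candidate
    sentence. Returns (score, boundaries), where boundaries[i] is the first
    slot assigned to symbol i + 1.
    """
    plaus = np.asarray(plaus, dtype=float)
    peaks = [find_peaks(row) for row in plaus]
    # best[t] = (score, boundaries) of the best path ending at peak t
    best = {t: (float(plaus[0, :t].sum()), []) for t in peaks[0]}
    for i in range(1, plaus.shape[0]):
        nxt = {}
        for tb in peaks[i]:
            for ta, (score, bounds) in best.items():
                if ta >= tb:  # peaks must appear in time order
                    continue
                tau, seg = best_transition(plaus[i - 1], plaus[i], ta, tb)
                if tb not in nxt or score + seg > nxt[tb][0]:
                    nxt[tb] = (score + seg, bounds + [tau])
        best = nxt
    if not best:
        raise ValueError("no time-ordered path through the peaks")
    # Slots from the last peak onward belong to the last symbol.
    return max(((s + float(plaus[-1, t:].sum()), b) for t, (s, b) in best.items()),
               key=lambda item: item[0])

# Toy plausibility curves for a three-symbol sentence over nine time slots.
example = [
    [0.9, 1.0, 0.8, 0.2, 0.1, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.2, 0.8, 1.0, 0.7, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1, 0.1, 0.3, 0.8, 1.0, 0.9],
]
score, boundaries = align(example)
print(boundaries, round(score, 6))  # [3, 6] 7.9
```

In this sketch the outer loop only ever visits peaks, which is where the saving over a slot-by-slot search would come from; a recognizer would run such an alignment for each candidate sentence allowed by the grammar and keep the highest-scoring one.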
Cite as: Gong, Y., Haton, J.-P. (1991) VINICS: a continuous speech recognizer based on a new robust formulation. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 1221-1224, doi: 10.21437/Eurospeech.1991-179
@inproceedings{gong91b_eurospeech,
  author={Yifan Gong and Jean-Paul Haton},
  title={{VINICS: a continuous speech recognizer based on a new robust formulation}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={1221--1224},
  doi={10.21437/Eurospeech.1991-179}
}