In the paper we investigate methods suitable for practical implementation in a recognition system that is to classify telephone input in form of isolated words/phrases belonging to large vocabularies with equiprobable entries, such as people names, city and local names, etc. Specifically for Czech language we propose a pronunciation lexicon with a prefix-stem-sufix arrangement combined with appropriate caching and pruning techniques and a 2-level (monophone and triphone) based classification. In experiments done with telephone speech containing items from a 5347-word city-name vocabulary we obtained 90.1 % recognition score in average time 645 ms per word. Acoustic models for these experiments have been trained on an only available multi-speaker database that was originally recorded by a microphone and later transferred over telephone lines and automatically realigned.
Cite as: Nouza, J. (2000) Telephone speech recognition from large lists of Czech words. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 394-397
@inproceedings{nouza00_icslp, author={Jan Nouza}, title={{Telephone speech recognition from large lists of Czech words}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, pages={vol. 4, 394-397} }