Automatic Speech Recognition (ASR), the process of converting spoken words into computer-intelligible information (also known as speech-to-text or speech recognition), is used to control various devices (e.g. cars, computers, mobile phones). The most popular algorithms implemented in these systems are based on a statistical method, Hidden Markov Models (HMM). Speech recognition is organized as a hierarchy of elementary stages, and each level of this hierarchy can be represented by a weighted finite-state transducer. Thus, by using an FSM toolkit (e.g. the AT&T FSM toolkit), we obtain a common method for optimizing these automata and composing them into a recognition network. The FSM toolkit has not yet seen major use for the Czech language. This work explores the feasibility of using the AT&T FSM toolkit together with HTK for the Czech language and compares the speed of FSM-based recognizers with results obtained using the HTK toolkit alone.
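To illustrate what composing transducer levels into a recognition network means, here is a minimal sketch of weighted transducer composition in Python. It is not the AT&T FSM toolkit's interface or the authors' implementation: the dictionary layout, the `compose` function, and the toy transducers `L` and `G` are all hypothetical. It assumes epsilon-free transducers and the tropical semiring, where weights (for example negative log probabilities) are added along a path. As a further simplification, each toy "word" consists of a single phone-like symbol.

```python
from collections import deque

def compose(a, b):
    """Compose two epsilon-free weighted transducers (tropical semiring).

    Each transducer is a dict (hypothetical layout, for illustration only):
      "start":  start state
      "finals": {state: final weight}
      "arcs":   {state: [(input, output, weight, next_state), ...]}
    """
    start = (a["start"], b["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = state = queue.popleft()
        # A pair state is final only if both components are final.
        if qa in a["finals"] and qb in b["finals"]:
            finals[state] = a["finals"][qa] + b["finals"][qb]
        for (inp, mid_a, w1, next_a) in a["arcs"].get(qa, []):
            for (mid_b, out, w2, next_b) in b["arcs"].get(qb, []):
                # Arcs match when a's output symbol equals b's input symbol.
                if mid_a != mid_b:
                    continue
                nxt = (next_a, next_b)
                # Tropical semiring: weights along a path are added.
                arcs.setdefault(state, []).append((inp, out, w1 + w2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {"start": start, "arcs": arcs, "finals": finals}

# Toy lexicon-like transducer: phone symbols in, word symbols out.
L = {"start": 0, "finals": {0: 0.0},
     "arcs": {0: [("a", "A", 0.0, 0), ("i", "I", 0.0, 0)]}}

# Toy grammar-like acceptor over words, with made-up weights.
G = {"start": 0, "finals": {1: 0.0},
     "arcs": {0: [("A", "A", 0.5, 1), ("I", "I", 1.5, 1)],
              1: [("A", "A", 2.0, 1)]}}

# Combined network: phone symbols in, grammar-weighted word symbols out.
N = compose(L, G)
```

Only pair states reachable from the start state are built, so the result contains no unreachable states. A practical toolkit additionally has to handle epsilon transitions and applies optimizations such as determinization and minimization, which this sketch omits.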
Cite as: Stemberk, P., Hanzl, V. (2005) Finite-state transducer toolkit for faster ASR. Proc. Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005), paper 21
@inproceedings{stemberk05_aside, author={Pavel Stemberk and Václav Hanzl}, title={{Finite-state transducer toolkit for faster ASR}}, year=2005, booktitle={Proc. Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005)}, pages={paper 21} }