A bottom-up, stepwise knowledge integration framework is proposed to realize detection-based large vocabulary continuous speech recognition (LVCSR) with a weighted finite state machine (WFSM). The WFSM framework offers a flexible architecture for composing different types of knowledge networks, each of which can be built and optimized independently. Speech attribute detectors serve as an intermediate block that produces phoneme posterior probabilities, over which a phoneme recognition network is designed. Lexical access and syntactic knowledge integration are then performed over this phoneme network to deliver the decoded sentences. Experimental evidence shows that the proposed system outperforms several hybrid HMM/ANN systems with different configurations on the Wall Street Journal task, while remaining competitive with conventional LVCSR technology.
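The stepwise composition described above can be illustrated with a toy example. The following is a minimal sketch and not the authors' system: it assumes a three-slot phoneme lattice with made-up posteriors, a three-word lexicon, and a unigram word model, all hypothetical. Weights are negative log probabilities (tropical semiring), and the helper names (`sausage`, `lexicon`, `unigram`, `compose`, `best_path`) are introduced here for illustration only. The composition routine assumes the right-hand machine has no epsilon input labels, which holds for this toy setup but not in general.

```python
import heapq
from math import log

EPS = "<eps>"  # epsilon (empty) label

def sausage(slots):
    """Linear phoneme acceptor built from per-slot posteriors (hypothetical values)."""
    arcs = {t: [(ph, ph, -log(p), t + 1) for ph, p in slot.items()]
            for t, slot in enumerate(slots)}
    return {"start": 0, "finals": {len(slots): 0.0}, "arcs": arcs}

def lexicon(prons):
    """Phoneme-to-word transducer: the word is emitted on the first phoneme arc."""
    arcs, nxt = {0: []}, 1
    for word, phones in prons.items():
        src = 0
        for k, ph in enumerate(phones):
            last = k == len(phones) - 1
            dst = 0 if last else nxt
            if not last:
                nxt += 1
            arcs.setdefault(src, []).append((ph, word if k == 0 else EPS, 0.0, dst))
            src = dst
    return {"start": 0, "finals": {0: 0.0}, "arcs": arcs}

def unigram(probs):
    """One-state word acceptor standing in for syntactic knowledge."""
    loop = [(w, w, -log(p), 0) for w, p in probs.items()]
    return {"start": 0, "finals": {0: 0.0}, "arcs": {0: loop}}

def compose(a, b):
    """Weighted composition a o b; assumes b has no epsilon INPUT labels."""
    start = (a["start"], b["start"])
    arcs, finals, stack, seen = {}, {}, [start], {start}
    while stack:
        qa, qb = state = stack.pop()
        out = arcs.setdefault(state, [])
        if qa in a["finals"] and qb in b["finals"]:
            finals[state] = a["finals"][qa] + b["finals"][qb]
        for i, o, w, na in a["arcs"].get(qa, []):
            if o == EPS:  # a emits nothing: advance a alone
                moves = [(EPS, w, (na, qb))]
            else:         # match a's output against b's input
                moves = [(o2, w + w2, (na, nb))
                         for i2, o2, w2, nb in b["arcs"].get(qb, []) if i2 == o]
            for o2, w2, nxt in moves:
                out.append((i, o2, w2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

def best_path(t):
    """Dijkstra search (weights are non-negative); returns (cost, output labels)."""
    heap, done, tie = [(0.0, 0, t["start"], ())], set(), 1
    while heap:
        cost, _, q, outs = heapq.heappop(heap)
        if q is None:  # pseudo-state reached after paying a final weight
            return cost, list(outs)
        if q in done:
            continue
        done.add(q)
        if q in t["finals"]:
            heapq.heappush(heap, (cost + t["finals"][q], tie, None, outs))
            tie += 1
        for _i, o, w, nxt in t["arcs"].get(q, []):
            heapq.heappush(heap, (cost + w, tie, nxt, outs if o == EPS else outs + (o,)))
            tie += 1
    return None

# Hypothetical phoneme posteriors, pronunciations, and word probabilities.
P = sausage([{"k": 0.6, "b": 0.4}, {"ae": 1.0}, {"t": 0.45, "p": 0.55}])
L = lexicon({"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "bat": ["b", "ae", "t"]})
G = unigram({"cat": 0.5, "cap": 0.1, "bat": 0.4})

PL = compose(P, L)    # step 1: lexical access over the phoneme network
PLG = compose(PL, G)  # step 2: word-level (syntactic) knowledge

cost_lex, words_lex = best_path(PL)
cost_full, words_full = best_path(PLG)
print(words_lex)   # ['cap']
print(words_full)  # ['cat']
```

In this toy setup the phoneme evidence alone favors "cap", while adding the word-level model shifts the best hypothesis to "cat". Because each network is built by its own function, any one of them can be replaced or re-estimated without touching the others, which is the modularity the abstract attributes to the WFSM framework.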
Bibliographic reference. Siniscalchi, Sabato Marco / Svendsen, Torbjørn / Lee, Chin-Hui (2011): "A bottom-up stepwise knowledge-integration approach to large vocabulary continuous speech recognition using weighted finite state machines", In INTERSPEECH-2011, 901-904.