EUROSPEECH 2001 Scandinavia
In this paper, we present our recently developed time-synchronous speech recognition decoder, which represents the search space of Large Vocabulary Continuous Speech Recognition (LVCSR) as a single precompiled network. In particular, we outline our approaches to time- and memory-efficient Viterbi decoding in this setting. These include reducing the demand for fast memory by keeping the search network on disk and loading only the required parts on demand. Evaluations are carried out on a difficult Japanese LVCSR task that involves a back-off trigram language model and fully cross-word-dependent triphone acoustic models. The resulting time and memory efficiency enables real-time Viterbi decoding of entire lecture speeches in a single time-synchronous pass with a search error of less than 1%.
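The idea of decoding over a network that stays on disk can be illustrated with a minimal sketch. It is not the decoder described in the paper: the abstract does not specify the file layout, the caching policy, or the pruning strategy, so the `DiskNetwork` class, its pickle-based record format, the small LRU cache, the fixed beam width, and the toy network in `demo` are all assumptions made purely for illustration.

```python
import math
import os
import pickle
import tempfile
from collections import OrderedDict


class DiskNetwork:
    """Toy precompiled network whose arc lists are read from disk on demand.

    Hypothetical layout: one pickled arc list per state, located through an
    in-memory index of (byte offset, length) pairs.
    """

    def __init__(self, path, index, cache_size=2):
        self.path = path
        self.index = index            # state -> (offset, length)
        self.cache = OrderedDict()    # small LRU cache of loaded arc lists
        self.cache_size = cache_size
        self.loads = 0                # number of disk reads performed

    @classmethod
    def build(cls, arcs, path, cache_size=2):
        """Write {state: [(next_state, label, log_prob), ...]} to `path`."""
        index = {}
        with open(path, "wb") as f:
            for state, out in arcs.items():
                blob = pickle.dumps(out)
                index[state] = (f.tell(), len(blob))
                f.write(blob)
        return cls(path, index, cache_size)

    def arcs(self, state):
        """Return the outgoing arcs of `state`, loading them only if needed."""
        if state in self.cache:
            self.cache.move_to_end(state)
            return self.cache[state]
        offset, length = self.index[state]
        with open(self.path, "rb") as f:
            f.seek(offset)
            out = pickle.loads(f.read(length))
        self.loads += 1
        self.cache[state] = out
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict least recently used
        return out


def viterbi_decode(net, start, finals, frames, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning.

    `frames` is a list of dicts mapping arc label -> acoustic log-likelihood.
    Returns the best label sequence ending in a final state, or None.
    """
    active = {start: (0.0, [])}              # state -> (log score, labels)
    for obs in frames:
        nxt = {}
        for state, (score, hist) in active.items():
            for dest, label, logp in net.arcs(state):
                s = score + logp + obs.get(label, -math.inf)
                if s == -math.inf:
                    continue
                # Viterbi recombination: keep only the best path per state.
                if dest not in nxt or s > nxt[dest][0]:
                    nxt[dest] = (s, hist + [label])
        if not nxt:
            return None
        best = max(s for s, _ in nxt.values())
        active = {st: v for st, v in nxt.items() if v[0] >= best - beam}
    ends = [v for st, v in active.items() if st in finals]
    return max(ends, key=lambda v: v[0])[1] if ends else None


def demo():
    """Decode three frames over a three-state toy network stored on disk."""
    arcs = {
        0: [(1, "a", math.log(0.6)), (2, "b", math.log(0.4))],
        1: [(1, "a", math.log(0.5)), (2, "b", math.log(0.5))],
        2: [(2, "b", 0.0)],
    }
    frames = [{"a": -0.1, "b": -3.0}, {"a": -0.2, "b": -2.5}, {"a": -4.0, "b": -0.1}]
    with tempfile.TemporaryDirectory() as tmp:
        net = DiskNetwork.build(arcs, os.path.join(tmp, "net.bin"))
        return viterbi_decode(net, 0, {2}, frames), net.loads
```

Calling `demo()` returns the decoded label sequence together with the number of disk reads. Only states that become active during the search are ever read, which is the property that keeps the in-memory footprint small when the network itself is large.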
Bibliographic reference. Willett, Daniel / McDermott, Erik / Minami, Yasuhiro / Katagiri, Shigeru (2001): "Time and memory efficient Viterbi decoding for LVCSR using a precompiled search network", in EUROSPEECH-2001, 847-850.