The emergence of highly parallel computing platforms enables new trade-offs in algorithm design for automatic speech recognition, and it naturally motivates the following investigation: do the most computationally efficient sequential algorithms lead to the most computationally efficient parallel algorithms? In this paper we explore two contending recognition network representations for speech inference engines: the linear lexical model (LLM) and the weighted finite state transducer (WFST). We demonstrate that while an inference engine using the simpler LLM representation evaluates 22x more transitions per second than one using the advanced WFST representation, the regular structure of the LLM representation allows 4.7-6.4x faster evaluation and 53-65x faster operand gathering for each state transition. Using the 5k Wall Street Journal corpus, we experiment on the NVIDIA GTX480 (Fermi) and NVIDIA GTX285 Graphics Processing Units (GPUs), and illustrate that the performance of a speech inference engine based on the LLM representation is competitive with one based on the WFST representation on highly parallel implementation platforms.
Bibliographic reference: Chong, Jike / Gonina, Ekaterina / You, Kisun / Keutzer, Kurt (2010): "Exploring recognition network representations for efficient speech inference on highly parallel platforms", in INTERSPEECH-2010, 1489-1492.