11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Exploring Recognition Network Representations for Efficient Speech Inference on Highly Parallel Platforms

Jike Chong (1), Ekaterina Gonina (1), Kisun You (2), Kurt Keutzer (1)

(1) University of California at Berkeley, USA
(2) Seoul National University, Korea

The emergence of highly parallel computing platforms is enabling new trade-offs in algorithm design for automatic speech recognition. This naturally motivates the following question: do the most computationally efficient sequential algorithms lead to the most computationally efficient parallel algorithms? In this paper we explore two contending recognition network representations for speech inference engines: the linear lexical model (LLM) and the weighted finite state transducer (WFST). We demonstrate that while an inference engine using the simpler LLM representation evaluates 22x more transitions than one using the more advanced WFST representation, the simple structure of the LLM representation allows 4.7-6.4x faster evaluation and 53-65x faster operand gathering for each state transition. We use the 5k-word Wall Street Journal corpus to run experiments on the NVIDIA GTX480 (Fermi) and NVIDIA GTX285 graphics processing units (GPUs), and show that the performance of a speech inference engine based on the LLM representation is competitive with one based on the WFST representation on highly parallel implementation platforms.

Bibliographic reference.  Chong, Jike / Gonina, Ekaterina / You, Kisun / Keutzer, Kurt (2010): "Exploring recognition network representations for efficient speech inference on highly parallel platforms", In INTERSPEECH-2010, 1489-1492.