11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Parallel Training of Neural Networks for Speech Recognition

Karel Veselý, Lukáš Burget, František Grézl

Brno University of Technology, Czech Republic

In this paper we describe a parallel implementation of ANN training based on the block-mode back-propagation learning algorithm. Two approaches to training parallelization were implemented. The first is data parallelization using POSIX threads, suitable for multi-core computers. The second is node parallelization exploiting the high-performance SIMD architecture of GPUs through CUDA, suitable for CUDA-enabled computers. We compare the speedup of both approaches by training a typically sized network on a real-world phoneme-state classification task: the CUDA version reduces training time nearly 10 times, while the multi-threaded version on an 8-core server gives only a 4-fold reduction. In both cases the baseline is an implementation already optimized with BLAS. The training tool will be released as open-source software under the project name TNet.
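The data-parallel scheme described above can be sketched in a few lines: each thread computes the error gradient over its own slice of a data block, the partial gradients are reduced, and a single weight update is applied per block (block-mode learning). This is an illustrative sketch only, not the TNet implementation; the single-linear-unit model, function names, and learning rate are assumptions for the example.

```python
# Illustrative sketch (NOT the TNet code): data-parallel block-mode
# back-propagation for a single linear unit y = w*x with squared error.
from concurrent.futures import ThreadPoolExecutor

def partial_grad(w, xs, ys):
    """Gradient of sum((w*x - y)^2) over one data slice."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

def block_update(w, xs, ys, lr=0.01, n_threads=4):
    """One block-mode update: split the block across threads,
    reduce the partial gradients, apply a single weight update."""
    k = max(1, len(xs) // n_threads)
    slices = [(xs[i:i + k], ys[i:i + k]) for i in range(0, len(xs), k)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        grads = pool.map(lambda s: partial_grad(w, *s), slices)
    # Mean gradient over the whole block, one update per block.
    return w - lr * sum(grads) / len(xs)

# Usage: fit w toward 2.0 on synthetic data y = 2x.
xs = [0.1 * i for i in range(1, 65)]
ys = [2.0 * x for x in xs]
w = 0.0
for _ in range(50):
    w = block_update(w, xs, ys)
```

Note that in a real trainer (as in the paper) the per-slice work is dense matrix algebra, so worker threads spend their time inside BLAS or CUDA kernels rather than in interpreter code; the sketch only shows the split/compute/reduce structure of the approach.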


Bibliographic reference: Veselý, Karel / Burget, Lukáš / Grézl, František (2010): "Parallel training of neural networks for speech recognition", in INTERSPEECH-2010, 2934-2937.