Binary Deep Neural Networks for Speech Recognition

Xu Xiang, Yanmin Qian, Kai Yu


Deep neural networks (DNNs) are widely used in most current automatic speech recognition (ASR) systems. To guarantee good recognition performance, DNNs usually require significant computational resources, which limits their application on low-power devices. It is therefore appealing to reduce the computational cost while keeping the accuracy. In this work, in light of their success in image recognition, binary DNNs are applied to speech recognition, achieving competitive performance with a substantial speedup. To our knowledge, this is the first time that binary DNNs have been used in speech recognition. In binary DNNs, network weights and activations are constrained to binary values, which enables faster matrix multiplication based on bit operations. By exploiting hardware population count instructions, the proposed binary matrix multiplication achieves a 5-7 times speedup over highly optimized floating-point matrix multiplication. This leads to much faster DNN inference, since matrix multiplication is the most computationally expensive operation. Experiments on both TIMIT phone recognition and a 50-hour Switchboard speech recognition task show that binary DNNs can run about 4 times faster than standard DNNs during inference, with roughly 10.0% relative accuracy reduction.


DOI: 10.21437/Interspeech.2017-1343

Cite as: Xiang, X., Qian, Y., Yu, K. (2017) Binary Deep Neural Networks for Speech Recognition. Proc. Interspeech 2017, 533-537, DOI: 10.21437/Interspeech.2017-1343.


@inproceedings{Xiang2017,
  author={Xu Xiang and Yanmin Qian and Kai Yu},
  title={Binary Deep Neural Networks for Speech Recognition},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={533--537},
  doi={10.21437/Interspeech.2017-1343},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1343}
}