Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction

Tara N. Sainath, Arun Narayanan, Ron J. Weiss, Ehsan Variani, Kevin W. Wilson, Michiel Bacchiani, Izhak Shafran


Recently, we presented a multichannel neural network model trained to perform speech enhancement jointly with acoustic modeling [1], directly from raw waveform input signals. While this model achieved over a 10% relative improvement compared to a single-channel model, it came at a large cost in computational complexity, particularly in the convolutions used to implement a time-domain filterbank. In this paper we present several approaches to reduce the complexity of this model, by reducing the stride of the convolution operation and by implementing filters in the frequency domain. These optimizations reduce the computational complexity of the model by a factor of 3 with no loss in accuracy on a 2,000 hour Voice Search task.
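The frequency-domain optimization mentioned above rests on a standard identity: linear convolution with a time-domain filter is equivalent to pointwise multiplication of (suitably zero-padded) spectra, which trades an O(N·M) convolution per filter for O(N log N) FFT work. A minimal numpy sketch of that equivalence, with arbitrary example signal and filter lengths (not the paper's actual filterbank configuration), might look like:

```python
import numpy as np

# Illustrative sketch only, not the paper's implementation.
# Signal and filter lengths below are arbitrary example values.
rng = np.random.default_rng(0)
signal = rng.standard_normal(400)  # e.g. one raw-waveform frame
filt = rng.standard_normal(64)     # one time-domain filterbank filter

# Direct time-domain convolution: O(N * M) multiplies per filter.
direct = np.convolve(signal, filt)

# Frequency-domain equivalent: zero-pad both to the full linear-convolution
# output length, multiply spectra, invert -- O(N log N) per filter.
n = len(signal) + len(filt) - 1
freq = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(filt, n), n)

assert np.allclose(direct, freq)
```

The other optimization, increasing the stride of the convolution, simply evaluates fewer output positions, reducing cost proportionally to the stride factor.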


DOI: 10.21437/Interspeech.2016-92

Cite as

Sainath, T.N., Narayanan, A., Weiss, R.J., Variani, E., Wilson, K.W., Bacchiani, M., Shafran, I. (2016) Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction. Proc. Interspeech 2016, 1971-1975.

Bibtex
@inproceedings{Sainath+2016,
  author={Tara N. Sainath and Arun Narayanan and Ron J. Weiss and Ehsan Variani and Kevin W. Wilson and Michiel Bacchiani and Izhak Shafran},
  title={Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-92},
  url={http://dx.doi.org/10.21437/Interspeech.2016-92},
  pages={1971--1975}
}