On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement

Lukas Drude, Bhiksha Raj, Reinhold Haeb-Umbach

Although complex-valued neural networks (CVNNs), i.e., networks that can operate with complex-valued arithmetic, have been around for a while, they have not been reconsidered since the breakthrough of deep network architectures. This paper presents a critical assessment of whether the novel tool set of deep neural networks (DNNs) should be extended to complex-valued arithmetic. Indeed, with DNNs making inroads into speech enhancement tasks, the use of complex-valued input data, specifically short-time Fourier transform coefficients, is an obvious consideration. In particular, when it comes to tasks that heavily rely on phase information, such as acoustic beamforming, complex-valued algorithms are omnipresent. In this contribution we recapitulate backpropagation in CVNNs, develop complex-valued network elements, such as the split-rectified non-linearity, and compare real- and complex-valued networks on a beamforming task. We find that CVNNs hardly provide a performance gain and conclude that the effort of developing complex-valued counterparts of the building blocks of modern deep or recurrent neural networks can hardly be justified.
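The abstract mentions the split-rectified non-linearity as one of the complex-valued network elements studied. A minimal sketch of this idea, assuming the common definition from the CVNN literature in which the rectification is applied independently to the real and imaginary parts (the paper's exact formulation may differ):

```python
import numpy as np

def split_relu(z: np.ndarray) -> np.ndarray:
    """Split-rectified non-linearity: apply a ReLU separately to the
    real and imaginary parts of a complex-valued activation, so the
    result stays in the closed first quadrant of the complex plane."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

# Example: activations with mixed-sign real and imaginary parts.
z = np.array([1.0 - 2.0j, -3.0 + 4.0j])
print(split_relu(z))  # → [1.+0.j  0.+4.j]
```

Split non-linearities like this sidestep the fact that a function which is both bounded and complex-differentiable everywhere must be constant (Liouville's theorem), at the cost of treating real and imaginary parts as independent channels.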

DOI: 10.21437/Interspeech.2016-300

Cite as

Drude, L., Raj, B., Haeb-Umbach, R. (2016) On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement. Proc. Interspeech 2016, 1745-1749.

@inproceedings{drude16_interspeech,
  author={Lukas Drude and Bhiksha Raj and Reinhold Haeb-Umbach},
  title={On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement},
  booktitle={Interspeech 2016},
  year={2016},
  pages={1745--1749},
  doi={10.21437/Interspeech.2016-300}
}