Most speech features are inherently complex, but usually only their magnitude is considered in spectral distortion measures. DFT and cepstral spectra are typical examples: the phase information is usually thought to be of little value and is therefore discarded. This paper describes a new form of neural network that is inherently complex. We propose its use in pattern recognition tasks whose input contains complex information, and choose speaker verification as the evaluation task. The complex feature we consider here is the DFT of cepstral time series spanning a single utterance. In generating such features we show the effects of sampling rate and aliasing on the 2D mel-cepstra. The non-linearity plays a role of paramount importance in the complex network; since the standard sigmoid is inappropriate in this case, we propose suitable alternative functions. Preliminary speaker verification results are promising, supporting the case for the complex net.
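The two technical ingredients named above can be illustrated with a minimal sketch. It covers a complex feature formed by taking the DFT of each mel-cepstral coefficient's trajectory over an utterance, and a single complex-valued unit with a bounded non-linearity. The abstract does not specify the authors' activation functions, so `amp_phase_tanh` is an assumption: it squashes the magnitude with tanh and keeps the phase, a common choice in the complex-network literature, and it is not presented as the paper's method. The helper names (`cepstral_dft`, `complex_neuron`, `n_keep`) and the toy data are hypothetical. The sketch also shows one well-known reason the standard logistic sigmoid is unsuitable for complex inputs: extended to the complex plane, it has poles at z = i(2k+1)π and is therefore unbounded.

```python
import cmath
import math

def dft(x):
    """Plain O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def cepstral_dft(cepstra, n_keep):
    """Hypothetical 2D mel-cepstral feature (illustrative assumption).

    cepstra: list of frames (one utterance), each a list of mel-cepstral
             coefficients, i.e. a frames x coefficients matrix.
    Returns, for every coefficient, the first n_keep DFT bins of its
    trajectory over time: a complex-valued coefficients x n_keep matrix.
    """
    n_coeffs = len(cepstra[0])
    features = []
    for c in range(n_coeffs):
        trajectory = [frame[c] for frame in cepstra]  # time series of coefficient c
        features.append(dft(trajectory)[:n_keep])
    return features

def complex_sigmoid(z):
    """Logistic sigmoid applied directly to a complex argument.

    Unbounded: 1 + exp(-z) vanishes at z = i*(2k+1)*pi, giving poles.
    """
    return 1 / (1 + cmath.exp(-z))

def amp_phase_tanh(z):
    """Assumed bounded activation: tanh on the magnitude, phase unchanged."""
    return cmath.rect(math.tanh(abs(z)), cmath.phase(z))

def complex_neuron(inputs, weights, bias):
    """One complex unit: complex weighted sum, then the bounded activation."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return amp_phase_tanh(s)

# Toy utterance: 8 frames of 3 made-up mel-cepstral coefficients.
frames = [[math.cos(0.5 * t), 0.1 * t, 1.0] for t in range(8)]
feats = cepstral_dft(frames, n_keep=4)
flat = [v for row in feats for v in row]  # 12 complex inputs
out = complex_neuron(flat, weights=[0.05 - 0.02j] * len(flat), bias=0j)
print(abs(out))  # magnitude stays below 1
print(abs(complex_sigmoid(1j * (math.pi - 1e-3))))  # about 1000, near the pole at i*pi
```

Keeping only the lowest `n_keep` bins is one simple way to obtain a fixed-size input from utterances of different lengths; it ignores the sampling-rate and aliasing effects the paper examines, which depend on the frame rate and trajectory length actually used.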
Cite as: Andrews, E.C., Mason, J.S. (1991) Neural network classification of complex-valued speech features. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 113-116, doi: 10.21437/Eurospeech.1991-24
@inproceedings{andrews91_eurospeech, author={E. C. Andrews and J. S. Mason}, title={{Neural network classification of complex-valued speech features}}, year=1991, booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)}, pages={113--116}, doi={10.21437/Eurospeech.1991-24} }