ISCA Archive Interspeech 2015

Source-filter separation of speech signal in the phase domain

Erfan Loweimi, Jon Barker, Thomas Hain

Deconvolution of the speech excitation (source) and vocal tract (filter) components through log-magnitude spectral processing is well established and has led to the well-known cepstral features used in a multitude of speech processing tasks. This paper presents a novel source-filter decomposition based on processing in the phase domain. We show that separation between source and filter in the log-magnitude spectra is far from perfect, leading to loss of vital vocal tract information. It is demonstrated that the same task can be better performed by trend and fluctuation analysis of the phase spectrum of the minimum-phase component of speech, which can be computed via the Hilbert transform. Because the vocal tract and source components are additive in the phase domain, trend and fluctuation can be separated by low-pass filtering the phase. This results in separated signals which have a clear relation to the vocal tract and excitation components. The effectiveness of the method is put to the test in a speech recognition task: the vocal tract component extracted in this way is used as the basis of a feature extraction algorithm for speech recognition on the Aurora-2 database. The recognition results show up to 8.5% absolute improvement over MFCC features on average (0-20 dB).
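The decomposition described in the abstract can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes NumPy, obtains the phase of the minimum-phase component by folding the real cepstrum (a standard discrete counterpart of the Hilbert-transform relation between log-magnitude and phase), and uses a plain moving average as a stand-in for the low-pass filter. The function names, FFT size, and smoothing width are hypothetical choices, not values taken from the paper.

```python
import numpy as np


def minimum_phase_spectrum(frame, n_fft=1024, eps=1e-10):
    """Phase spectrum of the minimum-phase component of one frame.

    Assumption: the Hilbert-transform relation is realised by folding the
    real cepstrum onto non-negative quefrencies. n_fft must be even.
    """
    log_mag = np.log(np.abs(np.fft.fft(frame, n_fft)) + eps)
    cepstrum = np.fft.ifft(log_mag).real  # real cepstrum (even sequence)

    # Folding window: keep c[0] and c[n_fft/2], double the positive
    # quefrencies, zero the negative ones.
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0

    # FFT of the folded cepstrum is log|X| + j * phase_min.
    return np.fft.fft(cepstrum * fold).imag


def split_trend_fluctuation(phase, width=31):
    """Split a phase spectrum into a slowly varying trend and a residual.

    Assumption: a moving average over frequency bins stands in for the
    low-pass filter; the width is illustrative only.
    """
    kernel = np.ones(width) / width
    trend = np.convolve(phase, kernel, mode="same")  # smooth part
    fluctuation = phase - trend                      # remaining ripple
    return trend, fluctuation


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400) * np.hamming(400)  # synthetic frame
    phase = minimum_phase_spectrum(frame)[:513]          # bins 0..n_fft/2
    trend, fluctuation = split_trend_fluctuation(phase)
    print(trend.shape, fluctuation.shape)
```

In this sketch the trend plays the role of the vocal tract (filter) component and the fluctuation that of the excitation (source) component. Because the fluctuation is defined as the residual, the two parts sum back to the original phase, which mirrors the additivity that the method relies on.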

doi: 10.21437/Interspeech.2015-211

Cite as: Loweimi, E., Barker, J., Hain, T. (2015) Source-filter separation of speech signal in the phase domain. Proc. Interspeech 2015, 598-602, doi: 10.21437/Interspeech.2015-211

  author={Erfan Loweimi and Jon Barker and Thomas Hain},
  title={{Source-filter separation of speech signal in the phase domain}},
  booktitle={Proc. Interspeech 2015},