ISCA Archive Interspeech 2009

Wavelet-based speaker change detection in single channel speech data

Michael Wiesenegger, Franz Pernkopf

Speaker segmentation is the task of finding speaker turns in an audio stream. We propose a metric-based algorithm that uses Discrete Wavelet Transform (DWT) features. Principal component analysis (PCA) or linear discriminant analysis (LDA) [1] is then used to reduce the dimensionality of the feature space and remove redundant information. In the experiments, our methods, referred to as DWT-PCA and DWT-LDA, are compared to the DISTBIC algorithm [2] using clean and noisy data from the TIMIT database. Under strong noise in particular, i.e. -10 dB SNR, our DWT-PCA approach is very robust: the false alarm rate (FAR) increases by ~2% and the missed detection rate (MDR) stays about the same compared to clean speech, whereas the DISTBIC method fails, with a FAR of ~0% and an MDR of ~100%. For clean speech, DWT-PCA shows an improvement of ~30% (relative) in both the FAR and the MDR compared to the DISTBIC algorithm. DWT-LDA performs slightly worse than DWT-PCA.
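
To make the idea of metric-based change detection on wavelet features concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the wavelet family, the exact features, or the distance measure, so everything here is an assumption chosen for illustration: a Haar wavelet, log sub-band energies as features, a Euclidean distance between the mean feature vectors of two adjacent windows, and a synthetic two-tone signal standing in for two speakers. The PCA/LDA step is omitted. All function names are hypothetical.

```python
import math
import random


def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def dwt_log_energies(frame, levels=4):
    """Log mean energy of each detail sub-band plus the final approximation band."""
    feats = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        feats.append(math.log(sum(d * d for d in detail) / len(detail) + 1e-12))
    feats.append(math.log(sum(a * a for a in approx) / len(approx) + 1e-12))
    return feats


def frame_features(signal, frame_len=256, levels=4):
    """Split the signal into non-overlapping frames and compute one feature vector each."""
    return [dwt_log_energies(signal[i:i + frame_len], levels)
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def distance_curve(feats, win=8):
    """Distance between the mean features of the windows left and right of each frame index."""
    curve = []
    for t in range(win, len(feats) - win + 1):
        left = mean_vector(feats[t - win:t])
        right = mean_vector(feats[t:t + win])
        curve.append((t, math.dist(left, right)))
    return curve


def synthetic_two_speaker_signal(n_per_speaker=8192, seed=0):
    """Assumed toy data: a low tone followed by a high tone, each with light noise."""
    rng = random.Random(seed)
    a = [math.sin(2 * math.pi * 0.01 * n) + 0.05 * rng.gauss(0, 1)
         for n in range(n_per_speaker)]
    b = [math.sin(2 * math.pi * 0.2 * n) + 0.05 * rng.gauss(0, 1)
         for n in range(n_per_speaker)]
    return a + b


signal = synthetic_two_speaker_signal()
feats = frame_features(signal)
curve = distance_curve(feats)
# The frame index with the largest distance is the change-point candidate.
best_frame, best_dist = max(curve, key=lambda p: p[1])
print(best_frame, round(best_dist, 3))
```

On this toy signal the distance curve peaks at the frame where the two segments meet (frame 32 of 64). A practical detector would pick local maxima of the curve above a threshold rather than a single global maximum, and would be scored by counting false alarms and missed detections against reference change points.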

[1] C. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.

doi: 10.21437/Interspeech.2009-255

Cite as: Wiesenegger, M., Pernkopf, F. (2009) Wavelet-based speaker change detection in single channel speech data. Proc. Interspeech 2009, 836-839, doi: 10.21437/Interspeech.2009-255

  author={Michael Wiesenegger and Franz Pernkopf},
  title={{Wavelet-based speaker change detection in single channel speech data}},
  booktitle={Proc. Interspeech 2009},