INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Wavelet-Based Speaker Change Detection in Single Channel Speech Data

Michael Wiesenegger, Franz Pernkopf

Graz University of Technology, Austria

Speaker segmentation is the task of finding speaker turns in an audio stream. We propose a metric-based algorithm that uses Discrete Wavelet Transform (DWT) features. Principal component analysis (PCA) or linear discriminant analysis (LDA) [1] is then used to reduce the dimensionality of the feature space and remove redundant information. In the experiments, our methods, referred to as DWT-PCA and DWT-LDA, are compared to the DISTBIC algorithm [2] on clean and noisy data from the TIMIT database. Under strong noise, i.e. -10 dB SNR, our DWT-PCA approach is very robust: the false alarm rate (FAR) increases by 2% and the missed detection rate (MDR) stays about the same compared to clean speech, whereas the DISTBIC method fails, with a FAR of almost 0% and an MDR of almost 100%. For clean speech, DWT-PCA shows a relative improvement of 30% in both the FAR and the MDR over the DISTBIC algorithm. DWT-LDA performs slightly worse than DWT-PCA.
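
The general shape of a metric-based detector on DWT features with PCA can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the wavelet family, the per-frame features, the distance measure, or the window sizes. The Haar wavelet, log sub-band energies, Euclidean distance between window means, and all function names and parameter values below are assumptions chosen for brevity.

```python
import numpy as np

def haar_dwt_energies(frame, levels=4):
    """Log-energy of Haar detail coefficients per level, plus the final
    approximation band. Assumes len(frame) is divisible by 2**levels."""
    approx = np.asarray(frame, dtype=float)
    feats = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # high-pass (detail) branch
        approx = (even + odd) / np.sqrt(2.0)   # low-pass (approximation) branch
        feats.append(np.log(np.sum(detail ** 2) + 1e-12))
    feats.append(np.log(np.sum(approx ** 2) + 1e-12))
    return np.array(feats)

def frame_features(signal, frame_len=256, levels=4):
    """Split the signal into non-overlapping frames and compute DWT features."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.array([haar_dwt_energies(f, levels) for f in frames])

def pca_project(features, n_components=2):
    """Project centered features onto their leading principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def change_curve(features, window=10):
    """Euclidean distance between the mean feature vectors of the windows
    immediately before and after each frame (assumed metric)."""
    n = len(features)
    dist = np.zeros(n)
    for t in range(window, n - window):
        left = features[t - window:t].mean(axis=0)
        right = features[t:t + window].mean(axis=0)
        dist[t] = np.linalg.norm(left - right)
    return dist

# Synthetic stand-in for two "speakers": two tones with different spectra.
rng = np.random.default_rng(0)
fs, frame_len, frames_per_seg = 16000, 256, 40
n = frame_len * frames_per_seg
t_axis = np.arange(n) / fs
seg_a = np.sin(2 * np.pi * 200 * t_axis) + 0.05 * rng.standard_normal(n)
seg_b = np.sin(2 * np.pi * 3000 * t_axis) + 0.05 * rng.standard_normal(n)
signal = np.concatenate([seg_a, seg_b])   # true change at frame index 40

feats = frame_features(signal, frame_len=frame_len)
reduced = pca_project(feats, n_components=2)
curve = change_curve(reduced, window=10)
peak = int(np.argmax(curve))               # candidate change point (frame index)
print("candidate change at frame", peak)
```

In a real system the peaks of the distance curve would be thresholded or otherwise validated to trade off false alarms against missed detections. LDA could replace the PCA step when labelled speaker data is available.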

Reference

  1. C. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.

Bibliographic reference. Wiesenegger, Michael / Pernkopf, Franz (2009): "Wavelet-based speaker change detection in single channel speech data", in INTERSPEECH-2009, 836-839.