An objective distance measure that can predict audible discontinuities in concatenated speech synthesis systems is very important. Previous work has relied primarily on features estimated by linear and/or stationary models of speech. In this paper, we introduce two nonlinear approaches to discontinuity detection. The first method is based on a nonlinear harmonic model of speech, while the second is based on demodulating speech into an amplitude and a frequency component using the Teager energy operator. Fisher's linear discriminant was used to separate signals with audible discontinuities from those perceived as continuous. Combining the two methods with Fisher's linear discriminant achieved a detection rate of 56.5%, a 90% improvement over previously published results on the same database.
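For readers unfamiliar with the Teager energy operator mentioned above, the following is a minimal sketch of its standard discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It is not the authors' implementation: the abstract does not specify the demodulation algorithm, and the function name `teager_energy`, the test tone, and its parameters (`A`, `omega`, the 16 kHz sampling rate) are assumptions chosen only for illustration. For a pure sinusoid with amplitude A and digital frequency omega, the operator output is the constant A^2 sin^2(omega), which is why it can serve as a building block for separating amplitude and frequency information.

```python
import math


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, i.e. len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone (illustrative values, not taken from the paper):
# a 200 Hz cosine sampled at 16 kHz with amplitude 0.5.
A = 0.5
omega = 2 * math.pi * 200 / 16000  # digital frequency in rad/sample
tone = [A * math.cos(omega * n + 0.3) for n in range(64)]

psi = teager_energy(tone)

# For a pure sinusoid the operator output is constant: A^2 * sin^2(omega).
expected = (A * math.sin(omega)) ** 2
```

Every entry of `psi` should match `expected` up to floating-point rounding. Real speech is not a single sinusoid, so in practice the operator would be applied to band-limited signals and followed by an energy separation step; that part is outside the scope of this sketch.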
Cite as: Pantazis, Y., Stylianou, Y., Klabbers, E. (2005) Discontinuity detection in concatenated speech synthesis based on nonlinear speech analysis. Proc. Interspeech 2005, 2817-2820, doi: 10.21437/Interspeech.2005-621
@inproceedings{pantazis05_interspeech,
  author={Yannis Pantazis and Yannis Stylianou and Esther Klabbers},
  title={{Discontinuity detection in concatenated speech synthesis based on nonlinear speech analysis}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2817--2820},
  doi={10.21437/Interspeech.2005-621}
}