One well-known problem with diphone concatenation is the occurrence of audible discontinuities at diphone boundaries, which are most prominent in vowels and semi-vowels. Significant formant jumps at certain boundaries suggest that the problem is of a spectral nature. We have examined this hypothesis by correlating the results of a listening experiment with spectral distances measured across diphone boundaries. The aim is to identify the spectral distance measure that best predicts when discontinuities are audible, and thereby to determine how the diphone database can best be extended with context-sensitive diphones. The results show that the Kullback-Leibler measure is the best predictor.
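As an illustration of the kind of measure the abstract refers to, the following is a minimal Python sketch of a symmetric Kullback-Leibler distance between two power spectra taken on either side of a diphone boundary. The abstract does not give the exact formulation, so the symmetrised form, the normalisation to unit sum, the `eps` floor, and the names `normalize` and `symmetric_kl` are assumptions made for illustration, not the authors' implementation.

```python
import math


def normalize(spectrum):
    """Scale a non-negative power spectrum so that its bins sum to 1."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must have positive total power")
    return [value / total for value in spectrum]


def symmetric_kl(left, right, eps=1e-12):
    """Symmetric Kullback-Leibler distance between two power spectra.

    Assumption: both spectra are sampled on the same frequency bins and
    are treated as discrete distributions after normalisation. The small
    floor `eps` (hypothetical choice) avoids taking log(0) on empty bins.
    """
    if len(left) != len(right):
        raise ValueError("spectra must have the same number of bins")
    p = normalize([value + eps for value in left])
    q = normalize([value + eps for value in right])
    # Sum of (p - q) * log(p / q) equals KL(p || q) + KL(q || p).
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


# Made-up spectra on either side of a hypothetical diphone boundary.
left_frame = [1.0, 4.0, 2.0, 0.5]
right_frame = [1.0, 1.0, 4.0, 0.5]
distance = symmetric_kl(left_frame, right_frame)
```

A larger value indicates a larger spectral mismatch at the boundary. Whether a given value corresponds to an audible discontinuity is what a listening experiment such as the one described above would have to establish.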
Cite as: Klabbers, E., Veldhuis, R. (1998) On the reduction of concatenation artefacts in diphone synthesis. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0115, doi: 10.21437/ICSLP.1998-31
@inproceedings{klabbers98_icslp,
  author={Esther Klabbers and Raymond Veldhuis},
  title={{On the reduction of concatenation artefacts in diphone synthesis}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0115},
  doi={10.21437/ICSLP.1998-31}
}