The level of quality that can be achieved in concatenative text-to-speech (TTS) synthesis depends, among other things, on a judicious segmentation of all units in the underlying unit selection inventory. We have recently advocated the iterative refinement of unit boundaries based on a data-driven feature extraction framework separately optimized for each boundary region [1]. This paper presents a formal proof of convergence of the iterative algorithm, as well as a detailed analysis of its potential benefits for concatenative TTS synthesis. A formal listening test, in particular, underscores the practical viability of the approach for unit boundary optimization.
Cite as: Bellegarda, J.R. (2006) Further developments in LSM-based boundary training for unit selection TTS. Proc. Interspeech 2006, paper 1142-Tue3BuP.7, doi: 10.21437/Interspeech.2006-387
@inproceedings{bellegarda06_interspeech, author={Jerome R. Bellegarda}, title={{Further developments in LSM-based boundary training for unit selection TTS}}, year=2006, booktitle={Proc. Interspeech 2006}, pages={paper 1142-Tue3BuP.7}, doi={10.21437/Interspeech.2006-387} }