The best voices in text-to-speech synthesis are currently obtained with systems based on the concatenation of acoustic units. In such systems, the choice of the units whose concatenation will produce the acoustic message is a crucial stage. Moreover, current TTS systems most often use acoustic units that correspond to variable-length phonetic descriptions. This article proposes an original framework for automatically determining an optimal set of variable-length acoustic units.
Cite as: Boeffard, O. (2001) Variable-length acoustic units inference for text-to-speech synthesis. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 983-986, doi: 10.21437/Eurospeech.2001-261
@inproceedings{boeffard01_eurospeech,
  author={Olivier Boeffard},
  title={{Variable-length acoustic units inference for text-to-speech synthesis}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={983--986},
  doi={10.21437/Eurospeech.2001-261}
}