Sixth European Conference on Speech Communication and Technology
This paper presents the use of multiple hidden Markov models (HMMs) to build a Czech acoustic unit inventory, together with speech synthesis based on that inventory. Triphone HMMs are trained on a speech corpus recorded by a single speaker. The states of the triphone HMMs are clustered automatically using binary decision trees. The clustered states are then used to segment the speech corpus automatically and to create a speech segment database. The acoustic unit inventory constructed in this way is expected to model context more precisely than was previously possible. A concatenation-based speech synthesizer can then be designed on the basis of the speech segment database, and several speech synthesis techniques are discussed for this purpose. Finally, a Czech text-to-speech (TTS) system is presented.
Bibliographic reference. Matousek, Jindrich (1999): "Speech synthesis using HMM-based acoustic unit inventory", In EUROSPEECH'99, 2323-2326.