ISCA Archive SPECOM 2004

Unit selection speech synthesis using phonetic-prosodic description of speech databases

Tetyana Lyudovyk, Mykola Sazhok

This paper describes an approach to speech synthesis that uses speech databases at different stages of the text-to-speech (TTS) process. The speech database units are phones in different segmental and prosodic contexts. Pitch-synchronous segmentation and labeling of the databases make it possible to store both segmental and prosodic information. The phonetic-prosodic annotations of the speech databases are used in off-line training of the linguistic processor. The automatic transcriber and the duration and intonation modules are trained to model the speech characteristics of different speakers, and can thus generate different target specifications for the same input text at the synthesis stage. A target specification is a detailed phonetic-prosodic transcription used by the unit selection module. The unit selection algorithm is based on criteria derived from the categories of the phonetic-prosodic annotations of the speech databases and works without spectral matching. The output of the unit selection module is an acoustic phonetic-prosodic transcription, which the acoustic processor uses to generate a speech waveform. Two databases of non-professional speakers with different speaking styles have been created and tested.
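
The idea of selecting units by annotation categories rather than by spectral distance can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not list the actual criteria, so the feature names (`left`, `right`, `stress`, `position`), the weights, and the greedy per-phone search below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A phone with categorical context labels (all label sets are hypothetical)."""
    phone: str
    left: str      # preceding phone
    right: str     # following phone
    stress: str    # e.g. "stressed" / "unstressed"
    position: str  # e.g. "initial" / "medial" / "final" in the phrase


# Hypothetical penalties for a mismatch in each annotation category.
WEIGHTS = {"left": 2.0, "right": 2.0, "stress": 1.0, "position": 1.0}


def mismatch_cost(target: Unit, candidate: Unit) -> float:
    """Sum the penalties of the categories in which the two units differ."""
    return sum(
        weight
        for name, weight in WEIGHTS.items()
        if getattr(target, name) != getattr(candidate, name)
    )


def select_units(targets: list[Unit], database: list[Unit]) -> list[Unit]:
    """For each target phone, pick the database unit with the lowest
    categorical mismatch cost. No spectral features are compared."""
    selected = []
    for target in targets:
        candidates = [u for u in database if u.phone == target.phone]
        if not candidates:
            raise LookupError(f"no database unit for phone {target.phone!r}")
        selected.append(min(candidates, key=lambda u: mismatch_cost(target, u)))
    return selected
```

Here each element of `targets` plays the role of one entry of the target specification, and the returned list stands in for the selected database units. A real system would typically also score how well neighbouring units join; this sketch omits that step.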


Cite as: Lyudovyk, T., Sazhok, M. (2004) Unit selection speech synthesis using phonetic-prosodic description of speech databases. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 594-599

@inproceedings{lyudovyk04_specom,
  author={Tetyana Lyudovyk and Mykola Sazhok},
  title={{Unit selection speech synthesis using phonetic-prosodic description of speech databases}},
  year=2004,
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages={594--599}
}