Unit selection and hidden Markov model (HMM) based synthesis have become the dominant techniques in text-to-speech (TTS) research. In this work, we combine HMM-based signal generation with the front end originally designed for unit-selection-based Finnish TTS, and we evaluate the prosody of the output generated by the two synthesis techniques using the same speech database. Furthermore, we study the effect of training set size on prosody and intelligibility in HMM-based synthesis. The results indicate that the HMM-based approach can provide better prosody than unit selection even when the training set size is severely limited. The size of the training set nevertheless affects the prosodic quality and intelligibility of the HMM-based synthesizer.
Bibliographic reference. Silen, Hanna / Helander, Elina / Nurminen, Jani / Gabbouj, Moncef (2008): "Evaluation of Finnish unit selection and HMM-based speech synthesis", In INTERSPEECH-2008, 1853-1856.