This paper proposes an automatic approach to generating speech that is fluent at the prosodic-word level, based on a small speech database of the target speaker consisting of read and fluent speech. First, an auto-segmentation algorithm automatically segments and labels the target speaker's database. A pre-trained average voice model is then adapted to the target speaker using the auto-segmented data. To synthesize fluent speech, a prosodic model is proposed that smooths the prosodic word-level parameters, improving fluency within each prosodic word. Finally, a postfilter based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thereby improve speaker similarity. Experimental results showed that, compared with the MLLR-based model adaptation method, the proposed method effectively improves both the fluency and the speaker similarity of the speech synthesized for a target speaker.
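The abstract does not specify how its modulation-spectrum (MS) postfilter is configured. The following is only a minimal sketch of one common form of MS postfilter, not the authors' implementation, under these stated assumptions. It processes one parameter trajectory (e.g. one mel-cepstral dimension over time). The log-power MS is computed with an FFT of the mean-removed trajectory. Each modulation-frequency bin is mapped from the statistics of generated speech (`mu_gen`, `sd_gen`) toward those of natural speech (`mu_nat`, `sd_nat`) and interpolated by a hypothetical emphasis weight `k`. The trajectory is then resynthesized with the original phase. All function names, parameters, and the default `k` are illustrative.

```python
import numpy as np


def modulation_spectrum(traj, fft_len):
    """Return the log-power modulation spectrum and phase of one trajectory."""
    spec = np.fft.rfft(traj, n=fft_len)
    # Small floor avoids log(0) for empty modulation bins.
    return np.log(np.abs(spec) ** 2 + 1e-10), np.angle(spec)


def ms_postfilter(traj, mu_nat, sd_nat, mu_gen, sd_gen, k=0.85, fft_len=None):
    """Hypothetical MS postfilter sketch for a single parameter trajectory.

    mu_*/sd_* are per-bin (or scalar) mean and standard deviation of the
    log-power MS for natural (nat) and generated (gen) speech; k in [0, 1]
    controls how strongly the generated MS is pulled toward natural speech.
    """
    traj = np.asarray(traj, dtype=float)
    T = len(traj)
    if fft_len is None:
        # Smallest power of two that covers the whole trajectory.
        fft_len = 2 ** int(np.ceil(np.log2(T)))
    mean = traj.mean()
    log_ms, phase = modulation_spectrum(traj - mean, fft_len)

    # Map generated-speech MS statistics onto natural-speech statistics,
    # then interpolate between the original and the mapped log MS.
    target = (sd_nat / sd_gen) * (log_ms - mu_gen) + mu_nat
    filtered = (1.0 - k) * log_ms + k * target

    # Rebuild the spectrum with the modified magnitude and original phase.
    spec = np.sqrt(np.exp(filtered)) * np.exp(1j * phase)
    out = np.fft.irfft(spec, n=fft_len)[:T]
    return out + mean
```

When the natural and generated statistics coincide, the filter leaves the trajectory (numerically) unchanged. Raising the natural-speech MS mean above the generated one enlarges the trajectory's temporal fluctuation, which is the intended counter to over-smoothing.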
Bibliographic reference. Huang, Yi-Chin / Wu, Chung-Hsien / Shie, Ming-Ge (2015): "Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation", In INTERSPEECH-2015, 294-298.