5th International Conference on Spoken Language Processing
This paper presents Mimic, a decision-tree-based, concatenative, voice-adaptive text-to-speech synthesiser. Mimic integrates text-to-speech (TTS) synthesis with speech recognition and speaker adaptation. Speech is synthesised by concatenating triphone synthesis units. The triphone units are obtained from clusters of training examples that are modelled, labelled and segmented using clustered HMMs and Viterbi segmentation. The prosodic structure of the pitch, duration and energy contours is captured using statistical training methods. The concept of a decision-tree-based statistical micro-prosody model is introduced as a hierarchical method of modelling prosodic parameters. The voice adaptation component adapts the spectral parameters as well as pitch, duration and energy.
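The abstract names Viterbi segmentation as the step that cuts training utterances into units but gives no details. The following is a minimal sketch of that general technique, not the authors' implementation. It assumes that each unit in a known transcription is a single state in a strict left-to-right chain (stay in the current unit or advance to the next, no skips), that transition probabilities are uniform and can be ignored, and that per-frame log-likelihoods have already been computed. In Mimic these would come from clustered HMMs. The function name `viterbi_segment` is hypothetical.

```python
import math

def viterbi_segment(loglik):
    """Force-align T frames to N consecutive units (hypothetical sketch).

    loglik[t][n] is the log-likelihood of frame t under unit n, assuming
    one emitting state per unit and a left-to-right order with no skips.
    Returns the start frame of each unit.
    """
    T, N = len(loglik), len(loglik[0])
    if T < N:
        raise ValueError("need at least one frame per unit")
    neg = -math.inf
    score = [[neg] * N for _ in range(T)]
    came_from_prev = [[False] * N for _ in range(T)]  # True = advanced from unit n-1
    score[0][0] = loglik[0][0]  # alignment must start in the first unit
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else neg
            if advance > stay:
                score[t][n] = advance + loglik[t][n]
                came_from_prev[t][n] = True
            else:
                score[t][n] = stay + loglik[t][n]
    # Trace back from the last unit at the last frame; each "advance"
    # marks the frame at which the current unit began.
    starts = [0] * N
    n = N - 1
    for t in range(T - 1, 0, -1):
        if came_from_prev[t][n]:
            starts[n] = t
            n -= 1
    return starts
```

For example, with six frames whose scores favour units 0, 0, 1, 1, 2, 2 in turn, the sketch places the unit boundaries at frames 0, 2 and 4. A real system would use multi-state HMMs per triphone and trained transition probabilities. This sketch keeps only the dynamic-programming structure of the alignment.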
Bibliographic reference. Chen, Aimin / Vaseghi, Saeed / Ho, Charles (1998): "MIMIC : a voice-adaptive phonetic-tree speech synthesiser", In ICSLP-1998, paper 0204.