This paper presents Mimic: a decision-tree-based, concatenative, voice-adaptive text-to-speech synthesiser. Mimic integrates text-to-speech (TTS) synthesis with speech recognition and speaker adaptation. Speech is synthesised by concatenating triphone synthesis units. The triphone units are obtained from clusters of training examples that are modelled, labelled and segmented using clustered HMMs and Viterbi segmentation. The prosodic structure of the pitch, duration and energy contours is captured using statistical training methods. The concept of a decision-tree-based statistical micro-prosody model is introduced as a hierarchical method of modelling prosodic parameters. The voice adaptation component adapts the spectral parameters as well as pitch, duration and energy.
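To illustrate the Viterbi segmentation step mentioned above, here is a minimal sketch of forced segmentation. It splits one training example into a strict left-to-right sequence of HMM states. The sketch assumes that per-frame state log-likelihoods are already computed, that each state occupies at least one frame, and that transition probabilities are uniform and can therefore be dropped. The function name `viterbi_segment` is hypothetical and does not represent the authors' implementation.

```python
import numpy as np

def viterbi_segment(log_lik):
    """Force-align T frames to N states in strict left-to-right order.

    log_lik[t, j] is the log-likelihood of frame t under state j.
    Assumption: each state occupies at least one frame and transition
    scores are uniform, so they are omitted. Returns the start frame
    of each state.
    """
    T, N = log_lik.shape
    if T < N:
        raise ValueError("need at least one frame per state")
    score = np.full((T, N), -np.inf)
    # entered[t, j] is True if the best path enters state j at frame t.
    entered = np.zeros((T, N), dtype=bool)
    score[0, 0] = log_lik[0, 0]
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1, j]
            advance = score[t - 1, j - 1] if j > 0 else -np.inf
            if advance > stay:
                score[t, j] = advance + log_lik[t, j]
                entered[t, j] = True
            else:
                score[t, j] = stay + log_lik[t, j]
    # Backtrack from the last state at the last frame.
    starts = [0] * N
    j = N - 1
    for t in range(T - 1, 0, -1):
        if entered[t, j]:
            starts[j] = t
            j -= 1
    return starts
```

In this toy case, frames 0 to 2 favour state 0, frames 3 to 4 favour state 1, and frames 5 to 7 favour state 2. The recovered state boundaries are therefore `[0, 3, 5]`.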
Cite as: Chen, A., Vaseghi, S., Ho, C. (1998) MIMIC : a voice-adaptive phonetic-tree speech synthesiser. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0204, doi: 10.21437/ICSLP.1998-20
@inproceedings{chen98_icslp, author={Aimin Chen and Saeed Vaseghi and Charles Ho}, title={{MIMIC : a voice-adaptive phonetic-tree speech synthesiser}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0204}, doi={10.21437/ICSLP.1998-20} }