ISCA Archive Interspeech 2005

Duration-embedded bi-HMM for expressive voice conversion

Chi-Chun Hsia, Chung-Hsien Wu, Te-Hsien Liu

This paper presents a duration-embedded Bi-HMM framework for expressive voice conversion. First, Ward's minimum-variance clustering method is used to cluster all the conversion units (sub-syllables) in order to reduce both the number of conversion models and the size of the required training database. A duration-embedded Bi-HMM, trained with the EM algorithm, is then built for each sub-syllable class to convert neutral speech into emotional speech while taking duration information into account. Finally, prosodic cues are incorporated in the modification of the spectrum-converted speech. The STRAIGHT algorithm is adopted for high-quality speech analysis and synthesis. Three target emotions are used: happiness, sadness, and anger. Objective and perceptual evaluations were conducted to compare the performance of the proposed approach with that of previous methods. The results show that the proposed method exhibits encouraging potential for expressive voice conversion.
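The clustering step the abstract describes can be illustrated with a minimal sketch using SciPy's Ward linkage. The feature vectors, their dimensionality, and the number of target classes below are illustrative assumptions, not values taken from the paper:

```python
# Hedged sketch: grouping sub-syllable conversion units with Ward's
# minimum-variance clustering, as the abstract describes. The features
# and cluster count here are hypothetical placeholders.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Hypothetical acoustic feature vectors for 60 sub-syllable units
# (e.g. mean cepstral coefficients); the paper's actual features differ.
features = rng.normal(size=(60, 13))

# Ward's method merges, at each step, the pair of clusters whose union
# yields the smallest increase in total within-cluster variance.
Z = linkage(features, method="ward")

# Cut the dendrogram into a reduced set of sub-syllable classes, so that
# only one conversion model per class needs to be trained.
n_classes = 8  # illustrative choice
labels = fcluster(Z, t=n_classes, criterion="maxclust")
print(len(labels), len(set(labels)))
```

Cutting the tree at a fixed class count is what shrinks both the number of conversion models and the amount of parallel training data each model requires.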


doi: 10.21437/Interspeech.2005-602

Cite as: Hsia, C.-C., Wu, C.-H., Liu, T.-H. (2005) Duration-embedded bi-HMM for expressive voice conversion. Proc. Interspeech 2005, 1921-1924, doi: 10.21437/Interspeech.2005-602

@inproceedings{hsia05_interspeech,
  author={Chi-Chun Hsia and Chung-Hsien Wu and Te-Hsien Liu},
  title={{Duration-embedded bi-HMM for expressive voice conversion}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1921--1924},
  doi={10.21437/Interspeech.2005-602}
}