ISCA Archive Interspeech 2009

Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems

Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

This paper proposes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices. HMM-based speech synthesis systems usually have a smaller footprint than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to deploy them on embedded devices with very limited memory. Based on the empirical observation that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors do, we propose a technique that clusters mean vectors while tying all covariance matrices. Subjective listening test results show that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of synthesized speech.
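To make the footprint argument concrete, here is a minimal sketch. It is not the authors' implementation and assumes Gaussian leaves with diagonal covariances. The hypothetical `footprint_floats` counts stored floats when every cluster keeps its own mean, with and without a single shared covariance. The hypothetical `tied_diagonal_variance` computes the standard maximum-likelihood estimate of one covariance shared across clusters that keep separate means: an occupancy-weighted average of the per-cluster variances. It assumes each per-cluster variance is measured around that cluster's own mean. Decision trees, duration models and other overhead are deliberately ignored.

```python
from typing import List, Sequence


def tied_diagonal_variance(occupancies: Sequence[float],
                           variances: Sequence[Sequence[float]]) -> List[float]:
    """ML estimate of one diagonal variance shared by all clusters.

    Assumes (hypothetically) that each cluster keeps its own mean and that
    variances[i] is cluster i's variance around that mean; the shared
    variance is then the occupancy-weighted average of the per-cluster ones.
    """
    total = sum(occupancies)
    if total <= 0:
        raise ValueError("total occupancy must be positive")
    dim = len(variances[0])
    tied = [0.0] * dim
    for gamma, var in zip(occupancies, variances):
        for d in range(dim):
            tied[d] += gamma * var[d]
    return [v / total for v in tied]


def footprint_floats(num_clusters: int, dim: int, tie_covariance: bool) -> int:
    """Floats stored for diagonal-covariance Gaussian leaves (means + variances)."""
    means = num_clusters * dim
    # One shared variance vector when tied, otherwise one per cluster.
    variances = dim if tie_covariance else num_clusters * dim
    return means + variances


if __name__ == "__main__":
    print(footprint_floats(1000, 40, tie_covariance=False))  # 80000
    print(footprint_floats(1000, 40, tie_covariance=True))   # 40040
    print(tied_diagonal_variance([1.0, 3.0], [[1.0, 2.0], [5.0, 2.0]]))  # [4.0, 2.0]
```

Under these assumptions, tying all covariances roughly halves the Gaussian parameter storage for a fixed number of mean clusters. The freed memory could then go toward finer mean clustering, which is the trade-off the abstract describes.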


doi: 10.21437/Interspeech.2009-143

Cite as: Oura, K., Zen, H., Nankaku, Y., Lee, A., Tokuda, K. (2009) Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems. Proc. Interspeech 2009, 1759-1762, doi: 10.21437/Interspeech.2009-143

@inproceedings{oura09_interspeech,
  author={Keiichiro Oura and Heiga Zen and Yoshihiko Nankaku and Akinobu Lee and Keiichi Tokuda},
  title={{Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1759--1762},
  doi={10.21437/Interspeech.2009-143}
}