This paper proposes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices. HMM-based speech synthesis systems usually have a smaller footprint than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to deploy them on embedded devices, which have very little memory. Based on the empirical observation that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique that clusters mean vectors while tying all covariance matrices. Subjective listening test results show that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of synthesized speech.
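To illustrate why tying covariance matrices reduces the footprint, the following minimal sketch counts the stored Gaussian parameters with and without tying. It is not taken from the paper: it assumes diagonal covariances and one Gaussian per state, the function name `parameter_count` is hypothetical, and the model size (1000 states, 40-dimensional features) is an arbitrary example rather than a reported configuration. The clustering of mean vectors that the paper also proposes is not modeled here.

```python
def parameter_count(num_states: int, dim: int, tied: bool) -> int:
    """Count stored Gaussian parameters, assuming diagonal covariances
    and one Gaussian per state (illustrative assumptions only)."""
    means = num_states * dim
    # With tying, a single variance vector is shared by every state.
    variances = dim if tied else num_states * dim
    return means + variances


# Hypothetical model size chosen only for illustration.
untied = parameter_count(num_states=1000, dim=40, tied=False)
tied = parameter_count(num_states=1000, dim=40, tied=True)
print(untied, tied, tied / untied)
```

Under these assumptions, tying removes nearly all of the variance storage, so the parameter count approaches half of the untied model as the number of states grows.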
Bibliographic reference. Oura, Keiichiro / Zen, Heiga / Nankaku, Yoshihiko / Lee, Akinobu / Tokuda, Keiichi (2009): "Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems", In INTERSPEECH-2009, 1759-1762.