10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Tying Covariance Matrices to Reduce the Footprint of HMM-Based Speech Synthesis Systems

Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Nagoya Institute of Technology, Japan

This paper proposes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices. HMM-based speech synthesis systems usually have a smaller footprint than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to deploy them on embedded devices with very little memory. Based on the empirical observation that covariance matrices have less impact on the quality of synthesized speech than mean vectors, we propose a technique that clusters mean vectors while tying all covariance matrices. Subjective listening test results show that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of synthesized speech.
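The footprint argument can be illustrated with simple parameter counting. The sketch below is not the authors' implementation: it assumes diagonal covariance matrices and a single global tied covariance, and the function name `parameter_count` and the sizes `NUM_LEAVES` and `DIM` are hypothetical values chosen only for illustration, not figures from the paper.

```python
def parameter_count(num_leaves: int, dim: int, tied: bool) -> int:
    """Count stored floats for Gaussian leaf distributions.

    Assumption: each leaf has a mean vector of length `dim` and a
    diagonal covariance (a variance vector of length `dim`).
    """
    means = num_leaves * dim
    # With tying, one variance vector is shared by every leaf.
    variances = dim if tied else num_leaves * dim
    return means + variances


# Hypothetical sizes, for illustration only (not taken from the paper).
NUM_LEAVES = 1000
DIM = 75

untied = parameter_count(NUM_LEAVES, DIM, tied=False)
tied = parameter_count(NUM_LEAVES, DIM, tied=True)

print(f"untied: {untied} floats")
print(f"tied:   {tied} floats")
print(f"ratio:  {tied / untied:.4f}")
```

Under these assumptions, tying removes nearly all variance parameters, so the stored model is close to half its untied size for the same number of mean vectors. The actual saving depends on details the abstract does not give, such as the number of streams and how the mean vectors are clustered.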

Bibliographic reference.  Oura, Keiichiro / Zen, Heiga / Nankaku, Yoshihiko / Lee, Akinobu / Tokuda, Keiichi (2009): "Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems", In INTERSPEECH-2009, 1759-1762.