11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Quantized HMMs for Low Footprint Text-to-Speech Synthesis

Alexander Gutkin, Xavi Gonzalvo, Stefan Breuer, Paul Taylor

Phonetic Arts, UK

This paper proposes the use of Quantized Hidden Markov Models (QHMMs) for reducing the footprint of a conventional parametric HMM-based TTS system. Previously, this technique was successfully applied to automatic speech recognition in embedded devices without loss of recognition performance. In this paper we investigate the construction of different quantized HMM configurations that serve as input to the standard ML-based parameter generation algorithm. We use both subjective and objective tests to compare the resulting systems. Subjective results for specific compression configurations show no significant preference, although some spectral distortion is reported. We conclude that a trade-off is necessary in order to satisfy both speech quality and low-footprint memory requirements.
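
The abstract does not say which quantization scheme the paper uses. As a rough illustration of the footprint versus distortion trade-off it describes, the sketch below applies generic uniform scalar quantization to a list of made-up Gaussian mean values. This is an assumption for illustration only, not the authors' method; the names `quantize`, `dequantize`, and `means` are hypothetical.

```python
import random

def quantize(values, bits):
    """Uniformly quantize floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Step size of the uniform grid; guard against a constant input.
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step

def dequantize(codes, lo, step):
    """Map integer codes back to approximate float values."""
    return [lo + c * step for c in codes]

# Hypothetical stand-in for HMM state means (not data from the paper).
random.seed(0)
means = [random.gauss(0.0, 1.0) for _ in range(1000)]

results = {}
for bits in (8, 4):
    codes, lo, step = quantize(means, bits)
    restored = dequantize(codes, lo, step)
    max_err = max(abs(a - b) for a, b in zip(means, restored))
    # Payload size only; ignores the small overhead of storing lo and step.
    size_bytes = len(codes) * bits / 8
    results[bits] = (max_err, step, size_bytes)
    print(f"{bits}-bit: {size_bytes:.0f} bytes, max abs error {max_err:.4f}")
```

Compared with storing the same 1000 values as 32-bit floats (4000 bytes), fewer bits per value shrink the payload but enlarge the worst-case reconstruction error, which is bounded by half the step size. This is the general kind of trade-off between quality and memory that the abstract concludes is necessary.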

Bibliographic reference. Gutkin, Alexander / Gonzalvo, Xavi / Breuer, Stefan / Taylor, Paul (2010): "Quantized HMMs for low footprint text-to-speech synthesis", in INTERSPEECH-2010, 837-840.