14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Speaker-Specific Retraining for Enhanced Compression of Unit Selection Text-to-Speech Databases

Jani Nurminen, Hanna Silén, Moncef Gabbouj

Tampere University of Technology, Finland

Unit selection based text-to-speech systems can generally achieve high speech quality, provided that the database is large enough. In embedded applications, however, the resulting memory requirements may be excessive, and the database often needs to be both pruned and compressed to fit into the available memory. In this paper, we study database compression, focusing on speaker-specific optimization of the quantizers used to compress the database. First, we introduce the simple concept of dynamic quantizer structures, which facilitate speaker-specific optimization by enabling convenient run-time updates. Second, we show that speaker-specific retraining yields significant memory savings while fully maintaining the quantization accuracy, even when the memory required for the additional codebook data is taken into account. The proposed approach can thus be considered effective in reducing the conventionally large footprint of unit selection based text-to-speech systems.
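
The trade-off described above can be illustrated with a minimal sketch. It is not the paper's method: it assumes a scalar quantizer retrained with plain Lloyd (k-means) iterations, and the function names, the toy data, and the 16-bit codebook entry size are all hypothetical choices made for illustration. The idea it shows is that a smaller codebook retrained on one speaker's data can match or beat a larger generic codebook in accuracy, and that the saving only counts once the extra codebook storage is added back in.

```python
def footprint_bits(num_vectors, index_bits, extra_codebook_values=0, bits_per_value=16):
    """Bits for all stored indices plus any additional codebook data.

    bits_per_value=16 is an assumed storage size for one codebook entry.
    """
    return num_vectors * index_bits + extra_codebook_values * bits_per_value


def retrain_codebook(data, codebook, iterations=10):
    """Lloyd (k-means) retraining of a scalar codebook on speaker-specific data."""
    codebook = list(codebook)
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for x in data:
            # Assign each sample to its nearest codebook entry.
            nearest = min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)
            cells[nearest].append(x)
        # Move each entry to the mean of its cell; keep empty cells unchanged.
        codebook = [sum(c) / len(c) if c else old for c, old in zip(cells, codebook)]
    return codebook


def mse(data, codebook):
    """Mean squared quantization error of data under the given codebook."""
    return sum(min((x - c) ** 2 for c in codebook) for x in data) / len(data)


# Toy "speaker" data clustered around two values (hypothetical).
speaker_data = [0.9, 1.0, 1.1, 2.9, 3.0, 3.1]

generic_codebook = [0.0, 2.0, 4.0, 6.0]                          # 4 entries: 2-bit indices
specific_codebook = retrain_codebook(speaker_data, [0.0, 4.0])   # 2 entries: 1-bit indices

generic_error = mse(speaker_data, generic_codebook)
specific_error = mse(speaker_data, specific_codebook)

# Net memory for a hypothetical database of 100,000 quantized values:
# the generic codebook is assumed to be already present in the system,
# while the speaker-specific one is counted as additional data.
generic_bits = footprint_bits(100_000, index_bits=2)
specific_bits = footprint_bits(100_000, index_bits=1, extra_codebook_values=2)
```

In this toy setting the retrained two-entry codebook gives a lower error than the four-entry generic one while needing fewer bits per index, and the added codebook storage is small relative to the index savings. Whether that holds for a real database depends on the database size, the codebook dimensions, and the speaker data.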

Bibliographic reference. Nurminen, Jani / Silén, Hanna / Gabbouj, Moncef (2013): "Speaker-specific retraining for enhanced compression of unit selection text-to-speech databases", in INTERSPEECH-2013, 388-391.