A new, efficient join cost calculation technique for unit selection based synthesis is proposed. The acoustic features representing the spectral content at the unit boundaries are encoded using multi-stage vector quantization. After pseudo-Gray coding is applied, the join costs are approximated directly from the stage-wise codebook indices. As a result, both the memory requirement and the computational complexity are reduced, making the technique especially suitable for embedded text-to-speech systems. In the experiments, the proposed scheme is compared with a baseline technique that works losslessly, computing the similarity measure on the uncompressed acoustic data. Based on the experimental findings, the proposed technique appears to fully preserve the speech quality despite the considerable reduction in complexity and memory usage.
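The idea of approximating join costs from stage-wise codebook indices can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the toy two-stage codebooks, the stage weights, the helper names (`msvq_encode`, `index_join_cost`, `exact_join_cost`), and the cost formula itself (a weighted sum of absolute index differences). The codebooks are simply listed in an order where nearby indices correspond to nearby codewords, which stands in for the effect of pseudo-Gray coding rather than implementing it.

```python
# Illustrative sketch only: codebooks, weights, and the index-based cost
# formula are assumptions, not values or formulas from the paper.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def msvq_encode(codebooks, vec):
    """Encode vec stage by stage; each stage quantizes the previous residual."""
    indices = []
    residual = list(vec)
    for codebook in codebooks:
        best = min(range(len(codebook)),
                   key=lambda i: sq_dist(codebook[i], residual))
        indices.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return indices

def index_join_cost(idx_a, idx_b, stage_weights):
    """Assumed approximation: weighted sum of per-stage index differences.

    Only meaningful if each codebook is ordered so that close indices
    mean close codewords (the role pseudo-Gray coding plays in the paper).
    """
    return sum(w * abs(i - j) for w, i, j in zip(stage_weights, idx_a, idx_b))

def exact_join_cost(vec_a, vec_b):
    """Baseline-style cost computed on the uncompressed feature vectors."""
    return sq_dist(vec_a, vec_b)

# Hypothetical 2-D codebooks: a coarse first stage and a fine second stage,
# both already ordered so that index distance tracks codeword distance.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    [[-0.3, -0.3], [-0.1, -0.1], [0.1, 0.1], [0.3, 0.3]],
]
STAGE_WEIGHTS = [1.0, 0.2]  # assumed: coarse stage dominates the cost

# Boundary features of two candidate units (made-up values).
left_end = [2.1, 2.1]
right_start = [0.9, 0.9]

idx_left = msvq_encode(CODEBOOKS, left_end)
idx_right = msvq_encode(CODEBOOKS, right_start)

print("indices:", idx_left, idx_right)
print("index-based cost:", index_join_cost(idx_left, idx_right, STAGE_WEIGHTS))
print("exact cost:", exact_join_cost(left_end, right_start))
```

In a sketch like this, only the small integer indices need to be stored per unit boundary, and the join cost reduces to a few integer subtractions and multiplications, which is the kind of memory and computation saving the abstract describes.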
Bibliographic reference. Ding, Feng / Nurminen, Jani / Tian, Jilei (2008): "Efficient join cost computation for unit selection based TTS systems", In INTERSPEECH-2008, 589-592.