8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Compression of Speech Database by Feature Separation and Pattern Clustering Using STRAIGHT

Zhenhua Ling, Yu Hu, Zhiwei Shuang, Renhua Wang

University of Science and Technology of China, China

This paper presents an alternative approach to speech database compression aimed at embedded applications of concatenative synthesis systems. The waveform of each speech segment is first decomposed into a prosodic pattern and a spectral pattern by STRAIGHT, a powerful speech analysis-synthesis algorithm. The prosodic and spectral patterns are then clustered separately to remove redundant acoustic information from the database. The clustering process is controllable and can deliver a flexible compression ratio to meet the footprint requirements of various embedded devices. In addition, labeling and contextual information is used to improve the performance of pattern clustering. A subjective listening test shows that our Mandarin synthesis system, with its corpus compressed by the proposed method at about 2.7 kbps, performs comparably to the same system compressed by G.723.1 at 5.3 kbps, and that quality degradation is not serious as the compression ratio increases.
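The clustering step and its controllable compression ratio can be illustrated with a minimal sketch. It assumes each pattern has already been reduced to a fixed-length feature vector and uses plain k-means as a stand-in, since the abstract does not specify the clustering algorithm, the feature dimensions, or the bit allocation. The function names, the 16-bit value size, and the storage model (one codebook plus one index per segment) are hypothetical choices made for illustration only.

```python
import numpy as np


def cluster_patterns(patterns, n_clusters, n_iter=20, seed=0):
    """Plain k-means (a stand-in, not the paper's method).

    patterns: array of shape (n_patterns, dim).
    Returns (codebook, indices): the cluster centroids and, for each
    pattern, the index of its nearest centroid.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct, randomly chosen patterns.
    start = rng.choice(len(patterns), n_clusters, replace=False)
    codebook = patterns[start].astype(float)

    def assign(cb):
        # Squared Euclidean distance from every pattern to every centroid.
        diffs = patterns[:, None, :] - cb[None, :, :]
        return (diffs ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        indices = assign(codebook)
        for k in range(n_clusters):
            members = patterns[indices == k]
            if len(members):  # keep the old centroid if a cluster is empty
                codebook[k] = members.mean(axis=0)
    return codebook, assign(codebook)


def compression_ratio(n_patterns, dim, n_clusters, bits_per_value=16):
    """Raw size divided by (codebook size + one index per pattern).

    Assumed storage model: every value takes bits_per_value bits and
    each index takes ceil(log2(n_clusters)) bits.
    """
    index_bits = max(1, (n_clusters - 1).bit_length())
    raw_bits = n_patterns * dim * bits_per_value
    stored_bits = n_clusters * dim * bits_per_value + n_patterns * index_bits
    return raw_bits / stored_bits
```

Under these assumptions, the number of clusters is the control knob: a smaller codebook gives a higher compression ratio at the cost of coarser patterns, which is one way a database could be fitted to a given device footprint.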


Bibliographic reference. Ling, Zhenhua / Hu, Yu / Shuang, Zhiwei / Wang, Renhua (2004): "Compression of speech database by feature separation and pattern clustering using STRAIGHT", In INTERSPEECH-2004, 1201-1204.