INTERSPEECH 2004 - ICSLP
This paper presents an alternative approach to speech database compression, aimed at embedded applications of concatenative synthesis systems. The waveform of each speech segment is first decomposed into a prosodic pattern and a spectral pattern by STRAIGHT, a powerful speech analysis-synthesis algorithm. The prosodic patterns and the spectral patterns are then clustered separately to remove redundant acoustic information from the database. The clustering process is controllable and can deliver a flexible compression ratio to meet the actual footprint requirements of various embedded devices. In addition, labeling and contextual information are used to improve the performance of pattern clustering. A subjective listening test shows that our Mandarin synthesis system, with its corpus compressed by the proposed method at about 2.7 kbps, performs comparably to the same system compressed by G.723.1 at 5.3 kbps, and that the quality degradation is not severe as the compression ratio increases.
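To illustrate how clustering can make the compression ratio controllable, here is a minimal sketch that assumes plain k-means vector quantization over fixed-length pattern vectors. The abstract does not specify the clustering algorithm, so this is not the authors' method. The sketch omits the STRAIGHT decomposition and the labeling and contextual information the paper uses. The functions `kmeans` and `compression_ratio` and the 32-bit storage model are hypothetical. Each segment is stored as an index into a shared codebook, so choosing a smaller number of clusters `k` trades quality for a higher compression ratio.

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in for the paper's clustering).

    Returns (codebook, labels); all vectors must have the same length.
    """
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]  # initialise from data points
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, codebook[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if labels[i] == c]
            if members:  # keep the old centroid if a cluster empties out
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    return codebook, labels


def compression_ratio(n_patterns, dim, k, bits_per_value=32):
    """Original bits / (codebook bits + per-segment index bits).

    Hypothetical storage model: raw patterns vs. k centroids plus one index each.
    """
    original = n_patterns * dim * bits_per_value
    index_bits = math.ceil(math.log2(k)) if k > 1 else 1
    compressed = k * dim * bits_per_value + n_patterns * index_bits
    return original / compressed


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 24  # assumed length of one spectral pattern vector
    patterns = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
    codebook, labels = kmeans(patterns, k=16)
    decoded = [codebook[j] for j in labels]  # each segment replaced by its centroid
    for k in (64, 16, 4):
        print(k, round(compression_ratio(len(patterns), dim, k), 2))
```

Under this storage model, shrinking `k` shrinks the codebook, which raises the ratio at the cost of larger quantization error. Separate codebooks for the prosodic and spectral patterns would let each ratio be tuned independently.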
Bibliographic reference. Ling, Zhenhua / Hu, Yu / Shuang, Zhiwei / Wang, Renhua (2004): "Compression of speech database by feature separation and pattern clustering using STRAIGHT", In INTERSPEECH-2004, 1201-1204.