11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Sub-Band Basis Spectrum Model for Pitch-Synchronous Log-Spectrum and Phase Based on Approximation of Sparse Coding

Masatsune Tamura, Takehiko Kagoshima, Masami Akamine

Toshiba Corporation, Japan

In this paper, we propose the sub-band basis spectrum model (SBM), a new spectrum representation that models a spectrum as a linear combination of sub-band bases. We first apply sparse coding to pitch-synchronously analyzed log spectra. Approximating the resulting bases, we define a set of sub-band bases with one-cycle sinusoidal shapes, spaced on the mel scale at lower frequencies and equally spaced at higher frequencies. The SBM parameters of the log spectrum and the phase spectrum are calculated by fitting the bases to each spectrum. Since the parameters represent the shape of the spectrum, they can be used for voice adaptation based on frequency warping and filtering in unit-fusion-based TTS. Experimental results show that the analysis-synthesis speech is close to the original speech and that, for unit-fusion-based TTS, there is no significant difference between synthetic speech generated from an analysis-synthesis database and that generated from the original database.
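
The basis construction and fitting step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the basis formula, the number of bases, the mel/linear crossover frequency, or the fitting criterion. The sketch assumes raised-cosine (one-cycle sinusoidal) bases that peak at each center frequency and fall to zero at the neighboring centers, the common 2595 log10(1 + f/700) mel formula, and an ordinary least-squares fit. All function names and parameter values (peak_frequencies, subband_basis, fit_sbm, n_low, n_high, f_split) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the paper may use another).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def peak_frequencies(n_low, n_high, f_split, f_nyq):
    """Basis peak frequencies in Hz: mel-spaced below f_split,
    equally spaced from f_split up to the Nyquist frequency."""
    low = mel_to_hz(np.linspace(0.0, hz_to_mel(f_split), n_low, endpoint=False))
    high = np.linspace(f_split, f_nyq, n_high)
    return np.concatenate([low, high])

def subband_basis(peaks, freqs):
    """Matrix of shape (len(freqs), len(peaks)). Column i rises from 0 at
    peaks[i-1] to 1 at peaks[i], then falls back to 0 at peaks[i+1],
    following one cycle of a cosine."""
    basis = np.zeros((len(freqs), len(peaks)))
    for i, c in enumerate(peaks):
        if i > 0:  # rising half between the previous peak and this one
            lo = peaks[i - 1]
            m = (freqs >= lo) & (freqs <= c)
            basis[m, i] = 0.5 - 0.5 * np.cos(np.pi * (freqs[m] - lo) / (c - lo))
        if i < len(peaks) - 1:  # falling half between this peak and the next
            hi = peaks[i + 1]
            m = (freqs >= c) & (freqs <= hi)
            basis[m, i] = 0.5 + 0.5 * np.cos(np.pi * (freqs[m] - c) / (hi - c))
    return basis

def fit_sbm(spectrum, basis):
    """Least-squares coefficients c such that basis @ c approximates spectrum."""
    coeffs, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    return coeffs

# Illustrative values only: 513 bins up to 8 kHz, 34 mel-spaced and
# 16 linearly spaced bases with the crossover at 4 kHz.
freqs = np.linspace(0.0, 8000.0, 513)
peaks = peak_frequencies(n_low=34, n_high=16, f_split=4000.0, f_nyq=8000.0)
basis = subband_basis(peaks, freqs)

# A smooth synthetic "log spectrum" stands in for real pitch-synchronous data.
log_spectrum = np.cos(2.0 * np.pi * freqs / 8000.0) - freqs / 4000.0
params = fit_sbm(log_spectrum, basis)
reconstructed = basis @ params
```

With this construction, adjacent bases sum to one at every frequency, so each coefficient roughly tracks the spectrum level near its peak frequency. That is one way to read the statement that the parameters represent the shape of the spectrum. Under the same assumptions, the fit could be applied to a phase spectrum in place of log_spectrum.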


Bibliographic reference. Tamura, Masatsune / Kagoshima, Takehiko / Akamine, Masami (2010): "Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding", in INTERSPEECH-2010, 2406-2409.