11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Simplification and Extension of Non-Periodic Excitation Source Representations for High-Quality Speech Manipulation Systems

Hideki Kawahara (1), Masanori Morise (2), Toru Takahashi (3), Hideki Banno (4), Ryuichi Nisimura (1), Toshio Irino (1)

(1) Wakayama University, Japan
(2) Ritsumeikan University, Japan
(3) Kyoto University, Japan
(4) Meijo University, Japan

A systematic framework for non-periodic excitation source representation is proposed for high-quality speech manipulation systems such as TANDEM-STRAIGHT, which is essentially a channel VOCODER. The proposed method consists of two subsystems for non-periodic components: a colored noise source and an event analyzer/generator. The colored noise source is represented by a sigmoid model with non-linear level conversion. Its two model parameters, the boundary frequency and the slope, are estimated using pitch-range linear prediction, applied both on an F0-adaptively warped temporal axis and on the original temporal axis. The event subsystem detects events based on the kurtosis of filtered speech signals. The proposed framework provides significant quality improvement for high-quality recorded speech materials.
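The two subsystems can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption for illustration rather than the authors' method: the function names, the logistic form of the sigmoid, the window-based detector, and the threshold are all hypothetical. The sketch omits the non-linear level conversion and the filtering stage mentioned in the abstract.

```python
import math


def sigmoid_aperiodicity(freq_hz, boundary_hz, slope):
    """Hypothetical sigmoid model: share of the random component at freq_hz.

    Assumed form (not taken from the paper): a logistic curve that equals
    0.5 at the boundary frequency and rises more steeply as slope grows.
    """
    return 1.0 / (1.0 + math.exp(-slope * (freq_hz - boundary_hz) / boundary_hz))


def kurtosis(samples):
    """Sample kurtosis (fourth central moment over squared variance)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    if m2 == 0.0:
        return 0.0  # constant window: treat as having no event
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2)


def detect_events(signal, window, threshold):
    """Return start indices of non-overlapping windows with high kurtosis.

    Impulsive events concentrate energy in a few samples, which raises
    the kurtosis of the window well above that of a steady signal.
    """
    events = []
    for start in range(0, len(signal) - window + 1, window):
        if kurtosis(signal[start:start + window]) > threshold:
            events.append(start)
    return events


if __name__ == "__main__":
    # Steady background, then a window containing one large impulse.
    background = [1.0 if i % 2 == 0 else -1.0 for i in range(16)]
    burst = [0.1 if i % 2 == 0 else -0.1 for i in range(16)]
    burst[8] = 5.0
    print(detect_events(background + burst + background, 16, 3.0))
    print(sigmoid_aperiodicity(3000.0, 3000.0, 5.0))
```

In this sketch the sigmoid is fully described by the two parameters named in the abstract, and the detector flags only the window that contains the impulse.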

Bibliographic reference.  Kawahara, Hideki / Morise, Masanori / Takahashi, Toru / Banno, Hideki / Nisimura, Ryuichi / Irino, Toshio (2010): "Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems", In INTERSPEECH-2010, 38-41.