Second International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2001)

Florence, Italy
September 13-15, 2001

Aperiodicity Extraction and Control using Mixed Mode Excitation and Group Delay Manipulation for a High Quality Speech Analysis, Modification and Synthesis System STRAIGHT

Hideki Kawahara (1,2), Jo Estill (3), Osamu Fujimura (4)

(1) Faculty of Systems Engineering, Wakayama University, Wakayama, Japan
(2) Information Sciences Division, ATR, Kyoto, Japan
(3) Estill Voice Training Systems, Santa Rosa, CA, USA
(4) Department of Speech & Hearing Science, The Ohio State University, Columbus, OH, USA

A new control paradigm for source signals in high-quality speech synthesis is introduced to handle a variety of speech qualities, based on time-frequency analyses that use instantaneous frequency and group delay. The proposed signal representation consists of a frequency-domain aperiodicity measure and a time-domain energy concentration measure, which represent source attributes and supplement conventional source information such as F0 and power. The frequency-domain aperiodicity measure is defined as the ratio between the lower and upper smoothed spectral envelopes and represents the relative energy distribution of aperiodic components. The time-domain measure is defined as the effective duration of the aperiodic component. These aperiodicity parameters and F0, as functions of time, are used to generate the source signal for synthetic speech by controlling the relative noise level and the temporal envelope of the noise component of the mixed mode excitation signal, including fine timing and amplitude fluctuations. A series of preliminary simulation experiments was conducted to test and demonstrate the consistency of the proposed method. Examples sung in different voice qualities were also analyzed and resynthesized using the proposed method.
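
The two measures described above can be illustrated with a minimal sketch. It is not the STRAIGHT implementation, and it makes several simplifying assumptions that are not in the paper: F0 is constant, the aperiodicity is a single broadband power ratio rather than a function of frequency, and the temporal envelope of the noise is imposed directly with a Gaussian window around each pulse instead of through group delay manipulation. The names `aperiodicity_ratio`, `mixed_excitation`, and `noise_width` are hypothetical.

```python
import numpy as np


def aperiodicity_ratio(lower_env, upper_env, eps=1e-12):
    """Ratio of lower to upper smoothed spectral envelope (power), per bin.

    Values near 0 suggest a mostly periodic band, values near 1 a mostly
    aperiodic one. Clipping to [0, 1] is an assumption of this sketch.
    """
    lower = np.asarray(lower_env, dtype=float)
    upper = np.asarray(upper_env, dtype=float)
    return np.clip(lower / np.maximum(upper, eps), 0.0, 1.0)


def mixed_excitation(f0, ap, fs=16000, duration=0.1, noise_width=0.002, seed=0):
    """Pulse train plus noise whose energy is concentrated near each pulse.

    f0          constant fundamental frequency in Hz (assumption)
    ap          broadband aperiodic power ratio in [0, 1] (assumption)
    noise_width standard deviation, in seconds, of the Gaussian envelope
                applied to the noise; a stand-in for the effective duration
    """
    rng = np.random.default_rng(seed)
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    period = int(round(fs / f0))

    # Periodic component: one unit pulse per fundamental period.
    pulses = np.zeros(n)
    pulses[::period] = 1.0

    # Temporal envelope of the noise: a Gaussian bump centered on each pulse.
    envelope = np.zeros(n)
    for idx in range(0, n, period):
        envelope += np.exp(-0.5 * ((t - idx / fs) / noise_width) ** 2)
    noise = rng.standard_normal(n) * envelope

    # Normalize both components to unit power, then mix by power ratio.
    pulses /= np.sqrt(np.mean(pulses ** 2))
    noise /= np.sqrt(np.mean(noise ** 2))
    return np.sqrt(1.0 - ap) * pulses + np.sqrt(ap) * noise
```

In this sketch, `mixed_excitation(100.0, 0.0)` gives a pure pulse train and `mixed_excitation(100.0, 1.0)` gives pulse-synchronous noise only. A smaller `noise_width` concentrates the noise energy more tightly around each pulse.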

Index Terms. Fundamental frequency; Voice perturbation; Instantaneous frequency; Group delay; Aperiodicity; Fluctuation

Bibliographic reference.  Kawahara, Hideki / Estill, Jo / Fujimura, Osamu (2001): "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT", In MAVEBA-2001, 59-64.