Second International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2001)
A new control paradigm of source signals for high quality speech synthesis is introduced to handle a variety of speech qualities, based on time-frequency analyses using instantaneous frequency and group delay. The proposed signal representation consists of a frequency domain aperiodicity measure and a time domain energy concentration measure to represent source attributes, which supplement the conventional source information, such as F0 and power. The frequency domain aperiodicity measure is defined as the ratio between the lower and upper smoothed spectral envelopes and represents the relative energy distribution of aperiodic components. The time domain measure is defined as the effective duration of the aperiodic component. These aperiodicity parameters and F0, as functions of time, are used to generate the source signal for synthetic speech by controlling the relative noise level and the temporal envelope of the noise component of the mixed mode excitation signal, including fine timing and amplitude fluctuations. A series of preliminary simulation experiments was conducted to test and demonstrate the consistency of the proposed method. Examples sung in different voice qualities were also analyzed and resynthesized using the proposed method.
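The mixed mode excitation described above can be illustrated with a minimal sketch. It is not the STRAIGHT implementation. It assumes a single broadband aperiodicity value in [0, 1] (the abstract's measure is frequency dependent), a constant F0, and a Gaussian temporal envelope centred on each pulse whose width stands in for the effective duration of the aperiodic component. The function name `mixed_excitation` and all parameter names are hypothetical.

```python
import math
import random


def mixed_excitation(f0, fs, duration, aperiodicity, noise_duration, seed=0):
    """Return excitation samples: a pulse train mixed with envelope-shaped noise.

    aperiodicity   -- assumed broadband share of energy given to noise, in [0, 1]
    noise_duration -- assumed width (seconds) of the Gaussian noise envelope
    """
    if not 0.0 <= aperiodicity <= 1.0:
        raise ValueError("aperiodicity must lie in [0, 1]")
    rng = random.Random(seed)
    n = int(round(duration * fs))
    period = fs / f0                          # pitch period in samples
    sigma = max(noise_duration * fs, 1e-9)    # envelope width in samples

    # Periodic component: one unit pulse per pitch period.
    pulses = [0.0] * n
    k = 0
    while int(round(k * period)) < n:
        pulses[int(round(k * period))] = 1.0
        k += 1

    # Aperiodic component: Gaussian noise concentrated around each pulse.
    noise = []
    for i in range(n):
        phase = i % period
        dist = min(phase, period - phase)     # distance to the nearest pulse
        envelope = math.exp(-0.5 * (dist / sigma) ** 2)
        noise.append(envelope * rng.gauss(0.0, 1.0))

    # Scale the noise to the pulse train's energy, then mix by energy share.
    pulse_energy = sum(p * p for p in pulses)
    noise_energy = sum(v * v for v in noise)
    scale = math.sqrt(pulse_energy / noise_energy) if noise_energy > 0 else 0.0
    w_periodic = math.sqrt(1.0 - aperiodicity)
    w_noise = math.sqrt(aperiodicity) * scale
    return [w_periodic * p + w_noise * v for p, v in zip(pulses, noise)]


if __name__ == "__main__":
    # 100 Hz source, 8 kHz sampling, 30% of energy aperiodic, 2 ms noise burst.
    signal = mixed_excitation(100.0, 8000, 0.1, 0.3, 0.002)
    print(len(signal))
```

Setting `aperiodicity` to 0 yields a pure pulse train and setting it to 1 yields only the shaped noise. A smaller `noise_duration` concentrates the noise energy more tightly around each pulse.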
Index Terms. Fundamental frequency; Voice perturbation; Instantaneous frequency; Group delay; Aperiodicity; Fluctuation
Bibliographic reference. Kawahara, Hideki / Estill, Jo / Fujimura, Osamu (2001): "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT", In MAVEBA-2001, 59-64.