Second International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2001)
Traditionally, based on the concept of phoneme concatenation and coarticulation, speech signals were interpreted as a linear string of consonants and vowels representing the sound aspect of a linguistic form. The so-called suprasegmental characteristics of speech signals were typically represented by the voice fundamental frequency contour as a time function, associated with the linear string of phonemic segments. A new syllable-based phonetic theory of speech signal organization, the Converter-Distributor (C/D) model [Fujimura, Phonetica 2000], describes speech signals as a base function with superimposed local articulatory movement patterns for consonantal gestures. The base function comprises, as its aspects, vowels representing the time series of syllable nuclei, as well as jaw opening control and the voice function. Voice quality control is part of the voice function. The voice pitch (F0) change, from this point of view, is one of the physical variables of voice quality control, and an important and robustly observable one. Such base function aspects constitute the phonetic melody. In parallel, a syllable-boundary pulse train representing the metrical organization of the speech signal constitutes the phonetic skeleton (in an extended sense of the terminology used in nonlinear phonology). The phonetic melody is linked to the skeleton syllable by syllable.
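The syllable-by-syllable linkage described above can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the C/D model's formal definition: the class names, field names, units, and numeric values below are all hypothetical and chosen for readability.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SyllablePulse:
    """One pulse of the phonetic skeleton (hypothetical representation)."""
    time_s: float      # assumed pulse time in seconds
    magnitude: float   # assumed relative syllable pulse magnitude


@dataclass
class MelodyState:
    """Base-function aspects for one syllable (hypothetical fields)."""
    vowel: str          # syllable nucleus
    jaw_opening: float  # assumed normalized jaw opening, 0..1
    f0_hz: float        # voice pitch, one variable of voice quality control


def link_melody(
    skeleton: List[SyllablePulse], melody: List[MelodyState]
) -> List[Tuple[SyllablePulse, MelodyState]]:
    """Pair each skeleton pulse with the melody state of the same syllable."""
    if len(skeleton) != len(melody):
        raise ValueError("skeleton and melody must have one entry per syllable")
    return list(zip(skeleton, melody))


# Illustrative two-syllable utterance with made-up values.
skeleton = [SyllablePulse(0.10, 1.0), SyllablePulse(0.35, 0.6)]
melody = [MelodyState("a", 0.8, 130.0), MelodyState("i", 0.3, 110.0)]
linked = link_melody(skeleton, melody)
```

Keeping the skeleton and the melody as separate sequences mirrors the text's description of two parallel structures that meet only at the syllable level.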
Voice quality control, in this sense, includes voice pitch change, control of the speech spectral envelope, and the temporal span and variability of voicing conditions. Note, however, that the same physical variables, such as the voice fundamental frequency, are also affected by metrical conditions, in particular the syllable pulse magnitude, even without independent control based on phonological tonal features such as lexical or phrasal tone/accent control.
In singing, depending on the musical genre, the same singer can consciously choose a particular voice quality by setting system parameters of the phonetic implementation process, from the point of view of the C/D model. Estill proposed six basic qualities. Some temporal perturbation characteristics of the six voice qualities are discussed by Kawahara, Estill, and Fujimura at this meeting, using Kawahara's new signal processing method, STRAIGHT.
Bibliographic reference. Fujimura, Osamu (2001): "Voice quality as an aspect of prosodic control in speech utterance: the base function representation of the C/D model". In MAVEBA-2001, 123-124.