Second International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2001)

Florence, Italy
September 13-15, 2001

Voice Quality as an Aspect of Prosodic Control in Speech Utterance: The Base Function Representation of the C/D Model

Osamu Fujimura

Department of Speech and Hearing Science, The Ohio State University, Columbus, OH, USA

Traditionally, under the concept of phoneme concatenation and coarticulation, speech signals were interpreted as a linear string of consonants and vowels representing the sound aspect of a linguistic form. The so-called suprasegmental characteristics of speech signals were typically represented by the voice fundamental frequency contour as a time function, associated with the linear string of phonemic segments. A new syllable-based phonetic theory of speech signal organization, the Converter-Distributor (C/D) model [Fujimura, Phonetica 2000], describes speech signals as a base function with superimposed local articulatory movement patterns for consonantal gestures. The aspects of the base function comprise vowels, representing the time series of syllable nuclei, as well as jaw opening control and the voice function. Voice quality control is part of the voice function. From this point of view, the change in voice pitch (F0) is one of the physical variables of voice quality control, and an important and robustly observable one. Such base function aspects constitute the phonetic melody. In parallel, a syllable-boundary pulse train representing the metrical organization of the speech signal constitutes the phonetic skeleton (in an extended sense of the terminology used in nonlinear phonology). The phonetic melody is linked to the skeleton syllable by syllable.
   Voice quality control, in this sense, includes voice pitch change, control of the speech spectral envelope, and the temporal span and variability of voicing conditions. Note, however, that the same physical variables, such as the voice fundamental frequency, are also affected by metrical conditions, in particular the syllable pulse magnitude, even without independent control based on phonological tonal features such as lexical or phrasal tone/accent control.
   In singing, depending on the musical genre, the same singer can consciously choose a particular voice quality; from the C/D model point of view, this is done by setting system parameters of the phonetic implementation process. Estill [1997] proposed six basic qualities. Kawahara, Estill, and Fujimura discuss some temporal perturbation characteristics of the six voice qualities at this meeting, using Kawahara's new signal processing method, STRAIGHT.
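
   As a reading aid only, the relation between the phonetic skeleton and the phonetic melody described above can be pictured as a simple data structure. The sketch below is not part of the C/D model's formal specification: the class names, the field names, the linear F0 rule, and the numeric constants (`base_hz`, `gain_hz`) are all assumptions made for illustration. It shows only the qualitative point that an F0 target can vary with syllable pulse magnitude even when no independent tonal control is present.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SyllablePulse:
    """One pulse of the skeleton (hypothetical representation)."""
    time_s: float     # position of the syllable pulse in seconds
    magnitude: float  # metrical strength of the syllable, arbitrary units


@dataclass
class MelodySlot:
    """Base-function aspects attached to one syllable (illustrative fields)."""
    vowel: str
    tone_offset_hz: float = 0.0  # 0.0 means no independent tonal control


def link(skeleton: List[SyllablePulse],
         melody: List[MelodySlot]) -> List[Tuple[SyllablePulse, MelodySlot]]:
    """Link the melody to the skeleton syllable by syllable."""
    if len(skeleton) != len(melody):
        raise ValueError("skeleton and melody must have one entry per syllable")
    return list(zip(skeleton, melody))


def f0_targets(linked: List[Tuple[SyllablePulse, MelodySlot]],
               base_hz: float = 120.0,
               gain_hz: float = 30.0) -> List[float]:
    """Toy rule (an assumption, not the model's equation):
    F0 target = base + gain * pulse magnitude + tonal offset."""
    return [base_hz + gain_hz * pulse.magnitude + slot.tone_offset_hz
            for pulse, slot in linked]


# Two syllables with no tonal control: F0 still differs with pulse magnitude.
skeleton = [SyllablePulse(0.0, 1.0), SyllablePulse(0.25, 0.5)]
melody = [MelodySlot("a"), MelodySlot("i")]
targets = f0_targets(link(skeleton, melody))
print(targets)
```

   With the two toy syllables above, the stronger pulse receives the higher F0 target although neither syllable carries a tonal offset. A real implementation would replace the linear rule with whatever mapping the phonetic implementation process actually specifies.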

Bibliographic reference. Fujimura, Osamu (2001): "Voice quality as an aspect of prosodic control in speech utterance: the base function representation of the C/D model", in MAVEBA-2001, 123-124.