Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Rhythmic Organization and Signal Characteristics of Speech

Osamu Fujimura

Department of Speech and Hearing Science, The Ohio State University, USA

The Converter-Distributor (C/D) model, a generative theory of phonetic implementation, describes an utterance as a linear string of syllables with intervening boundaries. Its base component includes phonetic status contours for voicing, tonal, and vocalic gestures. Consonantal elemental gestures, stored as impulse responses, are excited by the syllable pulse and superimposed onto the base function. A magnitude-modulated syllable-boundary pulse train constitutes a skeletal representation of the rhythmic organization of the utterance. All temporal characteristics of the speech signal are computed from the input specification of each syllable, given as phonological features and metrical structure and numerically augmented by the prominence enhancement specified for the discourse situation, together with system parameter settings for the particular speaker in each discourse. Segmental durations in the acoustic signal vary with syllable magnitude, but not uniformly across consonants and vowels. The C/D model predicts complex patterns of such prosodic effects on segmental duration as a function of the fixed threshold values that relate abstract gestures to observable durations in the acoustic signal. (Supported in part by NSF and ATR.)
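
The threshold mechanism in the last sentence can be illustrated with a small numerical sketch. This is not the C/D model's own formulation: the triangular gesture shape, the function name `duration_above_threshold`, and all parameter values below are assumptions chosen only to show how one fixed threshold per gesture can make durations respond non-uniformly to syllable magnitude.

```python
def duration_above_threshold(magnitude, half_width_ms, threshold):
    """Time (ms) that a triangular gesture stays above a fixed threshold.

    Assumption for illustration only: the gesture is a symmetric triangle
    whose peak height equals the syllable magnitude and whose base spans
    2 * half_width_ms. The threshold is fixed, independent of magnitude.
    """
    if magnitude <= threshold:
        # The gesture never crosses the threshold, so nothing is observed.
        return 0.0
    return 2.0 * half_width_ms * (1.0 - threshold / magnitude)


def lengthening_ratio(weak, strong, half_width_ms, threshold):
    """Ratio of observed durations between a strong and a weak syllable."""
    d_weak = duration_above_threshold(weak, half_width_ms, threshold)
    d_strong = duration_above_threshold(strong, half_width_ms, threshold)
    return d_strong / d_weak


if __name__ == "__main__":
    # Hypothetical thresholds: a high one (0.5) and a low one (0.2),
    # standing in for two gestures with different observability criteria.
    for label, threshold in (("high threshold", 0.5), ("low threshold", 0.2)):
        ratio = lengthening_ratio(1.0, 2.0, 100.0, threshold)
        print(f"{label}: lengthening ratio = {ratio:.3f}")
```

Under these assumptions, doubling the syllable magnitude lengthens the high-threshold gesture proportionally more than the low-threshold one, which is the kind of non-uniform effect on segmental duration the abstract describes.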

