Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

Acoustic and Perceptual Modelling of the Voice Quality Caused by Fundamental Frequency Perturbation

Satoshi Imaizumi (1), Jan Gauffin (2)

(1) Research Institute of Logopedics and Phoniatrics, Faculty of Medicine, University of Tokyo, Tokyo, Japan
(2) Department of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm, Sweden

This paper reports results that clarify the acoustic and perceptual characteristics of the voice qualities described by the scales "Rough", "Creak", "Fry" and "Diplophonia". Based on acoustic analyses of 102 voice samples, a synthesis model of pitch/amplitude perturbations was constructed. Perceptual experiments on synthetic voice samples generated with the model yielded the following results. 1) The synthesis model, constructed from the acoustic analysis, was capable of generating the differences between the voice qualities denoted "Rough", "Diplophonia", "Fry" and "Creak". 2) The "Rough" quality seemed to be perceived when listeners perceived the effect of the perturbations holistically, as one coherent quality. 3) The other qualities seemed to be perceived when listeners perceived the effect of the perturbations analytically: as two or more separate sets of frequency components ("Diplophonia"), as an additional sensation of repeating impulses corresponding to the perturbation frequency ("Fry"), or as a special case of "Fry" observed in the final parts of the voice ("Creak").
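
The abstract does not specify the form of the synthesis model, so the following is only an illustrative sketch of what a periodic pitch/amplitude perturbation can look like, not the authors' model. It assumes a cosine-shaped perturbation applied cycle by cycle to the glottal period and to the pulse amplitude; the function name and all parameters (period_depth, amp_depth, cycles_per_pattern) are hypothetical.

```python
import math


def perturbed_pulse_train(f0_hz, period_depth, amp_depth, cycles_per_pattern, n_cycles):
    """Return per-cycle periods (s) and relative amplitudes of a perturbed pulse train.

    Assumption (not from the paper): the perturbation is a cosine pattern that
    repeats every `cycles_per_pattern` glottal cycles.
    """
    t0 = 1.0 / f0_hz  # unperturbed fundamental period in seconds
    periods = []
    amplitudes = []
    for k in range(n_cycles):
        # Phase of the perturbation pattern at cycle k.
        m = math.cos(2.0 * math.pi * k / cycles_per_pattern)
        periods.append(t0 * (1.0 + period_depth * m))   # pitch perturbation
        amplitudes.append(1.0 + amp_depth * m)          # amplitude perturbation
    return periods, amplitudes


# Example: 100 Hz voice, alternating long/short cycles (pattern length 2).
periods, amplitudes = perturbed_pulse_train(100.0, 0.1, 0.2, 2, 8)
```

With cycles_per_pattern set to 2, successive cycles alternate between a longer, stronger pulse and a shorter, weaker one, and the pattern repeats at f0_hz / cycles_per_pattern. Which parameter ranges correspond to which perceived quality is a question for the paper's experiments and is not asserted here.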

Bibliographic reference. Imaizumi, Satoshi / Gauffin, Jan (1992): "Acoustic and perceptual modelling of the voice quality caused by fundamental frequency perturbation", in ICSLP-1992, 133-136.