Speech Prosody 2004
This paper investigates the perceptual nature of CV syllables excised from running speech and their acoustic characteristics. The material consists of fifteen short Japanese sentences uttered by four male speakers at three speaking rates: fast, normal, and slow. Syllable identification for segments excised from the running speech was tested in three ways: 1) one-syllable segmentation, 2) two-syllable segmentation, and 3) three-syllable segmentation. In the one-syllable segmentation, individual syllables were excised from the running speech and presented to listeners for identification. In the two- and three-syllable segmentations, every two and every three successive syllables were excised, respectively. In the one-syllable segmentation experiments, the average syllable identification rates for fast, normal, and slow speech were 35%, 59%, and 86%, respectively. This result indicates that individual syllables from fast and normal speech do not carry enough phonetic information to be correctly identified, whereas syllables from slow speech retain it fairly well. In the two-syllable segmentation experiment, the phonetic information for a syllable was not sufficiently preserved in a segment of two consecutive syllables, especially for fast speech. However, the middle syllable in the three-syllable segmentation was found to carry enough phonetic information to be correctly identified, even for fast speech. The relation between the perceptual results and the acoustic properties is discussed.
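The three segmentation conditions can be illustrated with a minimal sketch. It assumes the utterance has already been split into an ordered list of syllable labels and that "every two and three successive syllables" means overlapping windows; the abstract does not specify either point. The function names and the placeholder labels s1 to s4 are hypothetical, not taken from the paper.

```python
def syllable_windows(syllables, n):
    """Return every run of n successive syllables, in utterance order.

    Assumption: windows overlap (step of one syllable).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def middle_syllables(syllables):
    """Return the middle syllable of each three-syllable window."""
    return [window[1] for window in syllable_windows(syllables, 3)]


# Hypothetical placeholder labels standing in for the CV syllables of one sentence.
utterance = ["s1", "s2", "s3", "s4"]

one_syllable = syllable_windows(utterance, 1)    # one-syllable segmentation
two_syllable = syllable_windows(utterance, 2)    # two-syllable segmentation
three_syllable = syllable_windows(utterance, 3)  # three-syllable segmentation
```

Under this reading, the first and last syllables of an utterance never appear as the middle syllable of a three-syllable segment.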
Bibliographic reference. Kuwabara, Hisao (2004): "Perceptual properties of syllables isolated from continuous speech for different speaking rates", In SP-2004, 729-732.