Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
We examined the ability of human listeners to classify vowel sounds into sixteen categories when the sounds were excised from fluent speech and presented in isolation. The duration of each sound and its phonetic category were provided as part of the TIMIT corpus of phonetically labeled utterances. The specific objectives of the research were: (a) to compare the classification responses of listeners with the phonetic labels provided by experts, (b) to understand the influence of phonetic context on classification performance, (c) to determine the influence of prior exposure to the speaker's voice on vowel classification, and (d) to establish more effective performance benchmarks for the evaluation of phonetic classification algorithms. Listeners' classifications of vowels presented in isolation agreed with the experts' phonetic labels about 54.8% of the time. Providing listeners with additional context, by extending the speech excerpt to include the segments preceding and following the vowel, raised agreement to about 65.9%. In a second experiment, information about the speaker who produced the vowel excerpt was provided, in the form of a short phrase, just before the excerpt was presented for classification. This prior exposure to the speaker's voice produced a small but significant increase in listener-labeler agreement.
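The agreement figures reported above can be read as the fraction of excerpts for which a listener's response matched the expert's phonetic label. The following is a minimal sketch of that calculation, not the authors' procedure: the function names `agreement_rate` and `confusion_counts` are hypothetical, and the toy labels are invented for illustration (they use TIMIT-style vowel symbols but are not data from the study).

```python
from collections import Counter


def agreement_rate(listener_labels, expert_labels):
    """Fraction of excerpts where the listener's label equals the expert's label."""
    if len(listener_labels) != len(expert_labels):
        raise ValueError("label sequences must have the same length")
    if not expert_labels:
        raise ValueError("at least one labeled excerpt is required")
    matches = sum(l == e for l, e in zip(listener_labels, expert_labels))
    return matches / len(expert_labels)


def confusion_counts(listener_labels, expert_labels):
    """Count (expert label, listener label) pairs to show which vowels are confused."""
    return Counter(zip(expert_labels, listener_labels))


# Invented toy data: five excerpts, one listener (assumption, not study data).
expert = ["iy", "ih", "eh", "ae", "aa"]
listener = ["iy", "eh", "eh", "ae", "ah"]

rate = agreement_rate(listener, expert)
print(f"listener-labeler agreement: {rate:.1%}")  # 3 of 5 match -> 60.0%
```

Tabulating the (expert, listener) pairs alongside the overall rate shows which categories account for the disagreements, which is the kind of detail a benchmark for classification algorithms would also need.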
Bibliographic reference. Cole, Ronald A. / Muthusamy, Yeshwant K. (1992): "Perceptual studies on vowels excised from continuous speech", In ICSLP-1992, 1087-1090.