Speech Recognition and Intrinsic Variation (SRIV2006)

Toulouse, France
May 20, 2006

Acoustic and Linguistic Characterization of Spontaneous Speech

Masanobu Nakamura, Sadaoki Furui, Koji Iwano

Tokyo Institute of Technology, Department of Computer Science, Ookayama, Meguro-ku, Tokyo, Japan

Although read speech and similar styles, such as speech read from newspapers or broadcast news, can be recognized with high accuracy, recognition accuracy decreases drastically for spontaneous speech. This is because spontaneous speech and read speech differ significantly both acoustically and linguistically. This paper reports on the analysis and recognition of spontaneous speech using a large-scale spontaneous speech database, the "Corpus of Spontaneous Japanese (CSJ)". Spectral analysis of various utterance styles in the CSJ shows that the spectral distribution of phonemes, and the spectral differences between them, are significantly reduced in spontaneous speech compared to read speech. Experimental results also show a strong correlation between the mean spectral distance between phonemes and phoneme recognition accuracy. This indicates that spectral reduction is a major reason for the lower recognition accuracy of spontaneous speech. A comparative analysis of statistical language models for written language, including newspaper articles, and for spontaneous speech shows that the two differ significantly in the observation frequency of each part of speech and in perplexity.
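The relationship described in the abstract, between the mean spectral distance separating phonemes and phoneme recognition accuracy, can be illustrated with a minimal sketch. The abstract does not state which distance measure or features the authors used, so the Euclidean distance between per-phoneme mean feature vectors is an assumption here, as are the function names and the toy values, which are invented for illustration and are not results from the paper.

```python
from itertools import combinations
from math import dist, sqrt


def mean_phoneme_distance(phoneme_means):
    """Average Euclidean distance over all pairs of phoneme mean vectors.

    Assumption: each phoneme is summarized by one mean feature vector
    (e.g. cepstral coefficients); the paper's actual measure may differ.
    """
    pairs = list(combinations(phoneme_means.values(), 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy, invented phoneme means in a 2-D feature space (not CSJ data).
read_style = {"a": (1.0, 0.0), "i": (0.0, 1.0), "u": (-1.0, 0.0)}
# Simulate spectral reduction: every mean shrinks halfway toward the origin.
spontaneous_style = {p: (0.5 * x, 0.5 * y) for p, (x, y) in read_style.items()}

read_distance = mean_phoneme_distance(read_style)
spontaneous_distance = mean_phoneme_distance(spontaneous_style)
```

With one mean distance and one phoneme accuracy figure per speaking style, `pearson(distances, accuracies)` would give the kind of correlation the abstract refers to. In the toy data the reduced style has exactly half the mean distance of the read style, which is the geometric sense in which phonemes become harder to separate.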

Bibliographic reference. Nakamura, Masanobu / Furui, Sadaoki / Iwano, Koji (2006): "Acoustic and linguistic characterization of spontaneous speech", in SRIV-2006, 3-8.