We address the problem of extracting meaning structures from spoken utterances in human communication. In spoken language understanding (SLU) systems, meaning structures are parsed from the word hypotheses produced by automatic speech recognition (ASR). This approach suffers from high word error rates and relies on ad-hoc conceptual representations. In contrast, in this paper we aim to discover meaning components directly from measurements of acoustic and non-verbal linguistic features. The meaning structures are taken from the frame semantics model proposed in FrameNet. We give a quantitative analysis of meaning structures in terms of speech features across human-human dialogs from the manually annotated LUNA corpus. We show that the correlations between acoustic features (pitch, formant trajectories, intensity, and harmonicity) and meaning features are statistically significant over the whole corpus, and that these features are relevant for classifying the target words that evoke a semantic frame.
Bibliographic reference. Ivanov, Alexei V. / Riccardi, Giuseppe / Ghosh, S. / Tonelli, S. / Stepanov, E. A. (2010): "Acoustic correlates of meaning structure in conversational speech", In INTERSPEECH-2010, 1129-1132.