INTERSPEECH 2004 - ICSLP
Contrast is a very popular phenomenon in spoken language, and carries very important information to help understanding contents and structures of spoken language. In this paper, we propose an idea of automatic contrast detection as an effort for better speech understanding. We study the automatic tagging of three specific types of contrast: symmetric contrast, contrastive focus, and contrastive topic. We label the three types of contrasted words as contrast (C), and other words as noncontrast (C). The classification of contrast events is based on prosodic, spectral, and part-of-speech (POS) information sources. The integration of different knowledge sources is realized by a time-delay recursive neural network (TDRNN). The approach we proposed was testified on 235 spontaneous utterances consisting of 3500 words (samples). The contrast detection was speaker independent. The tests yielded an average of 87.9% classification rate.
Bibliographic reference. Hasegawa-Johnson, Mark / Levinson, Stephen / Zhang, Tong (2004): "Automatic detection of contrast for speech understanding", In INTERSPEECH-2004, 581-584.