We present a technique for two-stream processing of speech signals for emotion detection. The first stream recognises emotion from acoustic features, while the second stream recognises emotion from the semantics of the conversation. A probabilistic measure is derived for each stream, and each stream recognises an emotion category. The outputs of the two streams are then combined to generate a score for each emotion category, with the confidence level of each stream used to weight its contribution to the final score. This technique is particularly relevant to call-center data, where the speech carries associated semantic content.
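As an illustration of the confidence-weighted combination described above, the following is a minimal sketch only. The abstract does not give the fusion formula, so the normalised linear weighting used here is an assumption, and the names `fuse_scores` and `recognise`, along with all numeric values, are hypothetical.

```python
from typing import Dict


def fuse_scores(
    acoustic: Dict[str, float],
    semantic: Dict[str, float],
    acoustic_conf: float,
    semantic_conf: float,
) -> Dict[str, float]:
    """Combine per-category scores from two streams.

    Assumption: each stream's confidence is normalised into a weight and
    the final score is a weighted sum. The paper may use a different rule.
    """
    total = acoustic_conf + semantic_conf
    if total <= 0:
        raise ValueError("at least one stream needs positive confidence")
    w_acoustic = acoustic_conf / total
    w_semantic = semantic_conf / total
    # A category missing from one stream contributes a score of 0 there.
    categories = set(acoustic) | set(semantic)
    return {
        c: w_acoustic * acoustic.get(c, 0.0) + w_semantic * semantic.get(c, 0.0)
        for c in categories
    }


def recognise(fused: Dict[str, float]) -> str:
    """Return the emotion category with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical scores: the acoustic stream leans towards "anger",
# but the more confident semantic stream leans towards "neutral".
acoustic_scores = {"anger": 0.6, "neutral": 0.4}
semantic_scores = {"anger": 0.2, "neutral": 0.8}
fused = fuse_scores(acoustic_scores, semantic_scores, 0.5, 1.5)
print(recognise(fused))  # prints "neutral"
```

In this sketch the more confident stream dominates the decision, which is the behaviour the abstract attributes to confidence weighting; how the confidence values themselves are estimated is not covered here.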
The proposed technique is evaluated on the LDC corpus and on real-world call-center data. Experiments suggest that the two-stream process provides better results than existing techniques that extract emotion from acoustic features alone.
Bibliographic reference. Gupta, Purnima / Rajput, Nitendra (2007): "Two-stream emotion recognition for call center monitoring", In INTERSPEECH-2007, 2241-2244.