12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Agglomerative Hierarchical Clustering of Emotions in Speech Based on Subjective Relative Similarity

Ryoichi Takashima (1), Tohru Nagano (2), Ryuki Tachibana (2), Masafumi Nishimura (2)

(1) Kobe University, Japan
(2) IBM Research - Tokyo, Japan

When people are asked whether the emotions in two speech samples belong to the same category, their judgment depends on the size of the target category. Hierarchical clustering is well suited to simulating this human perception of relative similarity between emotions in speech. To make clustering results better reflect subjective similarity, we devised a hierarchical clustering method that uses a new type of relative similarity data, obtained by tagging the most similar pair in each set of three samples. This type of data allowed us to create a closed-loop algorithm for feature weight learning that uses the clustering performance as the objective function. In classifying utterances of a specific Japanese sentence recorded at a real call center, the method reduced classification errors by 15.2%.
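The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation, since the abstract does not specify the linkage rule, the distance measure, or the optimizer. The sketch assumes average linkage, a weighted Euclidean distance, and an exhaustive search over a tiny weight grid. All function names and the toy data are hypothetical. Each triplet `(i, j, k)` records that annotators tagged `(i, j)` as the most similar pair among the three samples; the clustering agrees with a triplet when `i` and `j` are merged before either is merged with `k`.

```python
import itertools
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def merge_heights(samples, weights):
    """Average-linkage agglomerative clustering (assumed linkage rule).

    Returns a dict mapping each sample pair (i, j) with i < j to the height
    at which the two samples first end up in the same cluster.
    """
    clusters = [[i] for i in range(len(samples))]
    heights = {}
    while len(clusters) > 1:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            dists = [weighted_distance(samples[i], samples[j], weights)
                     for i in clusters[a] for j in clusters[b]]
            d = sum(dists) / len(dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        for i in clusters[a]:
            for j in clusters[b]:
                heights[(min(i, j), max(i, j))] = d
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return heights


def triplet_error(samples, weights, triplets):
    """Fraction of triplets whose tagged pair is not the first pair to merge."""
    heights = merge_heights(samples, weights)

    def h(i, j):
        return heights[(min(i, j), max(i, j))]

    wrong = sum(1 for i, j, k in triplets
                if not h(i, j) < min(h(i, k), h(j, k)))
    return wrong / len(triplets)


def learn_weights(samples, triplets, grid=(0.0, 0.5, 1.0)):
    """Closed loop: score every weight vector on the grid by clustering error."""
    best_w, best_err = None, float("inf")
    for w in itertools.product(grid, repeat=len(samples[0])):
        if not any(w):
            continue  # skip the all-zero weight vector
        err = triplet_error(samples, w, triplets)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Toy data (hypothetical): feature 0 follows the tagged similarity,
# feature 1 is a large distractor.
samples = [(0.0, 5.0), (0.1, 0.0), (1.0, 4.9), (1.1, 0.1)]
triplets = [(0, 1, 2), (2, 3, 0), (0, 1, 3), (2, 3, 1)]

if __name__ == "__main__":
    print(triplet_error(samples, (1.0, 1.0), triplets))
    print(learn_weights(samples, triplets))
```

On this toy data, uniform weights let the distractor feature dominate the tree, while the search settles on a weight vector that ignores it. A grid search is only practical for a handful of features; a real system would need a more scalable optimizer.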

Bibliographic reference. Takashima, Ryoichi / Nagano, Tohru / Tachibana, Ryuki / Nishimura, Masafumi (2011): "Agglomerative hierarchical clustering of emotions in speech based on subjective relative similarity", in INTERSPEECH-2011, 2473-2476.