12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Unsupervised Clustering of Utterances Using Non-Parametric Bayesian Methods

Ryuichiro Higashinaka (1), Noriaki Kawamae (2), Kugatsu Sadamitsu (1), Yasuhiro Minami (1), Toyomi Meguro (1), Kohji Dohsaka (1), Hirohito Inagaki (1)

(1) NTT Corporation, Japan
(2) NTT Comware Corporation, Japan

Unsupervised clustering of utterances can be useful for modeling dialogue acts in dialogue applications. The Chinese restaurant process (CRP), a non-parametric Bayesian method, was previously introduced for clustering utterances in dialogue and showed promising results. This paper introduces the infinite HMM, another non-parametric Bayesian method, for the same task and verifies its effectiveness. Experimental results in two dialogue domains show that the infinite HMM, which takes the sequence of utterances into account in its clustering process, significantly outperforms the CRP. Although the infinite HMM outperformed the other methods, we also found that clustering complex dialogue data, such as human-human conversations, remains harder than clustering human-machine dialogues.
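
For readers unfamiliar with the CRP mentioned above, the following is a minimal sketch of how the CRP prior alone assigns items to an unbounded number of clusters. It is not the model evaluated in the paper, which would also need a likelihood over the words of each utterance and an inference procedure. The function name crp_assignments and the parameters num_items, alpha, and seed are hypothetical, chosen for illustration only.

```python
import random


def crp_assignments(num_items, alpha, seed=0):
    """Draw cluster labels for num_items items from a CRP prior.

    Illustrative only: alpha (> 0) is the concentration parameter, and
    no utterance content is used, so this shows the prior, not a full
    clustering model.
    """
    rng = random.Random(seed)
    labels = []  # cluster index assigned to each item, in arrival order
    counts = []  # number of items currently in each cluster
    for _ in range(num_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(num_items=20, alpha=1.0)
    print(labels)
    print(counts)
```

Under this prior, larger clusters attract new items more strongly, and the number of clusters grows with the data instead of being fixed in advance. A larger alpha tends to produce more clusters.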

Bibliographic reference. Higashinaka, Ryuichiro / Kawamae, Noriaki / Sadamitsu, Kugatsu / Minami, Yasuhiro / Meguro, Toyomi / Dohsaka, Kohji / Inagaki, Hirohito (2011): "Unsupervised clustering of utterances using non-parametric Bayesian methods", In INTERSPEECH-2011, 2081-2084.