Unsupervised clustering of utterances can be useful for modeling dialogue acts in dialogue applications. The Chinese restaurant process (CRP), a non-parametric Bayesian method, was previously introduced for clustering utterances in dialogue and has shown promising results. This paper introduces the infinite HMM, another non-parametric Bayesian method, for this task and verifies its effectiveness. Experimental results in two dialogue domains show that the infinite HMM, which takes the sequence of utterances into account in its clustering process, significantly outperforms the CRP. Although the infinite HMM outperformed the other methods, we also found that clustering complex dialogue data, such as human-human conversations, remains harder than clustering human-machine dialogues.
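For readers unfamiliar with the CRP, the following is a minimal sketch of the CRP prior over cluster assignments only; it is not the clustering model evaluated in the paper, which would also need a likelihood over the content of each utterance. The function name `crp_assignments` and the concentration parameter `alpha` are illustrative assumptions. Each item joins an existing cluster with probability proportional to that cluster's current size, or opens a new cluster with probability proportional to `alpha`, so the number of clusters is not fixed in advance.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a CRP prior.

    alpha (> 0) is the concentration parameter: larger values make
    new clusters more likely. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items already in cluster k
    for _ in range(n_items):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


labels = crp_assignments(n_items=50, alpha=1.0)
```

Note that the assignment weights here depend only on cluster sizes, not on which cluster the preceding item joined. The infinite HMM differs on exactly this point, since it takes the sequence of utterances into account.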
Bibliographic reference. Higashinaka, Ryuichiro / Kawamae, Noriaki / Sadamitsu, Kugatsu / Minami, Yasuhiro / Meguro, Toyomi / Dohsaka, Kohji / Inagaki, Hirohito (2011): "Unsupervised clustering of utterances using non-parametric Bayesian methods", In INTERSPEECH-2011, 2081-2084.