Application domains for speech-to-speech translation and dialog systems often contain sub-domains and/or task types for which different outputs are appropriate for a given input. It would be useful to automatically discover such sub-domain structure in training corpora, and to classify new interactions with the system into one of these sub-domains. To this end, we present a document-clustering approach to sub-domain classification that uses a recently developed algorithm based on von Mises-Fisher distributions. We report preliminary perplexity-reduction and MT performance results for a speech-to-speech translation system using this model.
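The following is a minimal sketch of clustering with von Mises-Fisher (vMF) distributions. It is not necessarily the algorithm the authors use. It assumes hard cluster assignment, equal mixing weights, and one concentration parameter shared by all clusters. Under those assumptions, the vMF mixture reduces to spherical k-means: each document becomes a unit-length term vector, it is assigned to the cluster whose mean direction has the highest cosine similarity, and each mean direction is the normalized sum of the cluster's members. A new interaction is classified the same way. The vocabulary, term counts, and function names (`spherical_kmeans`, `classify`) are hypothetical illustrations.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Project a vector onto the unit hypersphere, where vMF densities live."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def assign(doc: Vector, means: List[Vector]) -> int:
    """Index of the mean direction with the highest cosine similarity.

    With equal priors and a shared concentration kappa, maximizing the vMF
    log-likelihood kappa * mu_h . x is the same as maximizing mu_h . x.
    """
    return max(range(len(means)), key=lambda h: dot(doc, means[h]))


def spherical_kmeans(docs: List[Sequence[float]], init_means: List[Sequence[float]],
                     iters: int = 10) -> Tuple[List[Vector], List[int]]:
    """Hard-assignment vMF mixture (spherical k-means) over term vectors."""
    unit_docs = [normalize(d) for d in docs]
    means = [normalize(m) for m in init_means]
    labels = [0] * len(unit_docs)
    for _ in range(iters):
        # E-step (hard): each document goes to its closest mean direction.
        labels = [assign(d, means) for d in unit_docs]
        # M-step: each mean direction is the normalized sum of its members.
        for h in range(len(means)):
            members = [d for d, lab in zip(unit_docs, labels) if lab == h]
            if members:
                means[h] = normalize([sum(col) for col in zip(*members)])
    return means, labels


def classify(doc: Sequence[float], means: List[Vector]) -> int:
    """Assign a new interaction to one of the learned sub-domains."""
    return assign(normalize(doc), means)


# Hypothetical vocabulary: ["flight", "hotel", "doctor", "pain"]
docs = [[2, 1, 0, 0], [3, 0, 0, 0], [1, 2, 0, 0],  # travel-like dialogs
        [0, 0, 2, 1], [0, 0, 1, 3]]                # medical-like dialogs
means, labels = spherical_kmeans(docs, init_means=[docs[0], docs[3]])
print(labels)                          # [0, 0, 0, 1, 1]
print(classify([0, 0, 1, 1], means))   # 1
```

Working with directions rather than raw counts makes document length irrelevant, which suits dialogs of very different lengths. A full vMF mixture would also estimate a separate concentration and mixing weight for each cluster. This sketch leaves those out.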
Bibliographic reference. Stallard, David / Tsakalidis, Stavros / Saleem, Shirin (2009): "Incremental dialog clustering for speech-to-speech translation", In INTERSPEECH-2009, 428-431.