Application domains for speech-to-speech translation and dialog systems often contain sub-domains and/or task types for which different outputs are appropriate for a given input. It would be useful to automatically discover such sub-domain structure in training corpora, and to classify new interactions with the system into one of these sub-domains. To this end, we present a document-clustering approach to sub-domain classification that uses a recently developed algorithm based on von Mises-Fisher distributions. We report preliminary results on perplexity reduction and machine translation (MT) performance for a speech-to-speech translation system using this model.
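The abstract does not spell out the von Mises-Fisher (vMF) clustering algorithm itself. As a rough illustration only, the sketch below uses spherical k-means, a simpler and well-known relative: it corresponds to hard-assignment EM for a mixture of vMF distributions that share one concentration parameter. It is not the authors' incremental algorithm. The bag-of-words features, the deterministic farthest-first initialization, the toy dialogs, and all function names (`spherical_kmeans`, `classify`, etc.) are hypothetical choices made for this example.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse term-weight vector (dict) to unit L2 length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else dict(vec)


def cosine(a, b):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def farthest_first(vecs, k):
    """Hypothetical deterministic init: start from the first document, then
    repeatedly add the document least similar to all chosen centroids."""
    centroids = [dict(vecs[0])]
    while len(centroids) < k:
        idx = min(range(len(vecs)),
                  key=lambda i: max(cosine(vecs[i], c) for c in centroids))
        centroids.append(dict(vecs[idx]))
    return centroids


def spherical_kmeans(docs, k, iters=20):
    """Cluster whitespace-tokenized dialogs on the unit hypersphere.

    Hard-assignment limit of EM for a vMF mixture with shared concentration:
    each cluster is summarized only by its mean direction.
    """
    vecs = [normalize(Counter(d.split())) for d in docs]
    centroids = farthest_first(vecs, k)
    labels = [0] * len(vecs)
    for _ in range(iters):
        # Assignment step: each dialog goes to the closest mean direction.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vecs]
        # Update step: mean direction = normalized sum of member vectors.
        for j in range(k):
            total = Counter()
            for v, lab in zip(vecs, new_labels):
                if lab == j:
                    total.update(v)
            if total:
                centroids[j] = normalize(total)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


def classify(doc, centroids):
    """Assign a new interaction to the nearest cluster mean direction."""
    v = normalize(Counter(doc.split()))
    return max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))
```

For example, with the toy dialogs `["doctor pain fever", "fever pain medicine", "vehicle search checkpoint", "search checkpoint id"]` and `k=2`, the two medical dialogs and the two checkpoint dialogs end up in separate clusters, and `classify("pain fever", centroids)` returns the medical cluster. Working with unit-normalized vectors and cosine similarity is what ties this to the vMF family: a vMF density depends on a document only through its dot product with the cluster's mean direction.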
Cite as: Stallard, D., Tsakalidis, S., Saleem, S. (2009) Incremental dialog clustering for speech-to-speech translation. Proc. Interspeech 2009, 428-431, doi: 10.21437/Interspeech.2009-155
@inproceedings{stallard09_interspeech, author={David Stallard and Stavros Tsakalidis and Shirin Saleem}, title={{Incremental dialog clustering for speech-to-speech translation}}, year=2009, booktitle={Proc. Interspeech 2009}, pages={428--431}, doi={10.21437/Interspeech.2009-155} }