In this paper we have studied the problem of finding spoken turn boundaries in human-to-human telephone conversations. This task is essential for providing optimal operating conditions for automatic speech recognition of dialogs. The problem formulation differs from conventional voice activity detection and dialog diarization. We have explored the applicability of various algorithms to this task and have found that a hidden Markov model combining the results of modulation spectrum analysis and the Kullback-Leibler divergence of adjacent signal portions produces the best predictions. The performance of the algorithms was evaluated on realistic conversational data taken from the Switchboard corpus.
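To illustrate the idea of a Kullback-Leibler divergence between adjacent signal portions, the following is a minimal sketch and not the implementation used in the paper. It assumes a one-dimensional feature track (for example, frame energy), fits a Gaussian to the window on each side of every candidate boundary, and scores the boundary with the symmetric KL divergence between the two fits. The function names, the Gaussian assumption, the window length, and the variance floor are all hypothetical choices made for this example.

```python
import math
from statistics import fmean, pvariance

VAR_FLOOR = 1e-6  # assumed floor that keeps fitted variances strictly positive


def gaussian_kl(mu0, var0, mu1, var1):
    """Return KL(N(mu0, var0) || N(mu1, var1)) for one-dimensional Gaussians."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)


def fit_gaussian(window):
    """Fit a mean and a floored population variance to a window of samples."""
    mu = fmean(window)
    var = max(pvariance(window, mu), VAR_FLOOR)
    return mu, var


def adjacent_kl(signal, win):
    """Score each candidate boundary t in [win, len(signal) - win].

    The score is the symmetric KL divergence between Gaussians fitted to
    signal[t - win:t] (left window) and signal[t:t + win] (right window).
    Entry i of the returned list corresponds to boundary t = win + i.
    """
    scores = []
    for t in range(win, len(signal) - win + 1):
        mu_l, var_l = fit_gaussian(signal[t - win:t])
        mu_r, var_r = fit_gaussian(signal[t:t + win])
        forward = gaussian_kl(mu_l, var_l, mu_r, var_r)
        backward = gaussian_kl(mu_r, var_r, mu_l, var_l)
        scores.append(forward + backward)
    return scores
```

On a signal whose statistics change abruptly at one point, this score peaks at the change point. In a system like the one described above, such a score would serve only as one observation stream for the hidden Markov model, alongside the modulation spectrum features.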
Bibliographic reference. Ivanov, Alexei V. / Riccardi, Giuseppe (2010): "Automatic turn segmentation in spoken conversations", In INTERSPEECH-2010, 3130-3133.