15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

A Semi-Markov Model for Speech Segmentation with an Utterance-Break Prior

Mark Sinclair, Peter Bell, Alexandra Birch, Fergus McInnes

University of Edinburgh, UK

Speech segmentation is the problem of finding the end points of speech utterances so that they can be passed to an automatic speech recognition (ASR) system. The quality of this segmentation can have a large impact on the accuracy of the ASR system; in this paper we demonstrate that it can have an even larger impact on downstream natural language processing tasks, in this case machine translation. We develop a novel semi-Markov model that segments audio streams into speech utterances optimised for the desired distribution of sentence lengths in the target domain. We compare this with existing state-of-the-art methods and show that it not only improves ASR performance but also yields significant benefits for a speech translation task.
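
The abstract does not give the model's equations, so the following is only a rough illustration of the general idea of semi-Markov segmentation with a length prior, not the authors' implementation. It assumes a per-frame log score for placing an utterance break (for example from a speech/non-speech classifier) and a log prior over segment lengths in frames; the names `segment`, `break_logp` and `dur_logp` are hypothetical and do not come from the paper.

```python
import math


def segment(break_logp, dur_logp):
    """Pick utterance boundaries by dynamic programming (illustrative sketch).

    break_logp[t] -- assumed log score for ending a segment after frame t
                     (0-indexed, one entry per frame).
    dur_logp[d]   -- assumed log prior of a segment lasting d frames,
                     for d = 1..D (index 0 is unused).
    Returns the list of segment end positions (1-indexed frame counts).
    """
    n_frames = len(break_logp)
    max_dur = len(dur_logp) - 1

    # best[t] is the best total score of any segmentation of frames 0..t-1.
    best = [-math.inf] * (n_frames + 1)
    back = [0] * (n_frames + 1)
    best[0] = 0.0

    for t in range(1, n_frames + 1):
        # Semi-Markov step: consider every allowed length d of the last segment.
        for d in range(1, min(max_dur, t) + 1):
            score = best[t - d] + dur_logp[d] + break_logp[t - 1]
            if score > best[t]:
                best[t] = score
                back[t] = t - d

    # Follow back-pointers from the final frame to recover the boundaries.
    bounds = []
    t = n_frames
    while t > 0:
        bounds.append(t)
        t = back[t]
    bounds.reverse()
    return bounds


if __name__ == "__main__":
    # Toy input: breaks are cheap after frames 3 and 6; 3-frame segments are favoured.
    toy_breaks = [-5.0, -5.0, 0.0, -5.0, -5.0, 0.0]
    toy_prior = [0.0, -3.0, -2.0, -0.1, -2.0]
    print(segment(toy_breaks, toy_prior))
```

Unlike a frame-level Markov model, the inner loop scores whole segments, which is what lets a prior over utterance length influence where the breaks fall.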

Bibliographic reference. Sinclair, Mark / Bell, Peter / Birch, Alexandra / McInnes, Fergus (2014): "A semi-Markov model for speech segmentation with an utterance-break prior", in INTERSPEECH-2014, 2351-2355.