Speech segmentation is the problem of finding the end points of a speech utterance for passing to an automatic speech recognition (ASR) system. The quality of this segmentation can have a large impact on the accuracy of the ASR system; in this paper we demonstrate that it can have an even larger impact on downstream natural language processing tasks in this case, machine translation. We develop a novel semi-Markov model which allows the segmentation of audio streams into speech utterances which are optimised for the desired distribution of sentence lengths for the target domain. We compare this with existing state-of-the-art methods and show that it is able to achieve not only improved ASR performance, but also to yield significant benefits to a speech translation task.
Bibliographic reference. Sinclair, Mark / Bell, Peter / Birch, Alexandra / McInnes, Fergus (2014): "A semi-Markov model for speech segmentation with an utterance-break prior", In INTERSPEECH-2014, 2351-2355.