Sixth European Conference on Speech Communication and Technology
For efficient machine translation of speech, the word sequence alone is not always sufficient to convey the intended meaning, and prosodic information can be lost in the speech recognition process. This paper presents methods by which focus can be detected in the input speech using timing and pitch information. By comparing the prosodic characteristics of an input utterance against profiles generated by components of a speech synthesiser for a default rendition of the same sequence of words, we are able to detect areas in the signal where prominence has been added.
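The comparison described above can be illustrated with a minimal sketch. This is not the authors' implementation: the per-word features (duration and mean F0), the normalisation for overall speaking rate and pitch level, the threshold values, and the names `WordProsody` and `detect_focus` are all assumptions chosen for illustration. It assumes the input utterance and the synthesiser's default rendition have already been aligned word by word.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordProsody:
    """Hypothetical per-word prosodic summary (not from the paper)."""
    word: str
    duration_s: float   # word duration in seconds
    mean_f0_hz: float   # mean fundamental frequency over the word, in Hz


def detect_focus(observed: List[WordProsody],
                 default: List[WordProsody],
                 duration_ratio: float = 1.3,
                 f0_ratio: float = 1.2) -> List[str]:
    """Return words whose timing or pitch exceeds the default rendition.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if len(observed) != len(default):
        raise ValueError("observed and default must cover the same words")
    if not observed:
        return []

    # Assumed normalisation: remove utterance-level differences in
    # speaking rate and pitch level before comparing individual words.
    rate = (sum(o.duration_s for o in observed)
            / sum(d.duration_s for d in default))
    pitch = (sum(o.mean_f0_hz for o in observed)
             / sum(d.mean_f0_hz for d in default))

    focused = []
    for o, d in zip(observed, default):
        rel_duration = (o.duration_s / d.duration_s) / rate
        rel_f0 = (o.mean_f0_hz / d.mean_f0_hz) / pitch
        # Flag a word when either cue is well above its default profile.
        if rel_duration >= duration_ratio or rel_f0 >= f0_ratio:
            focused.append(o.word)
    return focused
```

Given a default profile for "I want the red one" and an input in which "red" is markedly longer and higher in pitch than the default, this sketch would flag only "red". A real system would work from continuous timing and pitch contours rather than per-word averages.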
Bibliographic reference. Kitagawa, Satoshi / Campbell, Nick (1999): "Focus detection by comparison of speech waveforms", In EUROSPEECH'99, 1867-1870.