ISCA Archive Interspeech 2009

A WFST-based log-linear framework for speaking-style transformation

Graham Neubig, Shinsuke Mori, Tatsuya Kawahara

When creating transcripts from automatic speech recognition (ASR) results, disfluencies must be deleted, colloquial expressions transformed, and dropped words inserted so that the final product is clean, transcript-style text. This paper introduces a system for automatically transforming spoken language into transcript-style language that handles not only deletion of disfluencies but also substitution of colloquial expressions and insertion of dropped words. A number of potentially useful features are combined in a log-linear probabilistic framework, and the utility of each is examined. The system is implemented using weighted finite-state transducers (WFSTs), which make it easy to combine features and to integrate the system with other WFST-based systems. In evaluation, the best system achieved a word error rate (WER) of 5.37%, an absolute reduction of 5.49 percentage points relative to a rule-based baseline and of 1.54 percentage points relative to a simple noisy-channel model.
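A log-linear framework of the kind named above scores a candidate transcript-style output W for a spoken-style input V as a weighted sum of feature functions, normalized over the candidates: P(W|V) ∝ exp(Σ_i λ_i f_i(W, V)). The Python sketch below illustrates only this scoring rule on an explicitly enumerated candidate list. The feature names (`lm`, `tm`, `edits`), their values, and the weights are hypothetical, chosen for illustration. This is not the paper's WFST implementation; a WFST-based system would typically represent such features as weighted transducers and search over them jointly rather than listing candidates.

```python
import math
from typing import Dict, List, Tuple

# A candidate is (output string, feature values). Feature names and values
# used below are hypothetical illustrations, not taken from the paper.
Candidate = Tuple[str, Dict[str, float]]


def log_linear_scores(candidates: List[Candidate],
                      weights: Dict[str, float]) -> List[float]:
    """Return normalized log P(W|V) for each candidate under a log-linear model."""
    # Unnormalized score: sum_i lambda_i * f_i(W, V)
    raw = [sum(weights.get(name, 0.0) * value for name, value in feats.items())
           for _, feats in candidates]
    # Log-sum-exp over candidates for numerically stable normalization
    m = max(raw)
    log_z = m + math.log(sum(math.exp(s - m) for s in raw))
    return [s - log_z for s in raw]


def best_candidate(candidates: List[Candidate],
                   weights: Dict[str, float]) -> Tuple[str, float]:
    """Pick the highest-scoring candidate and return it with its log-probability."""
    scores = log_linear_scores(candidates, weights)
    idx = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[idx][0], scores[idx]


# Hypothetical example: "lm" = language-model log-probability,
# "tm" = transformation-model log-probability, "edits" = number of edits made.
candidates: List[Candidate] = [
    ("uh we gonna go", {"lm": -9.0, "tm": 0.0, "edits": 0}),
    ("we gonna go", {"lm": -7.5, "tm": -0.5, "edits": 1}),
    ("we are going to go", {"lm": -5.0, "tm": -1.2, "edits": 3}),
]
weights = {"lm": 1.0, "tm": 1.0, "edits": -0.2}

best, logp = best_candidate(candidates, weights)
print(best, round(math.exp(logp), 3))
```

Tuning the weights λ_i (for example on held-out data) changes how strongly each feature influences the choice; a negative weight on `edits` in this toy setup penalizes outputs that stray far from the input.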

doi: 10.21437/Interspeech.2009-455

Cite as: Neubig, G., Mori, S., Kawahara, T. (2009) A WFST-based log-linear framework for speaking-style transformation. Proc. Interspeech 2009, 1495-1498, doi: 10.21437/Interspeech.2009-455

  author={Graham Neubig and Shinsuke Mori and Tatsuya Kawahara},
  title={{A WFST-based log-linear framework for speaking-style transformation}},
  booktitle={Proc. Interspeech 2009},