ISCA Archive Interspeech 2008

Fast speech decoding through phone confusion networks

Nicola Bertoldi, Marcello Federico, Daniele Falavigna, Matteo Gerosa

We present a two-stage automatic speech recognition architecture suited to applications, such as spoken document retrieval, where large-scale language models can be used and very low out-of-vocabulary rates must be reached. The proposed system couples a weakly constrained phone recognizer with a phone-to-word decoder that was originally developed for phrase-based statistical machine translation. The decoder can efficiently decode confusion networks given as input and can exploit large-scale unpruned language models. Preliminary experiments are reported on the transcription of speeches from the Italian parliament. Using phone confusion networks as the interface between the two decoding steps reduces the WER by 28%, bringing the system relatively close to a state-of-the-art baseline that uses a comparable language model.

doi: 10.21437/Interspeech.2008-543

Cite as: Bertoldi, N., Federico, M., Falavigna, D., Gerosa, M. (2008) Fast speech decoding through phone confusion networks. Proc. Interspeech 2008, 2094-2097, doi: 10.21437/Interspeech.2008-543

  author={Nicola Bertoldi and Marcello Federico and Daniele Falavigna and Matteo Gerosa},
  title={{Fast speech decoding through phone confusion networks}},
  booktitle={Proc. Interspeech 2008},