Phone Synchronous Decoding with CTC Lattice

Zhehuai Chen, Wei Deng, Tao Xu, Kai Yu


Connectionist Temporal Classification (CTC) has recently shown improved efficiency in LVCSR decoding. One popular implementation is to use a CTC model to predict the phone posteriors at each frame which are then used for Viterbi beam search on a modified WFST network. This is still within the traditional frame synchronous decoding framework. In this paper, the peaky posterior property of a CTC model is carefully investigated and it is found that ignoring blank frames will not introduce additional search errors. Based on this phenomenon, a novel phone synchronous decoding framework is proposed. Here, a phone-level CTC lattice is constructed purely using the CTC acoustic model. The resultant CTC lattice is highly compact and removes tremendous search redundancy due to blank frames. Then, the CTC lattice can be composed with the standard WFST to yield the final decoding result. The proposed approach effectively separates the acoustic evidence calculation and the search operation. This not only significantly improves online search efficiency, but also allows flexible acoustic/linguistic resources to be used. Experiments on LVCSR tasks show that phone synchronous decoding can yield an extra 2–3 times speed up compared to the traditional frame synchronous CTC decoding implementation.
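
The idea of skipping blank frames and keeping only a compact phone-level structure can be illustrated with a small sketch. This is not the authors' implementation. It is a simplified illustration under stated assumptions: the blank symbol sits at index 0, a frame counts as "blank" when its blank posterior meets a fixed threshold, and each kept frame retains only its top-scoring phones. The names `prune_blank_frames`, `build_phone_lattice`, `blank_threshold` and `top_k` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def prune_blank_frames(
    posteriors: Sequence[Sequence[float]], blank_threshold: float = 0.9
) -> List[Tuple[int, Sequence[float]]]:
    """Return (frame_index, posteriors) for frames not dominated by blank."""
    kept = []
    for t, frame in enumerate(posteriors):
        # A frame is treated as blank when its blank posterior is at or
        # above the threshold; such frames are dropped from the search.
        if frame[BLANK] < blank_threshold:
            kept.append((t, frame))
    return kept


def build_phone_lattice(
    posteriors: Sequence[Sequence[float]],
    blank_threshold: float = 0.9,
    top_k: int = 2,
) -> List[Tuple[int, List[Tuple[int, float]]]]:
    """Build a toy phone-level lattice: one segment per non-blank frame,
    each holding up to top_k (phone_id, posterior) candidates."""
    lattice = []
    for t, frame in prune_blank_frames(posteriors, blank_threshold):
        # Rank non-blank phones by posterior, highest first.
        ranked = sorted(
            ((p, i) for i, p in enumerate(frame) if i != BLANK),
            reverse=True,
        )[:top_k]
        lattice.append((t, [(i, p) for p, i in ranked]))
    return lattice


# Toy input: 5 frames over 3 symbols (blank, phone 1, phone 2).
example_posteriors = [
    [0.98, 0.01, 0.01],
    [0.10, 0.85, 0.05],
    [0.97, 0.02, 0.01],
    [0.95, 0.03, 0.02],
    [0.05, 0.15, 0.80],
]
lattice = build_phone_lattice(example_posteriors, blank_threshold=0.9, top_k=1)
print(len(example_posteriors), "frames ->", len(lattice), "lattice segments")
```

In this toy input, a later search stage would step over the lattice segments rather than over every frame, which is the source of the efficiency gain described above. A real system would also need to handle consecutive frames that emit the same phone and to compose the lattice with a WFST; both are omitted here.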


DOI: 10.21437/Interspeech.2016-831

Cite as

Chen, Z., Deng, W., Xu, T., Yu, K. (2016) Phone Synchronous Decoding with CTC Lattice. Proc. Interspeech 2016, 1923-1927.

BibTeX
@inproceedings{Chen+2016,
author={Zhehuai Chen and Wei Deng and Tao Xu and Kai Yu},
title={Phone Synchronous Decoding with CTC Lattice},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-831},
url={http://dx.doi.org/10.21437/Interspeech.2016-831},
pages={1923--1927}
}