ISCA Archive Interspeech 2016

On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin Training

William Chan, Ian Lane

In this paper, we explore the use of attention-based models for online speech recognition without relying on language models or search. Our model is an attention-based neural network that directly emits English or Mandarin characters as outputs. The model jointly learns the pronunciation, acoustic, and language models. We evaluate the model for online speech recognition on English and Mandarin. On English, we achieve a 33.0% WER on the WSJ task, a 5.4% absolute reduction in WER compared to an online CTC-based system. We also introduce a new training method and show how to learn joint Mandarin Character-Pinyin models. Our Mandarin character-only model achieves a 72% CER on the GALE Phase 2 evaluation, and our joint Mandarin Character-Pinyin model achieves 59.3% CER, a 12.7% absolute improvement over the character-only model.


doi: 10.21437/Interspeech.2016-334

Cite as: Chan, W., Lane, I. (2016) On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin Training. Proc. Interspeech 2016, 3404-3408, doi: 10.21437/Interspeech.2016-334

@inproceedings{chan16c_interspeech,
  author={William Chan and Ian Lane},
  title={{On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin Training}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={3404--3408},
  doi={10.21437/Interspeech.2016-334}
}