Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu


Sequence-to-sequence attention-based models, which integrate acoustic, pronunciation, and language models into a single neural network, have recently shown very promising results on automatic speech recognition (ASR) tasks. Among these models, the Transformer, a new sequence-to-sequence attention-based model that relies entirely on self-attention without using RNNs or convolutions, achieves a new single-model state-of-the-art BLEU score on neural machine translation (NMT) tasks. Given the outstanding performance of the Transformer, we extend it to speech and adopt it as the basic architecture of a sequence-to-sequence attention-based model for Mandarin Chinese ASR tasks. Furthermore, we compare a syllable-based model with a context-independent phoneme (CI-phoneme) based model, both built on the Transformer, in Mandarin Chinese. Additionally, a greedy cascading decoder with the Transformer is proposed for mapping CI-phoneme sequences and syllable sequences into word sequences. Experiments on the HKUST datasets demonstrate that the syllable-based model with the Transformer performs better than its CI-phoneme based counterpart and achieves a character error rate (CER) of 28.77%, which is competitive with the state-of-the-art CER of 28.0% achieved by the joint CTC-attention based encoder-decoder network.
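
The character error rate quoted above is conventionally defined as the character-level edit distance between hypothesis and reference, divided by the total number of reference characters. The following is a minimal sketch of that standard definition, not the authors' scoring script; the function names `edit_distance` and `cer` and the example sentences are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edit errors over total reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / total_chars


# Hypothetical example: one substitution and one deletion in a 6-character reference.
refs = ["今天天气很好"]
hyps = ["今天天汽很"]
print(f"CER = {cer(refs, hyps):.2%}")
```

Under this definition, a CER of 28.77% corresponds to roughly 29 character-level errors per 100 reference characters.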


DOI: 10.21437/Interspeech.2018-1107

Cite as: Zhou, S., Dong, L., Xu, S., Xu, B. (2018) Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese. Proc. Interspeech 2018, 791-795, DOI: 10.21437/Interspeech.2018-1107.


@inproceedings{Zhou2018,
  author={Shiyu Zhou and Linhao Dong and Shuang Xu and Bo Xu},
  title={Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={791--795},
  doi={10.21437/Interspeech.2018-1107},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1107}
}