Tone Recognition Using Lifters and CTC

Loren Lugosch, Vikrant Singh Tomar


In this paper, we present a new method for recognizing tones in continuous speech for tonal languages. The method works by converting the speech signal to a cepstrogram, extracting a sequence of cepstral features using a convolutional neural network, and predicting the underlying sequence of tones using a connectionist temporal classification (CTC) network. The performance of the proposed method is evaluated on a freely available Mandarin Chinese speech corpus, AISHELL-1, and is shown to outperform existing techniques in the literature in terms of tone error rate (TER).
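As a rough illustration of the first and last stages mentioned in the abstract, the sketch below computes a cepstrogram (the real cepstrum of each short-time frame) and a tone error rate taken as edit distance divided by reference length. It is not the authors' implementation: the function names, the 400-sample frame, the 160-sample hop, the Hamming window, and the edit-distance definition of TER are all assumptions made for this example, and the convolutional and CTC stages are omitted.

```python
import numpy as np


def cepstrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return the real cepstrum of each windowed frame, shape (n_frames, frame_len).

    frame_len, hop, and the Hamming window are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    log_magnitude = np.log(np.abs(np.fft.fft(frames, axis=1)) + eps)
    return np.fft.ifft(log_magnitude, axis=1).real


def tone_error_rate(reference, hypothesis):
    """Levenshtein distance between tone sequences, divided by len(reference).

    Assumed definition, by analogy with word error rate.
    """
    if len(reference) == 0:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tone in enumerate(reference, 1):
        curr = [i]
        for j, hyp_tone in enumerate(hypothesis, 1):
            curr.append(
                min(
                    prev[j] + 1,          # deletion
                    curr[j - 1] + 1,      # insertion
                    prev[j - 1] + (ref_tone != hyp_tone),  # substitution or match
                )
            )
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    # Synthetic pulse train with a 100-sample period (hypothetical test signal).
    pulses = np.zeros(16000)
    pulses[::100] = 1.0
    ceps = cepstrogram(pulses)
    print(ceps.shape)
    print(tone_error_rate([1, 2, 3, 4], [1, 2, 4]))
```

For a periodic signal, the pitch period appears as a peak along the quefrency axis of each frame, which is one reason a cepstral representation is a plausible input for tone recognition.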


DOI: 10.21437/Interspeech.2018-2293

Cite as: Lugosch, L., Tomar, V.S. (2018) Tone Recognition Using Lifters and CTC. Proc. Interspeech 2018, 2305-2309, DOI: 10.21437/Interspeech.2018-2293.


@inproceedings{Lugosch2018,
  author={Loren Lugosch and Vikrant Singh Tomar},
  title={Tone Recognition Using Lifters and CTC},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2305--2309},
  doi={10.21437/Interspeech.2018-2293},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2293}
}