Low Resource Automatic Intonation Classification Using Gated Recurrent Unit (GRU) Networks Pre-Trained with Synthesized Pitch Patterns

Atreyee Saha, Chiranjeevi Yarra, Prasanta Kumar Ghosh


Second language learners of British English (BE) are typically trained to learn four intonation classes: Glide-up, Glide-down, Dive and Take-off. We predict the intonation class in a learner’s utterance by modeling the temporal dependencies in the pitch patterns with gated recurrent unit (GRU) networks. To this end, we pre-train the GRU network using a set of synthesized pitch patterns representing each intonation class. For the synthesis, we propose deriving the pitch patterns from tone sequences that represent each intonation class and are obtained from domain knowledge. Experiments are conducted on speech data from experts, collected from spoken English training material for teaching BE intonation. With pre-training, the proposed scheme improves the unweighted average recall (UAR) by 4.14% absolute over the proposed approach without pre-training and by 6.01% absolute over the baseline scheme that uses hidden Markov models (HMMs).
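
The results above are reported as UAR, which is the mean of the per-class recalls, so each of the four intonation classes counts equally regardless of how many utterances it has. The following is a minimal sketch of that metric under stated assumptions: the function name `unweighted_average_recall` is hypothetical, and the labels and predictions are toy values chosen for illustration, not data or results from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled example is required")

    total = defaultdict(int)    # number of reference examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed per class, then averaged without class-size weighting.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels using the four class names from the abstract.
y_true = (["Glide-up"] * 4 + ["Glide-down"] * 2
          + ["Dive"] * 2 + ["Take-off"] * 2)
y_pred = ["Glide-up", "Glide-up", "Glide-up", "Dive",  # 3 of 4 correct
          "Glide-down", "Glide-up",                    # 1 of 2 correct
          "Dive", "Dive",                              # 2 of 2 correct
          "Take-off", "Glide-down"]                    # 1 of 2 correct

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On this toy set the per-class recalls are 0.75, 0.5, 1.0 and 0.5, so UAR differs from plain accuracy because the larger Glide-up class does not dominate the average.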


DOI: 10.21437/Interspeech.2019-2351

Cite as: Saha, A., Yarra, C., Ghosh, P.K. (2019) Low Resource Automatic Intonation Classification Using Gated Recurrent Unit (GRU) Networks Pre-Trained with Synthesized Pitch Patterns. Proc. Interspeech 2019, 959-963, DOI: 10.21437/Interspeech.2019-2351.


@inproceedings{Saha2019,
  author={Atreyee Saha and Chiranjeevi Yarra and Prasanta Kumar Ghosh},
  title={{Low Resource Automatic Intonation Classification Using Gated Recurrent Unit (GRU) Networks Pre-Trained with Synthesized Pitch Patterns}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={959--963},
  doi={10.21437/Interspeech.2019-2351},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2351}
}