An RNN Model of Text Normalization

Richard Sproat, Navdeep Jaitly


We present a recurrent neural network (RNN) model of text normalization, defined as the mapping of written text to its spoken form, along with a description of the open-source dataset that we used in our experiments. We show that while the RNN model achieves very high overall accuracy, errors remain that would be unacceptable in a speech application such as text-to-speech (TTS). We then show that a simple FST-based filter can help mitigate those errors. Even with that mitigation, challenges remain, and we end the paper by outlining some possible solutions. In releasing our data, we invite others to help solve this problem.
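To make the task concrete: text normalization maps written tokens such as digits or currency amounts to their spoken forms. The sketch below is purely illustrative and is not the paper's method; the paper's RNN learns this mapping from data, whereas this toy uses hand-written rules (the function names and rules are our own assumptions) just to show what the input/output pairs look like.

```python
import re

# Toy rule-based normalizer: maps a few written token types to spoken forms.
# Illustrative only; the paper's RNN model learns such mappings from data.
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def normalize_token(token: str) -> str:
    """Map a single written token to a plausible spoken form."""
    if re.fullmatch(r"\d", token):        # a single digit, e.g. "2"
        return ONES[int(token)]
    if re.fullmatch(r"\$\d", token):      # a small currency amount, e.g. "$5"
        return ONES[int(token[1:])] + " dollars"
    return token                          # ordinary words pass through unchanged

def normalize(text: str) -> str:
    """Normalize a whitespace-tokenized sentence token by token."""
    return " ".join(normalize_token(t) for t in text.split())

print(normalize("I paid $5 for 2 books"))
# -> I paid five dollars for two books
```

A real system must handle far more token classes (dates, measures, abbreviations) and context-dependent readings, which is what motivates the learned, RNN-based approach described in the paper.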


 DOI: 10.21437/Interspeech.2017-35

Cite as: Sproat, R., Jaitly, N. (2017) An RNN Model of Text Normalization. Proc. Interspeech 2017, 754-758, DOI: 10.21437/Interspeech.2017-35.


@inproceedings{Sproat2017,
  author={Richard Sproat and Navdeep Jaitly},
  title={An RNN Model of Text Normalization},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={754--758},
  doi={10.21437/Interspeech.2017-35},
  url={http://dx.doi.org/10.21437/Interspeech.2017-35}
}