A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery

Anna Moró, György Szaszák


For the automatic punctuation of Automatic Speech Recognition (ASR) output, both prosodic and text-based features are used, often in combination. Purely prosody-based approaches usually have low computational needs, introduce little latency (delay) and are more robust to ASR errors. Text-based approaches usually yield better performance; however, they are resource demanding (regarding both training and computation), often introduce high latency and are more sensitive to ASR errors. The present paper proposes a lightweight prosody-based punctuation approach following a new paradigm: we argue in favour of all-inclusive modelling of speech prosody instead of relying only on distinct acoustic markers. First, the entire phonological phrase structure is reconstructed; then its close correlation with punctuation is exploited in a sequence modelling approach with recurrent neural networks. With this tiny, easy-to-implement model we reach punctuation performance for Hungarian comparable to that of large, text-based models for other languages, while keeping resource requirements minimal and remaining suitable for real-time operation with low latency.
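As a rough illustration of the sequence modelling step, the sketch below labels each phonological phrase with the punctuation that follows it, using a small unidirectional Elman RNN in plain Python. Everything in it is an assumption rather than the authors' setup: the class `ElmanTagger`, the three-way label set `PUNCT_LABELS`, the per-phrase feature vectors and the layer sizes are hypothetical, and the weights are random because training is left out.

```python
import math
import random

# Hypothetical label set: the punctuation predicted after each phonological phrase.
PUNCT_LABELS = ["none", "comma", "period"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _rand_matrix(rng, rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def _matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ElmanTagger:
    """Hypothetical unidirectional Elman RNN that tags each phrase in a sequence.

    Weights are random (untrained); training, e.g. by backpropagation
    through time, is omitted from this sketch.
    """

    def __init__(self, n_in, n_hidden, n_out=len(PUNCT_LABELS), seed=0):
        rng = random.Random(seed)
        self.W_xh = _rand_matrix(rng, n_hidden, n_in)
        self.W_hh = _rand_matrix(rng, n_hidden, n_hidden)
        self.W_hy = _rand_matrix(rng, n_out, n_hidden)
        self.b_h = [0.0] * n_hidden
        self.b_y = [0.0] * n_out

    def forward(self, phrases):
        """Return one probability distribution over PUNCT_LABELS per phrase."""
        h = [0.0] * len(self.b_h)
        outputs = []
        for x in phrases:
            # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h): uses only current and past phrases.
            pre = [a + b + c for a, b, c in
                   zip(_matvec(self.W_xh, x), _matvec(self.W_hh, h), self.b_h)]
            h = [math.tanh(p) for p in pre]
            logits = [a + b for a, b in zip(_matvec(self.W_hy, h), self.b_y)]
            outputs.append(softmax(logits))
        return outputs

    def tag(self, phrases):
        """Pick the most probable punctuation label after each phrase."""
        return [PUNCT_LABELS[max(range(len(p)), key=p.__getitem__)]
                for p in self.forward(phrases)]


if __name__ == "__main__":
    # Hypothetical per-phrase features, e.g. [normalized duration, F0 slope, following pause in s].
    phrases = [[0.2, -0.1, 0.05], [0.9, 0.3, 0.4], [0.1, -0.6, 1.2]]
    tagger = ElmanTagger(n_in=3, n_hidden=8, seed=0)
    print(tagger.tag(phrases))
```

Because the recurrence reads only past phrases, a label can be emitted as soon as each phrase ends, which is one way a model of this kind can remain compatible with low-latency operation. The network type, features and label inventory used in the paper may differ from this sketch.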


DOI: 10.21437/Interspeech.2017-204

Cite as: Moró, A., Szaszák, G. (2017) A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery. Proc. Interspeech 2017, 558-562, DOI: 10.21437/Interspeech.2017-204.


@inproceedings{Moró2017,
  author={Anna Moró and György Szaszák},
  title={A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={558--562},
  doi={10.21437/Interspeech.2017-204},
  url={http://dx.doi.org/10.21437/Interspeech.2017-204}
}