Leveraging a Character, Word and Prosody Triplet for an ASR Error Robust and Agglutination Friendly Punctuation Approach

György Szaszák, Máté Ákos Tündik


Punctuating ASR transcripts has received increasing attention recently, and well-performing approaches based on sequence-to-sequence modelling have been presented, exploiting textual (word and character) and/or acoustic-prosodic features. In this work we propose to use character-, word- and prosody-based features all at once, providing a robust and highly language-independent platform for punctuation recovery that also deals well with highly agglutinating languages with less constrained word order. We demonstrate that this feature triplet improves the robustness of punctuation to ASR errors in two structurally very different languages, English and Hungarian. Moreover, in highly agglutinating Hungarian, where word-based approaches suffer from an exploding vocabulary (and hence poorer semantic representations through embeddings) and from less constrained word order, we show that prosodic cues and the character-based model can effectively counteract this loss of information. We also perform an in-depth analysis of punctuation with respect to both ASR errors and agglutination, giving a well-founded explanation of the observed improvements.
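The abstract does not specify the model internals. A common framing in this line of work treats punctuation recovery as tagging each word with the mark that follows it. The following minimal Python sketch assumes that framing and shows how a per-word character/word/prosody triplet might be assembled. The label names, the use of character trigrams as a stand-in for a character-level model, the `<unk>` vocabulary handling and the three prosodic values (pause, F0, energy) are all illustrative assumptions, not the authors' features. The Hungarian example hints at why the character stream helps under agglutination: *házban* ("in the house") is out of vocabulary when only *ház* ("house") is known, yet the two forms still share character n-grams.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical label inventory; the paper's actual label set may differ.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def text_to_labels(text: str) -> Tuple[List[str], List[str]]:
    """Turn punctuated reference text into lowercased words plus the label
    of the punctuation mark following each word ("O" if none)."""
    words, labels = [], []
    for raw in text.split():
        mark = raw[-1] if raw[-1] in PUNCT_TO_LABEL else None
        words.append((raw[:-1] if mark else raw).lower())
        labels.append(PUNCT_TO_LABEL[mark] if mark else "O")
    return words, labels


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Boundary-marked character n-grams: a simple, assumed stand-in for a
    character-level encoder; inflected forms of one stem share n-grams."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


@dataclass
class TokenFeatures:
    word_id: int                         # word stream: vocab index, 0 = <unk>
    ngrams: List[str]                    # character stream
    prosody: Tuple[float, float, float]  # assumed (pause_s, f0_hz, energy_db)


def build_triplets(words: List[str], vocab: Dict[str, int],
                   prosody: List[Tuple[float, float, float]]) -> List[TokenFeatures]:
    """Pair every word with its word, character and prosody features."""
    if len(words) != len(prosody):
        raise ValueError("need exactly one prosody vector per word")
    return [TokenFeatures(vocab.get(w, 0), char_ngrams(w), p)
            for w, p in zip(words, prosody)]


if __name__ == "__main__":
    words, labels = text_to_labels("Hello, world. How are you?")
    print(list(zip(words, labels)))
    # An inflected Hungarian form is OOV for the word stream but still
    # shares character n-grams with its known stem.
    vocab = {"<unk>": 0, "ház": 1}
    (tok,) = build_triplets(["házban"], vocab, [(0.4, 120.0, 60.0)])
    print(tok.word_id, set(tok.ngrams) & set(char_ngrams("ház")))
```

In a real system, each stream would feed its own encoder (for example, a character model, word embeddings and a prosodic feature extractor), and their outputs would be fused before the punctuation classifier. The sketch stops at building the aligned inputs.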


DOI: 10.21437/Interspeech.2019-2132

Cite as: Szaszák, G., Tündik, M.Á. (2019) Leveraging a Character, Word and Prosody Triplet for an ASR Error Robust and Agglutination Friendly Punctuation Approach. Proc. Interspeech 2019, 2988-2992, DOI: 10.21437/Interspeech.2019-2132.


@inproceedings{Szaszák2019,
  author={György Szaszák and Máté Ákos Tündik},
  title={{Leveraging a Character, Word and Prosody Triplet for an ASR Error Robust and Agglutination Friendly Punctuation Approach}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2988--2992},
  doi={10.21437/Interspeech.2019-2132},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2132}
}