ISCA Archive Interspeech 2021

PhonemeBERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript

Mukuntha Narayanan Sundararaman, Ayush Kumar, Jithendra Vepa

Recent years have witnessed significant improvements in the ability of ASR systems to recognize spoken utterances. However, recognition remains challenging for noisy and out-of-domain data, where ASR errors are prevalent in the transcribed text. These errors significantly degrade the performance of downstream tasks such as intent and sentiment detection. In this work, we propose a BERT-style language model, referred to as PhonemeBERT, that jointly models the phoneme sequence and the ASR transcript to learn phonetic-aware representations that are robust to ASR errors. We show that PhonemeBERT leverages phoneme sequences as additional features and outperforms word-only models on downstream tasks. We evaluate our approach extensively by generating noisy versions of three benchmark datasets (Stanford Sentiment Treebank, TREC, and ATIS, for sentiment, question, and intent classification respectively) in addition to a real-life sentiment dataset. The proposed approach beats the state-of-the-art baselines comprehensively on each dataset. Additionally, we show that PhonemeBERT can also be used as a pre-trained encoder in a low-resource setup where only ASR transcripts are available for the downstream tasks.
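To make the joint-modelling idea concrete, below is a minimal PyTorch sketch of a BERT-style encoder over a word-token stream and a phoneme-token stream. The class name PhonemeBERTSketch, all vocabulary sizes and hyperparameters, and the segment-embedding scheme are illustrative assumptions, not the paper's exact architecture or pre-training setup.

import torch
import torch.nn as nn

class PhonemeBERTSketch(nn.Module):
    # Joint BERT-style encoder over an ASR word sequence and its phoneme
    # sequence. All sizes below are placeholders, not the paper's settings.
    def __init__(self, word_vocab=30000, phoneme_vocab=70,
                 d_model=256, n_heads=4, n_layers=4, max_len=512):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, d_model)
        self.phoneme_emb = nn.Embedding(phoneme_vocab, d_model)
        self.segment_emb = nn.Embedding(2, d_model)  # 0 = word stream, 1 = phoneme stream
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.word_head = nn.Linear(d_model, word_vocab)        # masked-LM head, words
        self.phoneme_head = nn.Linear(d_model, phoneme_vocab)  # masked-LM head, phonemes

    def forward(self, word_ids, phoneme_ids):
        # Embed each stream, tag it with a segment id, and concatenate so
        # self-attention can mix word-level and phoneme-level context.
        w = self.word_emb(word_ids) + self.segment_emb(torch.zeros_like(word_ids))
        p = self.phoneme_emb(phoneme_ids) + self.segment_emb(torch.ones_like(phoneme_ids))
        x = torch.cat([w, p], dim=1)
        pos = torch.arange(x.size(1), device=x.device)
        x = self.encoder(x + self.pos_emb(pos))
        n_words = word_ids.size(1)
        return self.word_head(x[:, :n_words]), self.phoneme_head(x[:, n_words:])

# Example: a batch of 2 utterances, 10 word tokens and 30 phoneme tokens each.
model = PhonemeBERTSketch()
word_logits, phoneme_logits = model(torch.randint(0, 30000, (2, 10)),
                                    torch.randint(0, 70, (2, 30)))

Concatenating the two streams lets self-attention attend across word and phoneme positions, so the representation of a noisy word token can draw on phonetic context, which is the robustness to ASR errors the abstract describes; the paper's actual losses and configuration are given in the full text.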


doi: 10.21437/Interspeech.2021-1582

Cite as: Sundararaman, M.N., Kumar, A., Vepa, J. (2021) PhonemeBERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript. Proc. Interspeech 2021, 3236-3240, doi: 10.21437/Interspeech.2021-1582

@inproceedings{sundararaman21_interspeech,
  author={Mukuntha Narayanan Sundararaman and Ayush Kumar and Jithendra Vepa},
  title={{PhonemeBERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript}},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={3236--3240},
  doi={10.21437/Interspeech.2021-1582}
}