Transcription Correction for Indian Languages Using Acoustic Signatures

Jeena JPrakash, Golda Brunet Rajan, Hema Murthy

Accurate phonetic transcription of the speech corpus has a significant impact on the performance of speech processing applications, especially for low-resource languages. Mismatches between transcriptions and their utterances often occur at the phoneme level due to insertion, deletion, or substitution errors. This is very common in Indian languages owing to schwa deletion in the context of vowels and agglutination in the context of consonants. This paper attempts to use acoustic cues at the syllable level to remove vowels from the transcription when they are poorly articulated or absent. Hidden Markov model (HMM) based forced Viterbi alignment (FVA) and group delay (GD) based signal processing are employed in tandem to achieve this task. Disagreement between FVA boundaries (which place vowels according to the transcription) and GD boundaries (which rely on signal processing cues for syllables) is used to correct the transcription. An increase in likelihood of 0.3% is observed across three Indian languages, namely Gujarati, Telugu and Tamil.
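The idea of using FVA/GD boundary disagreement to flag vowels can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the decision rule, so the rule used here (a vowel is flagged when no GD boundary lies within a tolerance of its FVA end boundary), the `Segment` type, the function names, the 20 ms tolerance, and the toy timings are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One phone from a (hypothetical) forced-alignment output."""
    label: str
    start: float  # seconds
    end: float    # seconds
    is_vowel: bool


def flag_unsupported_vowels(fva_segments: List[Segment],
                            gd_boundaries: List[float],
                            tolerance: float = 0.02) -> List[int]:
    """Return indices of vowels whose FVA end boundary has no GD boundary nearby.

    Assumed rule (not taken from the paper): a vowel is "supported" if some
    GD syllable boundary lies within `tolerance` seconds of its end time.
    """
    flagged = []
    for i, seg in enumerate(fva_segments):
        if not seg.is_vowel:
            continue
        supported = any(abs(b - seg.end) <= tolerance for b in gd_boundaries)
        if not supported:
            flagged.append(i)
    return flagged


def drop_flagged(fva_segments: List[Segment], flagged: List[int]) -> List[str]:
    """Return the phone labels with the flagged segments removed."""
    skip = set(flagged)
    return [s.label for i, s in enumerate(fva_segments) if i not in skip]


# Toy alignment (made-up timings): the second "a" is very short and has
# no GD boundary near its end, so the assumed rule flags it.
fva = [
    Segment("k", 0.00, 0.05, False),
    Segment("a", 0.05, 0.15, True),
    Segment("m", 0.15, 0.20, False),
    Segment("a", 0.20, 0.22, True),
    Segment("l", 0.22, 0.30, False),
]
gd = [0.15, 0.30]  # hypothetical GD syllable boundaries, in seconds

flagged = flag_unsupported_vowels(fva, gd)
corrected = drop_flagged(fva, flagged)
print(flagged, corrected)
```

In a real pipeline the corrected transcription would presumably be realigned and kept only if the likelihood improves, which is the quantity the abstract reports; that step is omitted here.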

DOI: 10.21437/Interspeech.2018-1188

Cite as: JPrakash, J., Rajan, G.B., Murthy, H. (2018) Transcription Correction for Indian Languages Using Acoustic Signatures. Proc. Interspeech 2018, 3177-3181, DOI: 10.21437/Interspeech.2018-1188.

  author={Jeena JPrakash and Golda Brunet Rajan and Hema Murthy},
  title={Transcription Correction for Indian Languages Using Acoustic Signatures},
  booktitle={Proc. Interspeech 2018},