ISCA Archive SLTU 2014

Exploring pronunciation variants for Romanian speech-to-text transcription

Ioana Vasilescu, Bianca Vieru, Lori Lamel

Speech processing tools were applied to investigate morpho-phonetic trends in contemporary spoken Romanian, with the objective of improving the pronunciation dictionary and, more generally, the acoustic models of a speech recognition system. As no manually transcribed audio data were available for training, language models were estimated on a large text corpus and used to provide indirect supervision for training acoustic models in a semi-supervised manner. Automatic transcription errors were analyzed to gain insight into language-specific features, both to improve the current performance of the system and to explore linguistic issues. Based on this analysis, two aspects of Romanian morpho-phonology were investigated: the deletion of the masculine definite article -l, and the secondary palatalization of plural nouns and adjectives and of the 2nd person indicative of verbs.

Index Terms: ASR, Romanian, speech transcription errors, pronunciation variants, definite article, palatalization


Cite as: Vasilescu, I., Vieru, B., Lamel, L. (2014) Exploring pronunciation variants for Romanian speech-to-text transcription. Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014), 161-168

@inproceedings{vasilescu14_sltu,
  author={Ioana Vasilescu and Bianca Vieru and Lori Lamel},
  title={{Exploring pronunciation variants for Romanian speech-to-text transcription}},
  year=2014,
  booktitle={Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014)},
  pages={161--168}
}