ISCA Archive SLTU 2014

Recent improvements in Estonian LVCSR

Tanel Alumäe

This paper describes our current automatic transcription system for Estonian semi-spontaneous speech, which we are developing within the Estonian language technology national program. A three-pass decoding strategy is employed: speaker-independent GMM acoustic models are used in the first pass and speaker-adapted DNN-HMM models in the last pass. A neural-network-based phone duration model is used to rescore the recognition lattices after the final pass and is found to give a surprisingly large gain in recognition accuracy. Compound words are split before building a statistical language model and are reconstructed in the recognized hypotheses using an n-gram model. The word error rate of our system is 17.9% on broadcast conversations and 26.3% on conference speeches. This is an improvement of around 8% absolute (24-30% relative) over a GMM-based system from 2012.
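
The abstract only states that split compounds are reconstructed with an n-gram model. One common way to do this is a hidden-event formulation: in the training text, the gap inside each split compound is marked with a special token. At recognition time, each gap between adjacent words is glued if the n-gram model prefers the marked version. The sketch below is an illustration under stated assumptions, not the system described in the paper. The join token `<+>`, the add-k bigram model with `k=0.1`, the helper names `train_bigram`, `logprob` and `rejoin`, and the tiny training list are all hypothetical.

```python
import math
from collections import Counter

JOIN = "<+>"  # hypothetical token marking the gap inside a split compound


def train_bigram(sentences):
    """Count bigram histories and bigrams, padding each sentence with <s> and </s>."""
    hist, bi = Counter(), Counter()
    for toks in sentences:
        seq = ["<s>"] + toks + ["</s>"]
        hist.update(seq[:-1])
        bi.update(zip(seq[:-1], seq[1:]))
    vocab = {t for toks in sentences for t in toks} | {"</s>"}
    return hist, bi, len(vocab)


def logprob(model, prev, word, k=0.1):
    """Add-k smoothed bigram log-probability log P(word | prev); k is an assumed value."""
    hist, bi, v = model
    return math.log((bi[(prev, word)] + k) / (hist[prev] + k * v))


def rejoin(model, hyp):
    """Glue adjacent words wherever the join event outscores a plain word boundary.

    With a bigram model, the score of each gap depends only on its two
    neighbours, so deciding every gap independently yields the highest-scoring
    segmentation; a higher-order model would need a Viterbi search over the gaps.
    """
    if not hyp:
        return []
    out = [hyp[0]]
    for prev, nxt in zip(hyp, hyp[1:]):
        plain = logprob(model, prev, nxt)
        joined = logprob(model, prev, JOIN) + logprob(model, JOIN, nxt)
        if joined > plain:
            out[-1] += nxt  # attach to the word (or partial compound) on the left
        else:
            out.append(nxt)
    return out


# Toy training text (illustrative only): compounds already split, gaps marked with JOIN.
train = [
    ["ta", "mängib", "jalg", JOIN, "palli"],
    ["jalg", JOIN, "pall", "on", "populaarne"],
    ["raud", JOIN, "tee", "on", "pikk"],
    ["tee", "on", "pikk"],
]
model = train_bigram(train)
print(rejoin(model, ["ta", "mängib", "jalg", "palli"]))
print(rejoin(model, ["raud", "tee", "on", "pikk"]))
```

With this toy model the two calls print `['ta', 'mängib', 'jalgpalli']` and `['raudtee', 'on', 'pikk']`. A bigram keeps the per-gap decision exact and trivial. A real system would use a higher-order model trained on a large corpus, where the join decisions interact and must be searched jointly.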

Index Terms: Speech recognition, LVCSR, DNN, duration model, Estonian


Cite as: Alumäe, T. (2014) Recent improvements in Estonian LVCSR. Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014), 118-123

@inproceedings{alumae14_sltu,
  author={Tanel Alumäe},
  title={{Recent improvements in Estonian LVCSR}},
  year=2014,
  booktitle={Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014)},
  pages={118--123}
}