In this paper we describe the RWTH automatic speech recognition system for Slovenian, developed within the transLectures project. The project aims to support the transcription and translation of video lectures freely available on the web. Difficulties arise at all levels of modeling: Slovenian is a morphologically rich language with a high degree of inflection (pronunciation model), and a large variety of dialects and recording conditions introduces uncertainty into the audio signal (acoustic model). Moreover, the video lectures cover a wide spectrum of topics with a high share of spontaneous speech and technical terms (language model). These issues require the application of robust and adaptive methods. Besides the system description, this study focuses mainly on robust acoustic modeling. Building acoustic models from various resources, we also compare the influence of speaker adaptation on different neural-network-based acoustic features. Systematic application of these methods allows us to reduce the word error rate on the evaluation corpus from 59.2% to 43.4%. We also motivate open-vocabulary recognition for Slovenian and take some first steps in that direction.
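For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed (word-level edit distance divided by the reference length) and of what the reported drop from 59.2% to 43.4% means in relative terms. The function names and the toy sentences are illustrative assumptions, not part of the RWTH system, and the scoring details used in the paper may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline` (both error rates)."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy example (hypothetical sentences): one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
    # Figures from the abstract: 59.2% -> 43.4% word error rate.
    print(f"{relative_reduction(59.2, 43.4):.1%}")
```

With the figures from the abstract, the improvement is 15.8 percentage points absolute, which corresponds to roughly 26.7% relative.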
Bibliographic reference. Golik, Pavel / Tüske, Zoltán / Schlüter, Ralf / Ney, Hermann (2013): "Development of the RWTH transcription system for Slovenian", In INTERSPEECH-2013, 3107-3111.