Fourth International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2014)
St. Petersburg, Russia
Developing automatic speech recognition (ASR) systems for low-resourced languages is an important research topic. This paper reports on the development of a speech-to-text (STT) system for transcribing broadcast news and broadcast conversation in Slovak, a low-resourced language. Context-dependent acoustic models are trained without any manually transcribed audio data, via cross-language transfer and unsupervised training. In addition, a pronunciation dictionary for the Slovak language is created using efficient rule-based pronunciation modeling. For language modeling, large N-gram language models are estimated on 63M words of text downloaded from the Internet. The system uses MLP (multilayer perceptron) features imported from English, which are concatenated with cepstral PLP (perceptual linear prediction) and F0 (pitch) features. These techniques were applied to develop a Slovak STT system with performance similar to that obtained by state-of-the-art systems for other languages. Furthermore, we propose to reduce the dimension of the MLP+PLP+F0 features from 81 to 50 using principal component analysis (PCA), in order to remove the redundancy between the MLP and the PLP+F0 features. This feature reduction lowers both the word error rate (WER) and the recognition time, and cuts the CMLLR adaptation time by a factor of 3.
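As a rough illustration of the PCA step described in the abstract, the following minimal sketch projects 81-dimensional feature vectors onto their 50 leading principal components. It is not the authors' implementation: the function names `fit_pca` and `apply_pca`, the use of NumPy, and the synthetic random data standing in for real MLP+PLP+F0 frames are all assumptions made for illustration only.

```python
import numpy as np


def fit_pca(features, n_components):
    """Estimate a PCA projection from a (frames x dims) feature matrix."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Sample covariance of the centered features (dims x dims).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order], eigvals.sum()


def apply_pca(features, mean, components):
    """Project features onto the retained principal components."""
    return (features - mean) @ components


# Synthetic stand-in data (assumption): 1000 frames of 81-dim features that are
# strongly redundant, generated from 50 latent dimensions plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 50))
mixing = rng.normal(size=(50, 81))
features = latent @ mixing + 0.01 * rng.normal(size=(1000, 81))

mean, components, kept_var, total_var = fit_pca(features, n_components=50)
reduced = apply_pca(features, mean, components)
retained = kept_var.sum() / total_var

print(reduced.shape)  # (1000, 50)
```

In a real system the projection would be estimated once on training features and then applied unchanged to test data. The fraction `retained` indicates how much of the total variance survives the reduction.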
Index Terms: Slovak speech-to-text, ASR for low-resourced languages, Multi-layer perceptron, Unsupervised acoustic model training, Principal component analysis
Bibliographic reference. Do, Cong-Thanh / Lamel, Lori / Gauvain, Jean-Luc (2014): "Speech-to-text development for Slovak, a low-resourced language", In SLTU-2014, 176-182.