Fourth International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2014)

St. Petersburg, Russia
May 14-16, 2014

Towards Automatic Speech Recognition Without Pronunciation Dictionary, Transcribed Speech and Text Resources in the Target Language Using Cross-Lingual Word-to-Phoneme Alignment

Felix Stahlberg (1), Tim Schlippe (1), Stephan Vogel (2), Tanja Schultz (1)

(1) Cognitive Systems Lab, Karlsruhe Institute of Technology (KIT), Germany
(2) Qatar Computing Research Institute, Qatar Foundation, Qatar

In this paper we tackle the task of bootstrapping an Automatic Speech Recognition system without an a priori given language model, pronunciation dictionary, or transcribed speech data for the target language Slovene: only untranscribed speech and translations of what was said into other, resource-rich source languages are available. Our approach is therefore highly relevant for under-resourced and non-written languages. First, we borrow acoustic models from a closely related language (Croatian) and apply a Croatian phoneme recognizer to the Slovene speech. Second, we segment the recognized phoneme strings into word units using cross-lingual word-to-phoneme alignment. Third, we compensate for phoneme recognition and alignment errors in the segmented phoneme sequences and aggregate the resulting segments in a pronunciation dictionary for Slovene. Orthographic representations are generated using a Croatian phoneme-to-grapheme model. Finally, we use the resulting dictionary and the Croatian acoustic models to recognize Slovene. Our best recognizer achieves a Character Error Rate of 52% on the BMED corpus.
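To make the second and third steps more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical input format in which each recognized phoneme carries the index of the source-language word it is aligned to, and it places a word boundary wherever two adjacent phonemes align to different source words. For aggregation it substitutes a plain majority vote per source word for the paper's error-compensation step, and it keys dictionary entries by the source word as a stand-in for the unknown Slovene word. The function names and phoneme symbols are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Segment = Tuple[str, ...]


def segment_by_alignment(
    phonemes: Sequence[str], alignment: Sequence[int]
) -> List[Tuple[int, Segment]]:
    """Split a phoneme string into word-like units (hypothetical helper).

    alignment[i] is the index of the source word aligned to phonemes[i].
    A boundary is inserted wherever adjacent phonemes are aligned to
    different source words.
    """
    if len(phonemes) != len(alignment):
        raise ValueError("need exactly one alignment link per phoneme")
    segments: List[Tuple[int, Segment]] = []
    current: List[str] = []
    for i, (phoneme, src) in enumerate(zip(phonemes, alignment)):
        if current and src != alignment[i - 1]:
            # Source word changed: close the segment for the previous word.
            segments.append((alignment[i - 1], tuple(current)))
            current = []
        current.append(phoneme)
    if current:
        segments.append((alignment[-1], tuple(current)))
    return segments


def build_dictionary(
    corpus: Iterable[Tuple[Sequence[str], Sequence[str], Sequence[int]]]
) -> Dict[str, Segment]:
    """Aggregate segments into a dictionary by majority vote (simplification).

    Each corpus item is (source_words, phonemes, alignment). The most
    frequent phoneme segment observed for a source word becomes its entry.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for source_words, phonemes, alignment in corpus:
        for src_idx, segment in segment_by_alignment(phonemes, alignment):
            votes[source_words[src_idx]][segment] += 1
    return {word: counts.most_common(1)[0][0] for word, counts in votes.items()}


if __name__ == "__main__":
    # Toy data: Slovene "moja hisa" ("my house"); the third utterance
    # contains a simulated phoneme recognition error (e instead of i).
    corpus = [
        (["my", "house"], ["m", "o", "j", "a", "x", "i", "S", "a"], [0, 0, 0, 0, 1, 1, 1, 1]),
        (["house"], ["x", "i", "S", "a"], [0, 0, 0, 0]),
        (["house"], ["x", "e", "S", "a"], [0, 0, 0, 0]),
    ]
    print(build_dictionary(corpus))
```

The majority vote lets the two consistent observations of "house" outvote the corrupted one, which illustrates why aggregating over many utterances can compensate for individual recognition and alignment errors. A real system would also need to merge segments that differ only slightly, which this sketch does not attempt.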

Index Terms: pronunciation dictionary, non-written languages, word-to-phoneme alignment, language discovery, zero-resource


Bibliographic reference. Stahlberg, Felix / Schlippe, Tim / Vogel, Stephan / Schultz, Tanja (2014): "Towards automatic speech recognition without pronunciation dictionary, transcribed speech and text resources in the target language using cross-lingual word-to-phoneme alignment", In SLTU-2014, 73-80.