In this paper we present a first approach to accessing African oral corpora that combines automatic speech recognition (ASR) and information retrieval. First, we describe the principal characteristics of our Somali speech recognizer and the results obtained on real audio archives gathered from Djibouti Radio. Second, we present a Hybrid Language Model (HLM) that includes both words and sub-words to improve robustness against out-of-vocabulary (OOV) words. We then run information retrieval experiments with various strategies, searching over the different outputs of the ASR system (words, sub-words and hybrid). Finally, we present a new strategy that combines sub-words and words to improve the information retrieval results.
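The idea of combining word-level and sub-word-level evidence at retrieval time can be illustrated with a minimal sketch. This is not the method of the paper, whose sub-word units and combination rule are not given in the abstract. The sketch assumes character trigrams as sub-words, a simple overlap score, and a linear interpolation weight; the names `subwords`, `overlap`, `combined_score` and `weight` are all hypothetical.

```python
def subwords(word, n=3):
    """Split a word into overlapping character n-grams.

    Assumption: character n-grams stand in for the sub-word units;
    the abstract does not say how sub-words are defined.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def overlap(query_units, doc_units):
    """Fraction of query units that also occur in the document."""
    if not query_units:
        return 0.0
    doc_set = set(doc_units)
    return sum(1 for u in query_units if u in doc_set) / len(query_units)


def combined_score(query, doc_words, doc_subwords, weight=0.5):
    """Interpolate a word-level and a sub-word-level match score.

    doc_words and doc_subwords are the two ASR outputs for one document
    (word transcript and sub-word transcript). The linear interpolation
    is an assumed combination rule, chosen only for illustration.
    """
    q_words = query.lower().split()
    q_subs = [s for w in q_words for s in subwords(w)]
    word_score = overlap(q_words, doc_words)
    sub_score = overlap(q_subs, doc_subwords)
    return weight * word_score + (1 - weight) * sub_score
```

Under these assumptions, a query term that is OOV for the word-level recognizer contributes nothing to the word score but can still match in the sub-word transcript, so the combined score ranks the document higher than the word-only score would.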
Bibliographic reference. Nimaan, Abdillahi / Nocera, Pascal / Béchet, Frédéric / Bonastre, Jean-François (2007): "Information retrieval strategies for accessing African audio corpora", in INTERSPEECH-2007, 1545-1548.