8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Information Retrieval Strategies for Accessing African Audio Corpora

Abdillahi Nimaan, Pascal Nocera, Frédéric Béchet, Jean-François Bonastre

LIA, France

In this paper we present a first approach to accessing African oral corpora that combines automatic speech recognition (ASR) and information retrieval. First, we present the principal characteristics of our Somali speech recognizer [8] and the results obtained on real audio archives gathered from Djibouti Radio. Second, we present a Hybrid Language Model (HLM) that includes both words and sub-words to improve robustness against out-of-vocabulary (OOV) words. We then run information retrieval experiments with various strategies, searching the different outputs of the ASR system (words, sub-words and hybrid). Finally, we present a new strategy that combines sub-words and words to improve the information retrieval results.

Bibliographic reference. Nimaan, Abdillahi / Nocera, Pascal / Béchet, Frédéric / Bonastre, Jean-François (2007): "Information retrieval strategies for accessing African audio corpora", In INTERSPEECH-2007, 1545-1548.