INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

A Cross-Lingual Spoken Content Search System

Jitendra Ajmera, Ashish Verma

IBM Research - India, India

This paper presents an approach to enabling audio search for languages in which training an automatic speech recognition (ASR) system is difficult owing to a lack of training resources. Our work is related to previous approaches that address search for out-of-vocabulary terms. A phonetic recognizer converts the audio data into phonetic lattices. In the proposed approach, the acoustic models (AM) for the phonetic recognizer are trained on a base language for which training data is available and are then used to search content in a similar language. A phonetic language model (PLM) is trained independently for each language using text data available from a variety of sources, including the web. We evaluate this approach by searching a Gujarati corpus with AMs trained on an Indian-English corpus. The experimental results show that this approach can achieve a precision at 10 (P@10) of up to 0.65.
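
For readers unfamiliar with the metric, P@10 is the fraction of the ten highest-ranked results returned for a query that are actually relevant. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `precision_at_k`, the segment identifiers, and the convention of dividing by k even when fewer than k results are returned are all assumptions made for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are relevant.

    Assumption: the denominator is always k, so a system that returns
    fewer than k results is penalised for the missing ones.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for seg_id in top_k if seg_id in relevant_ids)
    return hits / k


# Hypothetical ranked output for one query term (best match first).
ranked = ["seg07", "seg02", "seg11", "seg05", "seg09",
          "seg01", "seg14", "seg03", "seg08", "seg12"]
# Hypothetical ground truth: segments that really contain the term.
relevant = {"seg07", "seg11", "seg05", "seg01", "seg03", "seg20"}

score = precision_at_k(ranked, relevant, k=10)
```

In practice the per-query scores would be averaged over all test queries; the paper's abstract does not say how its reported figure was aggregated.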


Bibliographic reference.  Ajmera, Jitendra / Verma, Ashish (2011): "A cross-lingual spoken content search system", In INTERSPEECH-2011, 2257-2260.