9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Spoken Document Retrieval by Translating Recognition Candidates into Correct Transcriptions

Tomoyosi Akiba, Yusuke Yokota

Toyohashi University of Technology, Japan

This paper proposes an ad hoc retrieval method for spoken documents that uses a statistical translation technique. Once the spoken documents have been transcribed with a Large-Vocabulary Continuous Speech Recognition (LVCSR) decoder, a text-based ad hoc retrieval method can be applied directly to the transcripts. However, recognition errors significantly degrade retrieval performance. In particular, words that are Out-Of-Vocabulary (OOV) with respect to the decoder's recognition dictionary never appear in the transcribed text, so a query constructed from such words will never match any document in the target collection. To address these problems, the proposed method uses statistical translation to bridge the gap between the automatically transcribed text and the correct transcription. Experimental evaluation shows that the proposed method outperforms a baseline ad hoc retrieval method that uses only the transcribed text, especially for retrieval tasks with relatively small target documents.
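
As a rough illustration of the OOV mismatch described above, and of how a word-level translation table might bridge it, consider the toy Python sketch below. It is not the authors' model: the transcripts, the table `translation`, its probabilities, and the helper names `expected_counts` and `search` are all hypothetical and were invented for this example. The sketch assumes that an OOV word such as "toyohashi" is misrecognized as in-vocabulary fragments, and that a table of P(correct word | recognized word) is available, for instance one estimated from paired automatic and manual transcripts.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: ASR output in which the OOV word "toyohashi"
# was misrecognized as the in-vocabulary words "toyo" and "hashi".
transcripts = {
    "doc1": ["lecture", "at", "toyo", "hashi", "university"],
    "doc2": ["weather", "report", "for", "brisbane"],
}

# Hypothetical translation table P(correct word | recognized word).
# The probabilities are invented for illustration only.
translation = {
    "toyo": {"toyohashi": 0.6, "toyo": 0.4},
    "hashi": {"toyohashi": 0.5, "hashi": 0.5},
}


def expected_counts(words):
    """Expected counts of correct-transcript terms given recognized words."""
    counts = defaultdict(float)
    for w in words:
        # Words without a table entry are assumed to translate to themselves.
        for target, p in translation.get(w, {w: 1.0}).items():
            counts[target] += p
    return counts


def search(query, index):
    """Return documents with a positive score, best first.

    The score is simply the summed weight of the query terms, a
    stand-in for a real ad hoc retrieval model.
    """
    scores = {d: sum(tf.get(t, 0.0) for t in query) for d, tf in index.items()}
    hits = [d for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda d: -scores[d])


# Baseline: index the recognized words as they are.
baseline_index = {d: Counter(ws) for d, ws in transcripts.items()}
# Translated: index the expected counts of the correct words instead.
translated_index = {d: expected_counts(ws) for d, ws in transcripts.items()}

query = ["toyohashi"]
print(search(query, baseline_index))    # the OOV query matches nothing
print(search(query, translated_index))  # doc1 becomes retrievable
```

In this sketch the baseline index cannot return any document for the OOV query, while the translated index assigns "toyohashi" a nonzero expected count in doc1, which is the kind of gap the abstract describes.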

Bibliographic reference. Akiba, Tomoyosi / Yokota, Yusuke (2008): "Spoken document retrieval by translating recognition candidates into correct transcriptions", in INTERSPEECH-2008, 2166-2169.