Fast vocabulary-independent audio search using path-based graph indexing

Olivier Siohan, Michiel Bacchiani

Classical audio retrieval techniques transcribe audio documents with a large-vocabulary speech recognition system and index the resulting transcripts. Queries that are outside the recognizer's vocabulary, or that are likely to be misrecognized, can significantly impair retrieval performance. We instead propose a fast, vocabulary-independent audio search approach that operates on phonetic lattices and is suitable for any query. Indexing phonetic lattices so that an arbitrary phone-sequence query can be processed efficiently is challenging, however, because the choice of indexing unit is unclear. We propose an inverted index structure on lattices that uses paths as indexing features. The approach is inspired by a general graph indexing method that defines an automatic procedure for selecting a small number of paths as indexing features, which keeps the index small while allowing fast retrieval of the lattices matching a given query. The effectiveness of the proposed approach is illustrated on broadcast news and Switchboard databases.
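
To make the idea of an inverted index with paths as indexing features concrete, the following is a minimal sketch, not the authors' implementation. It assumes a lattice can be modeled as a list of (source state, target state, phone) arcs, and it indexes every phone path up to a fixed length `max_len` instead of applying the automatic path-selection procedure mentioned in the abstract. All function names, the lattice identifiers, and the toy phone labels are hypothetical.

```python
from collections import defaultdict

def _outgoing(arcs):
    """Group arcs by source state: state -> list of (next_state, phone)."""
    out = defaultdict(list)
    for src, dst, phone in arcs:
        out[src].append((dst, phone))
    return out

def label_paths(arcs, max_len):
    """Phone sequences (tuples) of length 1..max_len readable along lattice paths."""
    out = _outgoing(arcs)
    seen = set()  # (end_state, phone_sequence) pairs already expanded
    stack = [(dst, (phone,)) for _, dst, phone in arcs]
    while stack:
        state, seq = stack.pop()
        if (state, seq) in seen:
            continue
        seen.add((state, seq))
        if len(seq) < max_len:
            for nxt, phone in out.get(state, ()):
                stack.append((nxt, seq + (phone,)))
    return {seq for _, seq in seen}

def build_index(lattices, max_len):
    """Inverted index: phone path -> set of lattice ids containing that path."""
    index = defaultdict(set)
    for lattice_id, arcs in lattices.items():
        for seq in label_paths(arcs, max_len):
            index[seq].add(lattice_id)
    return index

def candidates(index, query, max_len):
    """Intersect posting lists of the query's indexed sub-paths (may over-generate)."""
    query = tuple(query)
    if len(query) <= max_len:
        features = [query]
    else:
        features = [query[i:i + max_len] for i in range(len(query) - max_len + 1)]
    postings = [index.get(f, set()) for f in features]
    return set.intersection(*postings)

def contains_path(arcs, query):
    """Exact check: is the full phone sequence readable along one lattice path?"""
    out = _outgoing(arcs)
    frontier = set(out)  # a match may start at any state with outgoing arcs
    for phone in query:
        frontier = {dst for s in frontier for dst, p in out.get(s, ()) if p == phone}
        if not frontier:
            return False
    return True

def search(index, lattices, query, max_len):
    """Filter with the index, then verify each surviving candidate exactly."""
    hits = [lid for lid in candidates(index, query, max_len)
            if contains_path(lattices[lid], query)]
    return sorted(hits)

# Toy lattices with made-up phone labels (assumption, for illustration only).
lattices = {
    "utt1": [(0, 1, "k"), (1, 2, "ae"), (1, 2, "ah"), (2, 3, "t")],
    "utt2": [(0, 1, "b"), (1, 2, "ae"), (2, 3, "t"), (2, 3, "d")],
    "utt3": [(0, 1, "k"), (1, 2, "ae"), (2, 3, "s"), (3, 4, "ae"), (4, 5, "t")],
}
MAX_LEN = 2
index = build_index(lattices, MAX_LEN)
query = ("k", "ae", "t")
print(candidates(index, query, MAX_LEN))
print(search(index, lattices, query, MAX_LEN))
```

In this sketch the index acts only as a filter: a lattice such as `utt3` contains every indexed sub-path of the query without containing the query itself, so candidates are verified against the lattice before being returned. Indexing all paths up to a fixed length is a simplification chosen for brevity; selecting a small set of paths, as the abstract describes, would reduce the index size.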

doi: 10.21437/Interspeech.2005-52

