We previously proposed a fast spoken term detection method that uses a suffix array data structure to search large-scale speech documents. That method reduces search time through techniques such as keyword division and iterative lengthening search. In this paper, we propose a statistical method for assigning different threshold values to sub-keywords in order to accelerate the search further. Specifically, the method estimates the number of results that keyword searches will return and then reduces this number by adjusting the threshold values assigned to sub-keywords. We also investigate the theoretical condition that these threshold values must satisfy. Experiments show that the proposed search method is 10% to 30% faster than previous methods.
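To make the idea of keyword division over a suffix array more concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain character strings in place of phoneme sequences, unit-cost edit distance in place of an acoustic or phoneme distance, and the simplest possible threshold assignment, in which every sub-keyword must match exactly (threshold 0). Under those assumptions the classic pigeonhole argument applies: if a keyword is split into `max_dist + 1` pieces, any occurrence within edit distance `max_dist` must contain at least one piece unchanged, so exact suffix-array lookups of the pieces yield candidates that are then verified. All function names here (`build_suffix_array`, `find_exact`, `best_distance`, `search`) are hypothetical.

```python
def build_suffix_array(text):
    # Naive construction (sorts full suffixes); adequate for a small sketch only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    # Binary search for the block of suffixes that start with `pattern`.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def best_distance(window, keyword):
    # Smallest edit distance between `keyword` and any substring of `window`
    # (dynamic programming in which a match may start at any window position).
    prev = list(range(len(keyword) + 1))
    best = prev[-1]
    for ch in window:
        cur = [0]
        for i, kc in enumerate(keyword, 1):
            cur.append(min(prev[i] + 1, cur[i - 1] + 1, prev[i - 1] + (kc != ch)))
        best = min(best, cur[-1])
        prev = cur
    return best


def search(text, sa, keyword, max_dist):
    # Assumption: every sub-keyword uses threshold 0 (exact match), so the
    # keyword is divided into max_dist + 1 non-empty pieces.
    k = max_dist + 1
    n = len(keyword)
    if n < k:
        raise ValueError("keyword too short to divide into max_dist + 1 pieces")
    bounds = [n * j // k for j in range(k + 1)]
    hits = set()
    for j in range(k):
        piece = keyword[bounds[j]:bounds[j + 1]]
        for pos in find_exact(text, sa, piece):
            est_start = max(0, pos - bounds[j])  # estimated keyword start
            lo = max(0, pos - bounds[j] - max_dist)
            hi = min(len(text), pos - bounds[j] + n + max_dist)
            # Verify the whole keyword against the neighbourhood of the hit.
            if best_distance(text[lo:hi], keyword) <= max_dist:
                hits.add(est_start)
    return sorted(hits)
```

A call such as `search(text, build_suffix_array(text), keyword, 1)` returns the estimated start positions of verified matches. The number of candidates that reach the verification step depends directly on how strict each sub-keyword threshold is, which is the quantity the abstract describes estimating and reducing; the sketch fixes all thresholds at zero and does not attempt that optimization.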
Bibliographic reference. Katsurada, Kouichi / Miura, Seiichi / Seng, Kheang / Iribe, Yurie / Nitta, Tsuneo (2013): "Acceleration of spoken term detection using a suffix array by assigning optimal threshold values to sub-keywords", In INTERSPEECH-2013, 11-14.