14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

A Resource-Dependent Approach to Word Modeling for Keyword Spotting

I-Fan Chen, Chin-Hui Lee

Georgia Institute of Technology, USA

A hierarchical framework is proposed to address the problem of modeling different types of words in keyword spotting (KWS). Keyword models are built at different levels according to the amount of training data available for each individual word. The proposed approach improves KWS performance even when no training speech is available for the keywords. It also suggests an easier way to collect training data for these resource-limited words. Experimental results show that the proposed framework improves KWS performance, measured by figure of merit (FOM), regardless of the number of training instances for each keyword. For words with abundant speech data, the proposed method exploits the training data better than the conventional modeling technique and boosts the system FOM from 9.79% to 42.78%. For words with a small amount of training data, the new method increases the system FOM from 29.05% to 49.06%. Even for keywords without any training examples, the new modeling scheme improves the system FOM from 60.96% to 66.51%.

Bibliographic reference. Chen, I-Fan / Lee, Chin-Hui (2013): "A resource-dependent approach to word modeling for keyword spotting", in INTERSPEECH-2013, 2544-2548.