While the Out-Of-Vocabulary (OOV) problem remains a challenge for English spoken term detection tasks, it is underestimated for Chinese. This is because a Chinese OOV query term can still be matched as a sequence of Chinese characters, each of which is itself a word in the vocabulary. However, our experiments show that search accuracy differs significantly depending on whether or not a query is in the vocabulary: In-Vocabulary (INV) queries outperform OOV queries by more than 20%. We examine this problem with a word-lattice-based spoken term detection task. We propose a two-stage method that first locates candidates by partial phonetic matching and then refines the matching score with word-lattice rescoring. Experiments show that the proposed method achieves a 24.1% relative improvement for OOV queries on a large-scale Chinese spoken term detection task.
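The two-stage idea can be illustrated with a toy sketch. It is not the paper's implementation: the abstract does not specify the matching or rescoring formulas, so everything below is an assumption made for illustration. The sketch assumes a flat syllable transcript with one posterior per syllable in place of a real word lattice, a position-wise overlap ratio as the partial phonetic match, and a linear interpolation as the rescoring step. The names `locate_candidates`, `rescore`, `min_overlap`, and `weight` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    start: int      # index of the first syllable of the matching window
    overlap: float  # fraction of query syllables matched position by position


def locate_candidates(query: List[str], transcript: List[str],
                      min_overlap: float = 0.5) -> List[Candidate]:
    """Stage 1 (assumed form): slide the query over the transcript and
    keep every window that matches at least `min_overlap` of the syllables."""
    n = len(query)
    found = []
    for i in range(len(transcript) - n + 1):
        window = transcript[i:i + n]
        hits = sum(q == t for q, t in zip(query, window))
        overlap = hits / n
        if overlap >= min_overlap:
            found.append(Candidate(i, overlap))
    return found


def rescore(cands: List[Candidate], posteriors: List[float], n: int,
            weight: float = 0.5) -> List[Tuple[int, float]]:
    """Stage 2 (assumed form): interpolate the phonetic overlap with the
    mean posterior of the window, then rank candidates by the result."""
    scored = []
    for c in cands:
        mean_post = sum(posteriors[c.start:c.start + n]) / n
        scored.append((c.start, weight * c.overlap + (1 - weight) * mean_post))
    return sorted(scored, key=lambda s: s[1], reverse=True)


# Toy data: pinyin-like syllables with made-up posteriors.
query = ["ao", "ba", "ma"]
transcript = ["wo", "kan", "ao", "ba", "ma", "de", "ao", "pa", "ma"]
posteriors = [0.9, 0.8, 0.7, 0.6, 0.9, 0.9, 0.5, 0.4, 0.6]

candidates = locate_candidates(query, transcript)
ranked = rescore(candidates, posteriors, len(query))
print([start for start, _ in ranked])  # [2, 6]
```

In this toy data the exact match at position 2 ranks above the partial match at position 6, where one syllable differs. A real system would operate on lattice arcs with time stamps and acoustic and language model scores rather than a single best syllable string.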
Bibliographic reference. Meng, Sha / Shao, Jian / Yu, Roger Peng / Liu, Jia / Seide, Frank (2008): "Addressing the out-of-vocabulary problem for large-scale Chinese spoken term detection", In INTERSPEECH-2008, 2146-2149.