ISCA Archive Interspeech 2008

Addressing the out-of-vocabulary problem for large-scale Chinese spoken term detection

Sha Meng, Jian Shao, Roger Peng Yu, Jia Liu, Frank Seide

While the Out-Of-Vocabulary (OOV) problem remains a challenge for English spoken term detection tasks, it is underestimated for Chinese. This is because a Chinese OOV query term can still be matched as a sequence of Chinese characters, with each character itself being a word in the vocabulary. However, our experiments show that search accuracy differs significantly depending on whether a query is in the vocabulary: In-Vocabulary (INV) queries outperform OOV queries by more than 20%. We examine this problem on a word-lattice-based spoken term detection task. We propose a two-stage method that first locates candidates by partial phonetic matching and then refines the matching score with word lattice rescoring. Experiments show that the proposed method achieves a 24.1% relative improvement for OOV queries on a large-scale Chinese spoken term detection task.
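
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the sliding-window match, the `min_ratio` threshold, the linear interpolation with `weight`, and the toy phone sequences and posteriors are all assumptions made only for illustration. Stage 1 keeps every window of the phone stream that partially matches the query; stage 2 combines that phonetic score with a lattice-derived score supplied by a hypothetical `lattice_posterior` lookup.

```python
def partial_phonetic_candidates(query_phones, stream_phones, min_ratio=0.6):
    """Stage 1 (assumed form): slide the query over the phone stream and
    keep windows where at least min_ratio of the positions match."""
    n = len(query_phones)
    hits = []
    for start in range(len(stream_phones) - n + 1):
        window = stream_phones[start:start + n]
        matched = sum(q == p for q, p in zip(query_phones, window))
        ratio = matched / n
        if ratio >= min_ratio:
            hits.append((start, ratio))
    return hits


def rescore(hits, lattice_posterior, weight=0.5):
    """Stage 2 (assumed form): interpolate the phonetic match ratio with a
    score looked up from the word lattice, then rank best-first."""
    rescored = [
        (start, weight * ratio + (1 - weight) * lattice_posterior(start))
        for start, ratio in hits
    ]
    return sorted(rescored, key=lambda hit: hit[1], reverse=True)


# Toy data (hypothetical): a query and a recognized phone stream.
query = ["b", "ei", "j", "ing"]
stream = ["w", "o", "b", "ei", "j", "in", "x", "b", "ei", "j", "ing"]

# Hypothetical lattice scores keyed by start position; 0.0 if unknown.
toy_posteriors = {2: 0.9, 7: 0.2}

candidates = partial_phonetic_candidates(query, stream)
ranked = rescore(candidates, lambda start: toy_posteriors.get(start, 0.0))
```

In this toy setup the partially matching window can outrank the exact phonetic match once the lattice score is taken into account, which is the kind of refinement a rescoring stage is meant to provide.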


doi: 10.21437/Interspeech.2008-562

Cite as: Meng, S., Shao, J., Yu, R.P., Liu, J., Seide, F. (2008) Addressing the out-of-vocabulary problem for large-scale Chinese spoken term detection. Proc. Interspeech 2008, 2146-2149, doi: 10.21437/Interspeech.2008-562

@inproceedings{meng08_interspeech,
  author={Sha Meng and Jian Shao and Roger Peng Yu and Jia Liu and Frank Seide},
  title={{Addressing the out-of-vocabulary problem for large-scale Chinese spoken term detection}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2146--2149},
  doi={10.21437/Interspeech.2008-562}
}