Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis

Cheung-Chi Leung, Lei Wang, Haihua Xu, Jingyong Hou, Van Tung Pham, Hang Lv, Lei Xie, Xiong Xiao, Chongjia Ni, Bin Ma, Eng Siong Chng, Haizhou Li


This paper documents the significant components of a state-of-the-art language-independent query-by-example spoken term detection system designed for the Query by Example Search on Speech Task (QUESST) in MediaEval 2015. We developed exact and partial matching dynamic time warping (DTW) systems, and weighted finite-state transducer (WFST) based symbolic search systems, to handle different types of search queries. To handle the noisy and reverberant speech in the task, we trained tokenizers using data augmented with different noise and reverberation conditions. Our post-evaluation analysis showed that the phone boundary labels provided by the improved tokenizers yield more accurate speech activity detection in the DTW systems. We argue that acoustic condition mismatch is possibly a more important factor than language mismatch for obtaining consistent gains from stacked bottleneck features. Our post-evaluation system, which involves fewer component systems, can outperform our submitted systems, which performed the best in the task.


DOI: 10.21437/Interspeech.2016-691

Cite as

Leung, C., Wang, L., Xu, H., Hou, J., Pham, V.T., Lv, H., Xie, L., Xiao, X., Ni, C., Ma, B., Chng, E.S., Li, H. (2016) Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis. Proc. Interspeech 2016, 3703-3707.

Bibtex
@inproceedings{Leung+2016,
author={Cheung-Chi Leung and Lei Wang and Haihua Xu and Jingyong Hou and Van Tung Pham and Hang Lv and Lei Xie and Xiong Xiao and Chongjia Ni and Bin Ma and Eng Siong Chng and Haizhou Li},
title={Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-691},
url={http://dx.doi.org/10.21437/Interspeech.2016-691},
pages={3703--3707}
}