Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword Samples

Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li

Rescoring hypothesized detections using audio samples of the keyword extracted from training data is an effective way to improve the performance of a Keyword Search (KWS) system. Unfortunately, such a rescoring framework cannot be applied directly to Out-of-Vocabulary (OOV) keywords, since no samples of them exist in the training data. To address this limitation, we propose two techniques for OOV keywords. The first technique generates samples for an OOV keyword by concatenating samples of its constituent subwords. The second technique splits each hypothesized detection into segments, then estimates the acoustic similarity between the detection and the subword samples from the similarities between its segments and those samples. The similarity scores from these two techniques are used to rescore and re-rank the list of detections returned by the automatic speech recognition (ASR) systems. Experiments show that incorporating the proposed similarity scores yields a better separation between correct detections and false alarms than using the ASR scores alone. Furthermore, experimental results on the NIST OpenKWS15 Evaluation show that rescoring with the proposed similarity scores significantly outperforms both the raw ASR scores and other methods that do not use the similarity scores, in terms of both Maximum Term Weighted Value (MTWV) and Mean Average Precision (MAP).
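
The two techniques can be illustrated with a minimal sketch. The abstract does not specify the acoustic features, the similarity measure, the segmentation rule, or how the scores are combined, so everything concrete here is an assumption made for illustration: length-normalized dynamic time warping (DTW) as the distance, uniform splitting of a detection into one segment per subword, averaging of segment similarities, and linear interpolation with the ASR score. All function names and the `weight` parameter are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature vectors (assumed measure)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)

def similarity(a, b):
    """Map a DTW distance to a similarity in [0, 1]; 1.0 means identical sequences."""
    return 1.0 / (1.0 + dtw_distance(a, b))

def concat_sample(subword_samples):
    """Technique 1 (sketch): build a keyword sample by concatenating one sample per subword."""
    frames = []
    for sample in subword_samples:
        frames.extend(sample)
    return frames

def segment_similarity(detection, subword_samples):
    """Technique 2 (sketch): split the detection uniformly, one segment per subword,
    and average the segment-to-subword similarities (uniform split is an assumption)."""
    k, n = len(subword_samples), len(detection)
    bounds = [round(i * n / k) for i in range(k + 1)]
    sims = [
        similarity(detection[bounds[i]:bounds[i + 1]], subword_samples[i])
        for i in range(k)
    ]
    return sum(sims) / k

def rescore(asr_score, sim_score, weight=0.5):
    """Linearly interpolate the ASR score with a similarity score (assumed combination rule)."""
    return (1.0 - weight) * asr_score + weight * sim_score

# Toy 2-dimensional "features" for two subwords of a hypothetical OOV keyword.
sub_a = [[0.0, 0.0], [1.0, 0.0]]
sub_b = [[5.0, 5.0], [6.0, 5.0]]
keyword_sample = concat_sample([sub_a, sub_b])

hit = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]   # matches the keyword
false_alarm = [[9.0, 9.0]] * 4                            # acoustically unrelated

hit_score = rescore(0.5, similarity(hit, keyword_sample))
fa_score = rescore(0.5, similarity(false_alarm, keyword_sample))
```

With equal ASR scores, the matching detection ends up ranked above the unrelated one, which is the kind of separation between correct detections and false alarms that the abstract describes.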

DOI: 10.21437/Interspeech.2016-646

Cite as

Pham, V.T., Xu, H., Xiao, X., Chen, N.F., Chng, E.S., Li, H. (2016) Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword Samples. Proc. Interspeech 2016, 933-937.

author={Van Tung Pham and Haihua Xu and Xiong Xiao and Nancy F. Chen and Eng Siong Chng and Haizhou Li},
title={Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword Samples},
booktitle={Interspeech 2016},