Segmented Dynamic Time Warping for Spoken Query-by-Example Search

Jorge Proença, Fernando Perdigão


This paper describes a low-resource approach to a Query-by-Example task, in which spoken queries must be matched against a large dataset of spoken documents, sometimes in complex or non-exact ways. Our approach tackles these complex match cases by using Dynamic Time Warping to obtain alternative paths that account for reordering of words, small amounts of extra content and small lexical variations. We also report advances in the calibration and fusion of sub-systems that improve overall results, such as manipulating the score distribution per query and using an average posteriorgram distance matrix as an extra sub-system. Results are evaluated on the MediaEval task of Query-by-Example Search on Speech (QUESST). For this task, the language of the audio being searched is almost irrelevant, which brings the use case close to that of a language with very low resources. Accordingly, we use as features the posterior probabilities obtained from five phonetic recognizers trained on five different languages.
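
As a rough illustration of the ideas named in the abstract, the following Python sketch averages per-recognizer posteriorgram distance matrices and runs a basic subsequence Dynamic Time Warping search over the result. It is not the authors' implementation: the negative-log inner-product distance, the three-step local path constraint, the path-length normalization and all function names are assumptions chosen for illustration, and the segmented path variants that handle reordering, extra content and lexical variation are not reproduced here.

```python
import numpy as np


def log_dot_distance(query, doc, eps=1e-10):
    """Frame-wise distance between two posteriorgrams (assumed metric).

    query has shape (m, d) and doc has shape (n, d); each row is a
    posterior probability vector. Returns an (m, n) matrix of
    -log(inner product) values.
    """
    return -np.log(np.clip(query @ doc.T, eps, None))


def average_distance_matrix(queries, docs):
    """Mean of the distance matrices from several recognizers.

    queries[k] and docs[k] are the posteriorgrams of the same query and
    document as produced by recognizer k.
    """
    mats = [log_dot_distance(q, d) for q, d in zip(queries, docs)]
    return np.mean(mats, axis=0)


def subsequence_dtw(dist):
    """Match the whole query against any stretch of the document.

    Returns (normalized_cost, start_frame, end_frame), where the cost is
    the accumulated distance divided by the path length.
    """
    m, n = dist.shape
    acc = np.full((m, n), np.inf)        # accumulated cost
    length = np.ones((m, n), dtype=int)  # number of cells on the path
    start = np.zeros((m, n), dtype=int)  # document frame where the path began

    # The path may start at any document frame at no extra cost.
    acc[0, :] = dist[0, :]
    start[0, :] = np.arange(n)

    for i in range(1, m):
        for j in range(n):
            candidates = [(i - 1, j)]
            if j > 0:
                candidates += [(i, j - 1), (i - 1, j - 1)]
            # Pick the predecessor with the lowest accumulated cost.
            pi, pj = min(candidates, key=lambda c: acc[c])
            acc[i, j] = acc[pi, pj] + dist[i, j]
            length[i, j] = length[pi, pj] + 1
            start[i, j] = start[pi, pj]

    normalized = acc[-1] / length[-1]
    end = int(np.argmin(normalized))
    return float(normalized[end]), int(start[-1, end]), end
```

In this sketch the averaged matrix is simply passed to the same search as any single-recognizer matrix, for example `subsequence_dtw(average_distance_matrix(queries, docs))`, so it can be treated as one more sub-system whose scores are later calibrated and fused with the others.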


DOI: 10.21437/Interspeech.2016-1276

Cite as

Proença, J., Perdigão, F. (2016) Segmented Dynamic Time Warping for Spoken Query-by-Example Search. Proc. Interspeech 2016, 750-754.

BibTeX
@inproceedings{Proença+2016,
author={Jorge Proença and Fernando Perdigão},
title={Segmented Dynamic Time Warping for Spoken Query-by-Example Search},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1276},
url={http://dx.doi.org/10.21437/Interspeech.2016-1276},
pages={750--754}
}