This paper investigates the application of semi-supervised manifold learning techniques to the task of verifying hypothesized occurrences of spoken terms. The techniques are applied in a two-stage spoken term detection framework: ASR lattices are first generated using a large vocabulary ASR system, and hypothesized occurrences of spoken query terms in the lattices are then verified in a second stage. Verification is performed using a fixed-dimensional feature representation derived from each hypothesized term occurrence. Two semi-supervised approaches, namely manifold regularized least squares (RLS) classification and spectral clustering, are investigated for distinguishing correct hypotheses from false alarms. It is shown that exploiting unlabeled data in addition to labeled data through these semi-supervised approaches significantly improves verification performance compared to the case where only the labeled data is used. This improvement grows as the ratio of unlabeled to labeled data increases. It is also shown that, when training data is very limited, comparable verification performance can be obtained by exploiting only the acoustic similarity between the test samples using the spectral clustering approach.
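As an illustration of the manifold regularized RLS idea named above, the following is a minimal sketch of a commonly used closed-form Laplacian RLS classifier. It is not the authors' implementation: the RBF kernel, the k-nearest-neighbour graph, the hyperparameter values, and the function names (`rbf_kernel`, `knn_laplacian`, `lap_rls_fit`, `lap_rls_predict`) are all assumptions made for this example, since the abstract does not specify them. Labeled samples are assumed to occupy the first rows of `X`, with labels +1 (correct hypothesis) and -1 (false alarm).

```python
import numpy as np

def sq_dists(A, B):
    # Pairwise squared Euclidean distances, clamped at zero for numerical safety.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)

def rbf_kernel(A, B, sigma=1.0):
    # Assumed kernel choice; the abstract does not name one.
    return np.exp(-sq_dists(A, B) / (2.0 * sigma**2))

def knn_laplacian(X, k=5):
    # Unnormalized graph Laplacian L = D - W of a symmetrized k-NN graph
    # built over ALL samples (labeled and unlabeled).
    n = X.shape[0]
    d2 = sq_dists(X, X)
    np.fill_diagonal(d2, np.inf)            # exclude self-neighbours
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                  # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def lap_rls_fit(X, y_labeled, gamma_a=1e-2, gamma_i=1e-1, sigma=1.0, k=5):
    # X: (n, d) features, labeled rows first; y_labeled: (l,) labels in {-1, +1}.
    n, l = X.shape[0], len(y_labeled)
    K = rbf_kernel(X, X, sigma)
    L = knn_laplacian(X, k)
    J = np.zeros((n, n))
    J[np.arange(l), np.arange(l)] = 1.0     # selects the labeled samples
    Y = np.zeros(n)
    Y[:l] = y_labeled                       # unlabeled targets stay at zero
    # Solve (J K + gamma_a * l * I + gamma_i * l / n^2 * L K) alpha = Y
    A = J @ K + gamma_a * l * np.eye(n) + (gamma_i * l / n**2) * (L @ K)
    return np.linalg.solve(A, Y)

def lap_rls_predict(alpha, X_train, X_new, sigma=1.0):
    # Decision score f(x) = sum_i alpha_i k(x_i, x); the sign gives the class.
    return rbf_kernel(X_new, X_train, sigma) @ alpha
```

In this sketch the unlabeled samples enter only through the graph Laplacian term, which penalizes decision functions that vary sharply between neighbouring points. Setting `gamma_i=0` reduces it to ordinary kernel RLS trained on the labeled samples alone, which is the kind of labeled-only baseline the abstract compares against.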
Bibliographic reference. Norouzian, Atta / Rose, Richard C. / Jansen, Aren (2013): "Semi-supervised manifold learning approaches for spoken term verification", In INTERSPEECH-2013, 2594-2598.