We propose an unsupervised technique for modeling spoken queries with hidden Markov models (HMMs) for spoken term detection without speech recognition. Through unsupervised segmentation, clustering, and training, a set of HMMs, referred to as acoustic segment HMMs (ASHMMs), is generated from the spoken archive to model signal variations and frame trajectories. An unsupervised technique is also designed for ASHMM parameter training. A model-based approach to spoken term detection is then developed by constructing a query HMM from the ASHMMs and scoring the spoken documents with it. Experiments show that this model-based approach complements the feature-based dynamic time warping (DTW) approach, and that integrating the two methods significantly improves detection performance.
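The abstract does not say how the two methods are integrated. As a rough illustration only, the sketch below assumes a weighted sum of z-normalized scores, with the query HMM producing a log-likelihood per document (higher is better) and DTW producing a distance per document (lower is better). The function names, the `weight` parameter, and the normalization are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so no document is preferred.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(hmm_loglik, dtw_distance, weight=0.5):
    """Combine model-based and feature-based scores per document.

    Assumption: a simple linear interpolation; the paper's actual
    integration scheme may differ.
    """
    hmm_z = z_normalize(hmm_loglik)
    # Negate DTW distances so that a higher value means a better match.
    dtw_z = z_normalize([-d for d in dtw_distance])
    return [weight * h + (1.0 - weight) * d for h, d in zip(hmm_z, dtw_z)]


def rank_documents(fused):
    """Return document indices ordered from best to worst fused score."""
    return sorted(range(len(fused)), key=lambda i: -fused[i])


if __name__ == "__main__":
    # Made-up scores for three spoken documents, for illustration only.
    hmm_loglik = [-10.0, -50.0, -30.0]
    dtw_distance = [0.2, 0.9, 0.5]
    fused = fuse_scores(hmm_loglik, dtw_distance)
    print(rank_documents(fused))
```

Normalizing before combining keeps either score type from dominating simply because of its numeric range; in practice the weight would be tuned on held-out queries.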
Bibliographic reference. Chan, Chun-an / Lee, Lin-shan (2011): "Unsupervised hidden Markov modeling of spoken queries for spoken term detection without speech recognition", In INTERSPEECH-2011, 2141-2144.