15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Utilizing State-Level Distance Vector Representation for Improved Spoken Term Detection by Text and Spoken Queries

Mitsuaki Makino, Naoki Yamamoto, Atsuhiko Kai

Shizuoka University, Japan

In spoken term detection (STD) systems, approximate subword-level matching between a query term and automatically transcribed spoken documents is often employed for its reasonable accuracy and efficiency. However, a high out-of-vocabulary (OOV) rate often degrades subword-level recognition accuracy and, in turn, STD performance. This paper describes the use of new expanded acoustic representations of subword sequences for improved scoring between an OOV query term and a subword-unit transcription. Each subword is expanded into the states of its corresponding HMM, and each state is represented by a new acoustic structural feature, a distribution-distance vector (DDV). The proposed DDV representation and scoring are easily combined with two typical baseline STD approaches: DTW-based approximate matching with a subword-level acoustic dissimilarity measure, and lattice-based confidence scoring of subword n-grams. Experimental results showed that the proposed DDV-based scoring method significantly outperforms the simple DTW-scoring baseline with very little increase in the required search time. Combining DDV-based scoring with confidence-based scoring showed a complementary effect and attained the best STD performance among the submitted NTCIR-10 SpokenDoc2 (SDPWS) results when only the NTCIR reference automatic transcript is used. A preliminary experiment with spoken query terms also showed a significant improvement for OOV queries.
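The abstract does not give the definition of the DDV or the exact matching procedure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each HMM state is a single 1-D Gaussian, that a state's DDV is the vector of Bhattacharyya distances from that state to every state in the inventory, that the local cost between two states is the Euclidean distance between their DDVs, and that matching is a subsequence DTW with a free start and end in the document. All function names and the toy state inventory are hypothetical and are not taken from the paper.

```python
import math

def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two 1-D Gaussians given as (mean, variance)."""
    (m1, v1), (m2, v2) = g1, g2
    v = (v1 + v2) / 2.0
    return (m1 - m2) ** 2 / (8.0 * v) + 0.5 * math.log(v / math.sqrt(v1 * v2))

def build_ddvs(states):
    """Map each state name to its vector of distances to every state (assumed DDV form)."""
    names = sorted(states)  # fixed ordering so all vectors share the same dimensions
    return {s: [bhattacharyya(states[s], states[t]) for t in names] for s in names}

def local_cost(u, v):
    """Euclidean distance between two DDVs (an assumed dissimilarity measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def subsequence_dtw(query, doc, ddv):
    """Minimum accumulated cost of aligning the whole query to any span of doc.

    query and doc are non-empty lists of state names; lower cost means a better match.
    """
    inf = float("inf")
    prev = [0.0] * (len(doc) + 1)  # zero-cost row: the match may start anywhere in doc
    for q in query:
        cur = [inf] * (len(doc) + 1)
        for j, d in enumerate(doc, start=1):
            cost = local_cost(ddv[q], ddv[d])
            # Standard DTW steps: diagonal, vertical, horizontal.
            cur[j] = cost + min(prev[j - 1], prev[j], cur[j - 1])
        prev = cur
    return min(prev[1:])  # the match may end anywhere in doc

# Toy state inventory (hypothetical values): name -> (mean, variance)
states = {"a1": (0.0, 1.0), "a2": (0.5, 1.0), "k1": (5.0, 1.0), "o1": (-4.0, 2.0)}
ddv = build_ddvs(states)
doc = ["k1", "a1", "a2", "o1"]
exact = subsequence_dtw(["a1", "a2"], doc, ddv)
mismatch = subsequence_dtw(["a1", "k1"], doc, ddv)
```

In this toy setup, a query whose state sequence occurs verbatim in the document aligns at zero cost, while a query with no exact occurrence receives a positive cost. A detection decision would then compare the cost, typically normalized by query length, against a threshold.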

Bibliographic reference. Makino, Mitsuaki / Yamamoto, Naoki / Kai, Atsuhiko (2014): "Utilizing state-level distance vector representation for improved spoken term detection by text and spoken queries", In INTERSPEECH-2014, 1732-1736.