ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

Key-word Spotting Using Phonetic Distinctive Features Extracted from Output of an LVCSR Engine

Tsuneo Nitta, Shingo Iseji, Takashi Fukuda, Hirobumi Yamada, Kouichi Katsurada

Graduate School of Eng., Toyohashi University of Technology, Japan

In this paper, we adapt a general-purpose large-vocabulary continuous speech recognition (LVCSR) engine, designed for dictation, for use as a spoken dialogue recognition system. In the proposed system, the phoneme string output by the LVCSR engine is converted into a sequence of distinctive feature (DF) vectors, and keywords assigned by a dialogue manager are then detected in this vector sequence using dynamic time warping (DTW). The proposed system exploits two potential advantages: (1) precise phoneme discrimination, achieved by relaxing the linguistic constraints in the LVCSR engine, and (2) robustness to substitution, deletion, and insertion errors, achieved by combining phoneme-to-DF-vector conversion with key-word spotting. In an experiment on a spoken dialogue corpus from a map guidance task, the proposed system shows significant improvements over the general-purpose LVCSR engine. We also present comparative studies of language models, using a sub-word model, and of acoustic scoring procedures in key-word detection, using a confusion matrix.
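The pipeline described above (phoneme string, then DF vectors, then DTW-based spotting) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the five-dimensional binary feature table `DF_TABLE`, the Hamming local distance, the symmetric DTW step pattern, and the function names `to_df` and `spot_keyword` are all hypothetical. The sketch only shows why a substitution between phonemes that share most features (here /t/ recognized as /d/) costs little when matching is done in DF space.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy table, NOT the paper's feature set.
# Dimensions: (vocalic, nasal, voiced, labial, high)
DF_TABLE: Dict[str, Tuple[int, ...]] = {
    "a": (1, 0, 1, 0, 0), "i": (1, 0, 1, 0, 1),
    "u": (1, 0, 1, 1, 1), "o": (1, 0, 1, 1, 0),
    "k": (0, 0, 0, 0, 1), "g": (0, 0, 1, 0, 1),
    "t": (0, 0, 0, 0, 0), "d": (0, 0, 1, 0, 0),
    "m": (0, 1, 1, 1, 0), "n": (0, 1, 1, 0, 0),
}


def to_df(phonemes: Sequence[str]) -> List[Tuple[int, ...]]:
    """Convert a phoneme string into a sequence of DF vectors."""
    return [DF_TABLE[p] for p in phonemes]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Local distance: number of differing features (an assumed choice)."""
    return sum(x != y for x, y in zip(a, b))


def spot_keyword(keyword: Sequence[str],
                 utterance: Sequence[str]) -> Tuple[float, int]:
    """Subsequence DTW: the keyword may start and end anywhere in the
    utterance. Returns (distance normalized by keyword length,
    index of the utterance phoneme where the best match ends)."""
    kw, utt = to_df(keyword), to_df(utterance)
    n, m = len(kw), len(utt)
    inf = float("inf")
    # Row 0 is all zeros, so a match can begin at any utterance position.
    prev = [0.0] * (m + 1)
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            d = hamming(kw[i - 1], utt[j - 1])
            # Diagonal step (match/substitution), vertical step (deletion
            # in the utterance), horizontal step (insertion).
            cur[j] = d + min(prev[j - 1], prev[j], cur[j - 1])
        prev = cur
    # Free end point: take the cheapest cell in the last keyword row.
    end = min(range(1, m + 1), key=lambda j: prev[j])
    return prev[end] / n, end - 1


if __name__ == "__main__":
    # The recognizer output contains /d/ where the keyword has /t/.
    utterance = ["a", "n", "o", "k", "i", "d", "a", "m", "o"]
    print(spot_keyword(["k", "i", "t", "a"], utterance))
```

In a system of this kind, a keyword would presumably be accepted when the normalized distance falls below a tuned threshold; the abstract does not state the decision rule, so none is shown here.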


Bibliographic reference.  Nitta, Tsuneo / Iseji, Shingo / Fukuda, Takashi / Yamada, Hirobumi / Katsurada, Kouichi (2003): "Key-word spotting using phonetic distinctive features extracted from output of an LVCSR engine", in SSPR-2003, paper MAP16.