ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
In this paper, we attempt to adopt a general-purpose LVCSR engine designed for dictation as a spoken dialogue recognition system. In the proposed system, the phoneme string output by the LVCSR engine is converted into a sequence of distinctive feature (DF) vectors, and keywords assigned by a dialogue manager are then detected in this vector sequence using dynamic time warping (DTW). The proposed system exploits two potential advantages: (1) precise phoneme discrimination, achieved by relaxing the linguistic constraints in the LVCSR engine, and (2) robustness to substitution, deletion and insertion errors, achieved by combining phoneme-to-DF-vector conversion with key-word spotting. In an experiment on a spoken dialogue corpus from a map guidance task, the proposed system shows significant improvements over the general-purpose LVCSR engine. We also discuss comparative studies of language models, using a sub-word model, and of acoustic scoring procedures in key-word detection, using a confusion matrix.
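The pipeline described above (phoneme string, then DF vectors, then DTW-based key-word detection) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the eight binary features and the table `DF_TABLE`, the Hamming local distance, the open-begin/open-end (subsequence) form of DTW, the length normalization, and the example phoneme strings. The function names `to_df` and `spot_keyword` are likewise hypothetical.

```python
import numpy as np

# Hypothetical binary distinctive-feature table (NOT the paper's DF set).
# Columns: vocalic, nasal, voiced, high, low, back, labial, continuant
DF_TABLE = {
    "a": (1, 0, 1, 0, 1, 1, 0, 1),
    "i": (1, 0, 1, 1, 0, 0, 0, 1),
    "u": (1, 0, 1, 1, 0, 1, 1, 1),
    "e": (1, 0, 1, 0, 0, 0, 0, 1),
    "o": (1, 0, 1, 0, 0, 1, 1, 1),
    "k": (0, 0, 0, 1, 0, 1, 0, 0),
    "g": (0, 0, 1, 1, 0, 1, 0, 0),
    "t": (0, 0, 0, 0, 0, 0, 0, 0),
    "d": (0, 0, 1, 0, 0, 0, 0, 0),
    "n": (0, 1, 1, 0, 0, 0, 0, 0),
    "m": (0, 1, 1, 0, 0, 0, 1, 0),
    "s": (0, 0, 0, 0, 0, 0, 0, 1),
}


def to_df(phonemes):
    """Convert a list of phoneme symbols into a (length, n_features) array."""
    return np.array([DF_TABLE[p] for p in phonemes], dtype=float)


def spot_keyword(keyword, utterance):
    """Subsequence DTW of a keyword against an utterance in DF space.

    Returns (score, end), where score is the accumulated Hamming distance
    of the best alignment divided by the keyword length, and end is the
    zero-based utterance index at which that alignment ends.
    """
    k, x = to_df(keyword), to_df(utterance)
    K, N = len(k), len(x)
    # Local distance: number of differing features for each phoneme pair.
    local = np.abs(k[:, None, :] - x[None, :, :]).sum(axis=2)
    D = np.full((K + 1, N + 1), np.inf)
    D[0, :] = 0.0  # the keyword may start anywhere in the utterance
    for i in range(1, K + 1):
        for j in range(1, N + 1):
            D[i, j] = local[i - 1, j - 1] + min(
                D[i - 1, j - 1],  # match or substitution
                D[i - 1, j],      # keyword phoneme missing from the input
                D[i, j - 1],      # extra phoneme in the input
            )
    end = int(np.argmin(D[K, 1:]))  # the keyword may end anywhere
    return D[K, end + 1] / K, end


# Made-up recognizer output in which /k/ was misrecognized as /g/.
utterance = ["t", "o", "k", "i", "o", "e", "g", "i", "m", "a", "d", "e"]
score, end = spot_keyword(["e", "k", "i"], utterance)
exact_score, _ = spot_keyword(["m", "a", "d", "e"], utterance)
```

In this toy setup a /k/ to /g/ substitution costs only one feature (voicing), so the keyword still scores close to an exact match, whereas a symbol-level comparison would treat the two phonemes as simply different. A detector built this way would accept a keyword when its score falls below a tuned threshold.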
Bibliographic reference. Nitta, Tsuneo / Iseji, Shingo / Fukuda, Takashi / Yamada, Hirobumi / Katsurada, Kouichi (2003): "Key-word spotting using phonetic distinctive features extracted from output of an LVCSR engine", in SSPR-2003, paper MAP16.