Detection of speech attributes, phones, and words is a key component of the detection-based automatic speech recognition framework in the automatic speech attribute transcription project. This paper presents a two-stage approach for detecting any given word in continuous speech: a keyword-filler network method followed by knowledge-based pruning and rescoring. Unlike conventional keyword spotting systems, this study considers both content words and function words. To reduce the high miss rate, a modified grammar network for word detection is proposed. Knowledge sources from landmark detection, attribute detection, and other spectral cues are then combined to remove unlikely putative segments from the hypothesized word candidates. The approach was evaluated on the WSJ0 corpus under matched and mismatched acoustic conditions. Compared with a conventional keyword spotting system, the proposed word detector greatly improves detection performance: the figures of merit for content and function words improved from 48.8% to 61.5% and from 22.3% to 33.1%, respectively.
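The figure of merit reported above is a standard keyword spotting metric. The abstract does not say which variant the authors computed, so the following is only a minimal sketch of one commonly used definition (the one documented for HTK's HResults tool): the detection rate averaged over operating points from 1 to 10 false alarms per keyword per hour. The function name, the argument layout, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math

def figure_of_merit(hits, n_true, hours):
    """Hypothetical helper: FOM for one keyword, as a fraction in [0, 1].

    hits   -- list of (score, is_correct) putative detections
    n_true -- number of actual occurrences of the keyword
    hours  -- duration T of the test speech in hours
    """
    if n_true <= 0 or hours <= 0:
        raise ValueError("n_true and hours must be positive")
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    ten_t = 10.0 * hours
    n = max(0, math.ceil(ten_t - 0.5))  # first integer >= 10T - 0.5
    a = ten_t - n                       # interpolation weight for p[n]
    p = []      # p[i]: fraction of true occurrences found before false alarm i+1
    found = 0
    for _score, is_correct in ranked:
        if is_correct:
            found += 1
        else:
            p.append(found / n_true)
            if len(p) == n + 1:
                break
    # With fewer than n+1 false alarms, the remaining operating points
    # all see the final detection rate.
    while len(p) < n + 1:
        p.append(found / n_true)
    return (sum(p[:n]) + a * p[n]) / ten_t

# Toy data (invented): two true occurrences, one false alarm, one hour of speech.
toy_hits = [(0.9, True), (0.8, False), (0.7, True)]
fom = figure_of_merit(toy_hits, n_true=2, hours=1.0)
print(f"FOM = {100 * fom:.1f}%")
```

Under this definition, a detector that ranks true occurrences above false alarms scores higher, which is why a pruning and rescoring stage that removes or demotes unlikely putative segments can raise the figure of merit.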
Bibliographic reference. Ma, Chengyuan / Lee, Chin-Hui (2007): "A study on word detector design and knowledge-based pruning and rescoring", In INTERSPEECH-2007, 1473-1476.