Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Audio Stream Phrase Recognition for a National Gallery of the Spoken Word: "One Small Step"

John H. L. Hansen, Bowen Zhou, Murat Akbacak, Ruhi Sarikaya, Bryan Pellom

The Center for Spoken Language Research; Robust Speech Processing Laboratory, University of Colorado at Boulder, Boulder, CO, USA

In this paper, we introduce the problem of audio stream phrase recognition for information retrieval in a new National Gallery of the Spoken Word (NGSW). This will be the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings of historical content from the 20th century. We propose a system diagram and discuss critical processing tasks: an environment classifier; recognizer model adaptation for acoustic background noise, restricted channels, and speaker variability; a natural language processor; and speech enhancement/feature processing. A probe NGSW data set is used to perform experiments with the SPHINX-III LVCSR system and a previously formulated RSPL keyword-spotting system. Results are reported for the WSJ, BN, and NGSW corpora. Results from sub-system evaluations are reported for (i) model adaptation based on mixture weight adjustment with MLLR (which reduces WER by 2.6% over a baseline BN-trained model), (ii) speaker and environmental turn taking using the Bayesian Information Criterion (BIC), and (iii) statistical analysis of phrase recognition performance for confidence measure scoring. Finally, we discuss a number of research challenges that must be addressed for the overall task of robust phrase searching in unrestricted corpora.

