ISCA Archive ICSLP 2000

Audio stream phrase recognition for a national gallery of the spoken word: "one small step"

John H. L. Hansen, Bowen Zhou, Murat Akbacak, Ruhi Sarikaya, Bryan Pellom

In this paper, we introduce the problem of audio stream phrase recognition for information retrieval for a new National Gallery of the Spoken Word (NGSW). This will be the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings of historical content from the 20th century. We propose a system diagram and discuss critical processing tasks: an environment classifier; recognizer model adaptation for acoustic background noise, restricted channels, and speaker variability; a natural language processor; and speech enhancement/feature processing. A probe NGSW data set is used to perform experiments using the SPHINX-III LVCSR system and a previously formulated RSPL keyword-spotting system. Results are reported for the WSJ, BN, and NGSW corpora. Results from sub-system evaluations are reported for (i) model adaptation based on mixture weight adjustment with MLLR (which reduces WER by 2.6% over a baseline BN-trained model), (ii) speaker and environmental turn taking using a Bayesian Information Criterion (BIC), and (iii) statistical analysis of phrase recognition performance for confidence measure scoring. Finally, we discuss a number of research challenges that must be met to address the overall task of robust phrase searching in unrestricted corpora.


Cite as: Hansen, J.H.L., Zhou, B., Akbacak, M., Sarikaya, R., Pellom, B. (2000) Audio stream phrase recognition for a national gallery of the spoken word: "one small step". Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 1089-1092

@inproceedings{hansen00b_icslp,
  author={John H. L. Hansen and Bowen Zhou and Murat Akbacak and Ruhi Sarikaya and Bryan Pellom},
  title={{Audio stream phrase recognition for a national gallery of the spoken word: "one small step"}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={3},
  pages={1089--1092}
}