Is it possible to use out-of-domain acoustic training data to improve a speech recognizer's performance on a specific, independent application? In our experiments, we use Wall Street Journal (WSJ) data to train a recognizer, which is then adapted to and evaluated in the Phonebook domain. Apart from their common language (US English), the two corpora differ in many important respects: microphone vs. telephone channel, continuous speech vs. isolated words, and a mismatch in speaking rate.
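The abstract does not state which normalization the paper uses to handle the microphone vs. telephone channel mismatch; a standard option is cepstral mean normalization, which cancels a stationary convolutional channel, since such a channel appears as a constant additive offset in the cepstral domain. The following is a minimal sketch under that assumption, with all names chosen for illustration.

import numpy as np

def cepstral_mean_normalize(cepstra: np.ndarray) -> np.ndarray:
    # cepstra: (num_frames, num_coefficients) array of e.g. MFCCs.
    # Subtracting the per-utterance mean removes a stationary channel,
    # which acts as a constant additive offset in the cepstral domain.
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Example: a constant offset (mimicking a channel) is removed exactly.
frames = np.random.randn(200, 13) + 5.0
normalized = cepstral_mean_normalize(frames)
assert np.allclose(normalized.mean(axis=0), 0.0)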
This paper addresses two questions. First, starting from the WSJ-trained recognizer, how much adaptation data (taken from the Phonebook training corpus) is necessary to achieve reasonable recognition performance despite the high degree of mismatch? Second, is it possible to improve the recognition performance of a Phonebook-trained baseline acoustic model by using additional out-of-domain training data? The paper describes the adaptation and normalization techniques used to bridge the mismatch between the two corpora.
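The abstract likewise leaves the specific adaptation technique open. A widely used method for adapting Gaussian means with limited in-domain data is maximum likelihood linear regression (MLLR); the sketch below estimates a single global affine mean transform under the simplifying assumption of identity covariances, given precomputed frame-to-Gaussian occupation probabilities. The function names and the identity-covariance simplification are assumptions for illustration, not necessarily the paper's actual method.

import numpy as np

def estimate_global_mllr(frames, posteriors, means):
    # frames:     (T, d) adaptation feature vectors.
    # posteriors: (T, M) occupation probabilities gamma[t, m] from alignment.
    # means:      (M, d) Gaussian means of the out-of-domain model.
    # Returns W, a (d, d+1) affine transform, assuming identity covariances.
    M = means.shape[0]
    xi = np.hstack([np.ones((M, 1)), means])   # extended means, shape (M, d+1)
    occ = posteriors.sum(axis=0)               # total occupancy per Gaussian
    G = (xi * occ[:, None]).T @ xi             # (d+1, d+1) accumulated statistics
    K = frames.T @ posteriors @ xi             # (d, d+1) accumulated statistics
    return K @ np.linalg.inv(G)                # closed-form ML solution

def adapt_means(means, W):
    # Map each out-of-domain mean mu to W @ [1; mu] for the target domain.
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T

With very little adaptation data, a single shared transform like this is typically more robust than re-estimating all Gaussian parameters directly, which is why transform-based adaptation is a natural fit for the first question above.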
Cite as: Blasig, R., Rose, G., Meyer, C. (2000) Training of isolated word recognizers with continuous speech. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 449-452
@inproceedings{blasig00_icslp,
  author={Reinhard Blasig and Georg Rose and Carsten Meyer},
  title={{Training of isolated word recognizers with continuous speech}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 449-452}
}