EUROSPEECH 2003 - INTERSPEECH 2003
This paper describes recent advances we have made toward the goal of empowering end users to automatically expand the knowledge base of a dialogue system through spoken interaction, in order to personalize it to their individual needs. We describe techniques for incrementally reconfiguring a preloaded trained natural language grammar, as well as the lexicon and language models of the speech recognition system. We also report advances in the technology for integrating a spoken pronunciation with a spoken spelling to improve spelling accuracy. While the original algorithm was designed for a "speak and spell" input mode, we show here that the same methods can be applied to separately uttered spoken and spelled forms of a word: by concatenating the two waveforms, we can exploit the mutual constraints realized in an integrated composite FST. Using an OGI corpus of separately spoken and spelled names, we have demonstrated letter error rates of under 6% for in-vocabulary words and under 11% for words not contained in the training lexicon, a 44% reduction in error rate over that achieved without use of the spoken form. We anticipate applying this technique to unknown words embedded in a larger context, followed by solicited spellings.
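The letter error rates quoted above are conventionally computed as the Levenshtein edit distance (substitutions, insertions, and deletions of letters) between the hypothesized and reference spellings, normalized by the reference length. The sketch below is illustrative only and is not the authors' evaluation code; the function name and normalization convention are assumptions.

```python
def letter_error_rate(ref: str, hyp: str) -> float:
    """Letter error rate: Levenshtein distance between the reference
    and hypothesized spellings, divided by the reference length."""
    m, n = len(ref), len(hyp)
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion of a reference letter
                         cur[j - 1] + 1,      # insertion of a spurious letter
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[n] / max(m, 1)
```

For example, hypothesizing "senef" for the reference "seneff" involves one deleted letter out of six, a letter error rate of about 16.7%; the corpus-level figures in the abstract aggregate such distances over all test words.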
Bibliographic reference. Seneff, Stephanie / Chung, Grace / Wang, Chao (2003): "Empowering end users to personalize dialogue systems through spoken interaction", in EUROSPEECH-2003, 749-752.