EUROSPEECH 2001 Scandinavia
In this paper we evaluate the performance of the ISL's German Verbmobil spontaneous speech recognizer on the Nespole! database. In this task, people talk to an agent in a tourist office over a NetMeeting connection to plan their holidays, while also sharing screen contents (web pages). Stereo recordings were made both before and after speech transmission over an IP connection using the G.711 codec, so we can directly measure the loss in LVCSR performance caused by NetMeeting's segmentation and compression. The aim of this work is to quantify this loss, which results from using protocols that were not designed for speech recognition. We report on the techniques used to port our existing clean-speech recognizer to this new data quality, using about 1.5 h of labeled adaptation data and avoiding a complete retraining of the system.
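The paired before/after recording setup lends itself to a simple comparison, sketched below. The sketch assumes that LVCSR performance is scored as word error rate (WER), which the abstract does not state explicitly; the function names `word_errors`, `wer`, and `transmission_loss` are hypothetical and are not taken from the paper.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def transmission_loss(references, hyps_before, hyps_after):
    """Return (WER before, WER after, absolute WER increase).

    Assumption: the same utterances are decoded twice, once from the
    channel recorded before transmission and once from the channel
    recorded after it, and both are scored against the same transcripts.
    """
    before = wer(references, hyps_before)
    after = wer(references, hyps_after)
    return before, after, after - before
```

Because both channels share the same reference transcripts, the difference between the two error rates can be attributed to the transmission path rather than to differences in speakers or content.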
Bibliographic reference. Metze, Florian / McDonough, John / Soltau, Hagen (2001): "Speech recognition over netmeeting connections", In EUROSPEECH-2001, 2389-2392.