Sixth International Conference on Spoken Language Processing
Automatic recognition of continuously spoken digits (e.g., telephone numbers or credit card numbers) is feasible with excellent accuracy, even for speaker-independent applications over telephone lines. However, even such relatively simple recognition tasks suffer decreased performance in adverse conditions, such as significant background noise or fading on portable telephone channels. If an application also imposes significant limits on the computing resources available for recognition, then robust limited-resource speech recognition remains a worthwhile challenge, even for a vocabulary as simple as the digits. Because connected-digit recognition over telephone lines is a very practical application, the computing resources needed for a given level of recognition accuracy were investigated under different acoustic noise conditions. A traditional hidden Markov model approach with cepstral analysis is computationally intensive and does not always work well under adverse acoustic conditions, so simpler spectral analysis was used instead, combined with a segmental approach. The limited vocabulary (i.e., 10 digits) makes this simpler approach possible. High recognition accuracy can be maintained despite a large decrease, relative to traditional methods, in both memory and computation.
Bibliographic reference. O'Shaughnessy, Douglas / Gabrea, Marcel (2000): "Recognition of digit strings in noisy speech with limited resources", In ICSLP-2000, vol.3, 554-557.