Interactive Voice Technology for Telecommunications Applications (IVTTA'98)

Torino, Italy
September 29-30, 1998

Connected Digit Recognition Experiments with the OGI Toolkit's Neural Network and HMM-Based Recognizers

Piero Cosi (1), John-Paul Hosom (2), Johan Shalkwyk (2), Stephen Sutton (2), Ronald A. Cole (2)

(1) Institute of Phonetics - C.N.R., Padova, Italy
(2) Center for Spoken Language Understanding (CSLU), Oregon Graduate Institute of Science and Technology (OGI), Portland, Oregon, USA

This paper describes a series of experiments comparing approaches to training a speaker-independent continuous-speech digit recognizer with the CSLU Toolkit, specifically the Hidden Markov Model (HMM) and Neural Network (NN) approaches. It also describes the CSLU Toolkit itself, a research and development software environment for creating and using spoken language systems for telephone and PC applications. In particular, the CSLU-HMM, CSLU-NN, and CSLU-FBNN development environments, in which our experiments were implemented, are described in detail and their recognition results are compared. Our speech corpus is OGI 30K-Numbers, a collection of spontaneous ordinal and cardinal numbers, continuous digit strings, and isolated digit strings. The utterances were recorded by having a large number of people recite their ZIP code, street address, or other numeric information over the telephone. This corpus represents a very noisy and difficult recognition task. Our best results (98% word recognition, 92% sentence recognition), obtained with the FBNN architecture, suggest that the CSLU Toolkit is effective for building real-life speech recognition systems.

Bibliographic reference. Cosi, Piero / Hosom, John-Paul / Shalkwyk, Johan / Sutton, Stephen / Cole, Ronald A. (1998): "Connected digit recognition experiments with the OGI toolkit's neural network and HMM-based recognizers", in IVTTA'98, 135-140.