5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Speech Recognition Performance on a new Voicemail Transcription Task

Mukund Padmanabhan, Bhuvana Ramabhadran, Sankar Basu

IBM T. J. Watson Research Center, USA

In this paper we describe a new testbed for developing speech recognition algorithms: a VoiceMail transcription task, analogous to tasks such as Switchboard, CallHome, and Hub 4 that are currently used by speech recognition researchers. We describe (i) the use of compound words to model co-articulation effects in commonly occurring words; (ii) the use of linguistically derived phonological rules (which model phenomena such as degemination, palatalization, etc.) for other words; (iii) a new model-complexity adaptation technique that uses a discriminant measure to allocate Gaussians to the mixtures modelling the acoustic units (allophones); (iv) experiments with different feature extraction methods; and (v) an investigation of the efficacy of some well-known acoustic adaptation techniques on this task. We then report experimental results showing that most of the modelling techniques we investigated were useful in reducing the word error rate, from 87% (when decoded with Switchboard acoustic and language models) to 38%.
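
The abstract reports its results as word error rate without restating the metric. As a reading aid, the following is a minimal sketch of the commonly used definition (word-level edit distance divided by the number of reference words); it is not taken from the paper, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + deletions + insertions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word out of four reference words.
wer = word_error_rate("please call me back", "please call back")
```

Under this definition, the reported drop from 87% to 38% corresponds to a relative reduction of (87 - 38) / 87, roughly 56%.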


Bibliographic reference. Padmanabhan, Mukund / Ramabhadran, Bhuvana / Basu, Sankar (1998): "Speech recognition performance on a new voicemail transcription task", In ICSLP-1998, paper 0210.