ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium

September 18-20, 2000
Paris, France

Towards Super-Human Speech Recognition

Mukund Padmanabhan and Michael Picheny

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

Research in speech recognition has been underway for decades, and a great deal of progress has been made in reducing the word error rate. However, recent studies demonstrate that machine performance is still quite far from human performance across a wide variety of tasks, ranging from high-bandwidth digit recognition to large vocabulary telephony speech. In addition, for most speech recognition tasks, obtaining good performance relies on tuning to a particular domain or environment. For instance, a system trained on the Switchboard corpus is unlikely to provide close to optimal performance on a small vocabulary task such as telephone digits. As we begin to strive towards developing recognition systems that equal, or even surpass, human performance, it does not make sense to construct a system for each specific domain and environment. Consequently, our initial goal is to develop a generic speech recognition system that can deal with linguistically different, as well as acoustically different, domains. To achieve this goal, we must combine advances in signal processing, language modeling, and acoustic modeling with substantially enhanced training and testing data. In this paper, we outline new techniques to develop a generic system that can work on a multitude of domains and environments. We propose to train and benchmark this system using speech data from a variety of sources, representing a variety of linguistic domains, channels, and environments.


Bibliographic reference.  Padmanabhan, Mukund / Picheny, Michael (2000): "Towards super-human speech recognition", In ASR-2000, 189-194.