EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

A Comparison of the Data Requirements of Automatic Speech Recognition Systems and Human Listeners

Roger K. Moore

20/20 Speech Ltd., U.K.

Since the introduction of hidden Markov modelling there has been an increasing emphasis on data-driven approaches to automatic speech recognition. This derives from the fact that systems trained on substantial corpora readily outperform those that rely more on phonetic or linguistic priors. Similarly, extra training data almost always results in a reduction in word error rate - "there's no data like more data". However, despite this progress, contemporary systems are not able to fulfill the requirements demanded by many potential applications, and performance is still significantly short of the capabilities exhibited by human listeners. For these reasons, the R&D community continues to call for even greater quantities of data in order to train their systems. This paper addresses the issue of just how much data might be required in order to bring the performance of an automatic speech recognition system up to that of a human listener.

Bibliographic reference.  Moore, Roger K. (2003): "A comparison of the data requirements of automatic speech recognition systems and human listeners", In EUROSPEECH-2003, 2581-2584.