Although this workshop is primarily aimed at examining changes in human speech perception, the authors submit that it may be illuminating to consider in parallel the degree to which current state-of-the-art automatic speech recognition (ASR) systems also change their behaviour over time. Therefore, for the benefit of those who are not familiar with ASR, this paper will provide a review of the computational mechanisms underlying contemporary ASR systems, with a particular focus on their adaptive and learning behaviour. The paper will describe how ASR systems change dynamically in order (i) to accommodate new speakers, (ii) to handle unexpected user behaviour, and (iii) to track and compensate for a constantly varying acoustic environment. A distinction will be made between 'supervised' and 'unsupervised' learning, and attention will be paid to changes that occur in the acoustic model and the language model components of a typical ASR system. While many of these techniques are highly mathematical in nature, the paper will attempt to describe the underlying principles in behavioural terms in order to maximise the opportunity for interdisciplinary exchange. Finally, it will be shown how such plastic behaviour, although valuable, is somewhat limited in its ability to allow an ASR system to operate robustly in a wide range of real-world situations. The paper will conclude by identifying potential areas where knowledge about plasticity in human speech perception might be important in the design of next-generation ASR systems and applications.
Cite as: Moore, R., Cunningham, S. (2005) Plasticity in systems for automatic speech recognition: a review. Proc. ISCA Workshop on Plasticity in Speech Perception (PSP 2005), 109–112.
@inproceedings{moore05_psp,
  author={Roger Moore and Stuart Cunningham},
  title={{Plasticity in systems for automatic speech recognition: a review}},
  year=2005,
  booktitle={Proc. ISCA Workshop on Plasticity in Speech Perception (PSP 2005)},
  pages={109--112}
}