Automatic speech recognition systems can be assessed in adverse (noisy) conditions using various databases, ranging from representative conditions that are hard to control to artificial conditions that are carefully controlled. In carrying out such assessments, some of the most important "parameters" to be considered are:
(1) Application oriented vocabulary versus more diagnostic vocabulary.
(2) Spontaneous speech versus read speech.
(3) Recording under representative noise conditions (e.g. stimulating the Lombard effect, but with a limited number of controlled signal-to-noise ratios) versus mixing in additive noise (at well-defined levels).
It is of interest to consider how different recognizers perform under these various conditions, and to what extent performance in one domain (for example, mixed additive noise) can be used to predict performance in representative noise conditions.
The latter question in particular is being addressed by several groups (NATO RSG.10, ESPRIT-SAM), which have designed and scheduled experiments in which some of these "parameters" will be compared.
This paper describes experiments focused on vocabulary comparison and noise addition. The vocabularies include digits, cockpit-control words and CVC (consonant-vowel-consonant) words. The noise conditions were obtained either by recording in a laboratory high-noise room or by adding noise artificially. The effect of spontaneous speech on recognition performance is not included in this study.
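As an illustration of what adding noise artificially at a well-defined level can mean in practice, the following is a minimal sketch in Python. It is not the procedure used in the paper: the function names (`mean_power`, `mix_at_snr`, `measured_snr_db`) are hypothetical, and the signal-to-noise ratio is assumed here to be the ratio of long-term mean powers over the whole utterance, whereas an actual assessment may rely on a different speech-level measure.

```python
import math
import random


def mean_power(samples):
    """Long-term mean power (mean of squared samples)."""
    return sum(s * s for s in samples) / len(samples)


def measured_snr_db(speech, noise):
    """SNR in dB, assumed here to be the ratio of long-term mean powers."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals
    `snr_db`, then add it to `speech`.

    Returns (mixed, scaled_noise). Hypothetical helper for illustration.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must have non-zero power")
    # Noise power required for the requested SNR.
    target_p_noise = p_speech / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_p_noise / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-ins: a 200 Hz tone for "speech", Gaussian noise.
    rate = 8000
    speech = [math.sin(2 * math.pi * 200 * t / rate) for t in range(rate)]
    noise = [random.gauss(0.0, 1.0) for _ in range(rate)]
    mixed, scaled = mix_at_snr(speech, noise, snr_db=6.0)
    print(round(measured_snr_db(speech, scaled), 6))
```

Because the noise is scaled rather than the speech, the speech signal stays identical across SNR conditions. This is the controlled aspect of artificial mixing, and it is also the reason such a mixture does not capture changes in speech production such as the Lombard effect.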
Cite as: Steeneken, H.J.M., Varga, A. (1992) Comparison of assessment methods for automatic speech recognition in noise conditions. Proc. ETRW on Speech Processing in Adverse Conditions, 73-76
@inproceedings{steeneken92b_spac, author={Herman J. M. Steeneken and Andrew Varga}, title={{Comparison of assessment methods for automatic speech recognition in noise conditions}}, year=1992, booktitle={Proc. ETRW on Speech Processing in Adverse Conditions}, pages={73--76} }