7th International Conference on Spoken Language Processing
September 16-20, 2002
It has repeatedly been shown, mostly in terms of word error rate (WER), that speaking rate significantly affects speech recognition accuracy. However, the question of how it does so has not yet been answered satisfactorily. In this paper we examine how speaking rate influences modeling accuracy itself. We observed a rather direct (negative) correlation between the local speech rate (LSR) and the local average HMM score (LAS). This correlation is already present for utterances in the training database, i.e. utterances that were actually used to estimate the parameters of the acoustic-phonetic models. By introducing confidence measures based on likelihood distance, we verified that statistical modeling with respect to speech rate seems most accurate in slow speech segments and already deteriorates at average speaking rates. We further found that the correlation is small, yet observable, for the static features and increases with the frame range of the delta and delta-delta features, reaching up to 0.65. The correlation persists regardless of whether simple monophone models or context-dependent triphones are used. The LSR-LAS dependency can be used to predict LSR on independent test data directly from the acoustic HMM scores. In addition, LAS can be used as an indicator to assess the performance gain of rate-dependent HMM models, which seems small (for fast speech) in comparison to the overall score degradation.
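As a rough illustration of the LSR-LAS relationship described in the abstract, the following sketch computes a Pearson correlation between the two quantities and fits a least-squares line that predicts LSR from LAS. It is not the authors' procedure: the function names, the use of a simple linear fit, the units (phones per second, log score per frame), and all numbers are assumptions chosen purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Made-up toy values, NOT data from the paper.
# Assumed units: LAS as average log score per frame within a local window,
# LSR as phones per second within the same window.
las = [-60.0, -62.0, -65.0, -67.0, -71.0]
lsr = [10.0, 12.0, 13.0, 15.0, 17.0]

# Negative correlation: faster local speech goes with lower local scores.
r = pearson(lsr, las)

# Hypothetical predictor: estimate LSR on new data from its LAS alone.
slope, intercept = fit_line(las, lsr)
predicted_lsr = slope * -64.0 + intercept
```

On real data, LSR and LAS would each be computed over matching local windows of an utterance, with the line fitted on training utterances and applied to independent test utterances.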
Bibliographic reference. Faltlhauser, Robert / Ruske, Günther / Thomae, M. (2002): "Towards the question: why has speaking rate such an impact on speech recognition performance?", In ICSLP-2002, 2429-2432.