Workshop on the Auditory Basis of Speech Perception, Keele University, UK
This paper compares several auditory power spectral density representations of speech signals in a phoneme recognition task. The numerical properties of the representations differ considerably, even though all are calculated from the same intermediate representation. The results presented here clearly indicate that the results vary widely, even when ostensibly the same parameter is being estimated. Thus it is not merely 'what' is calculated but 'how' its value is estimated that may ultimately determine recognition performance. Two similar but distinct sets of comparisons were made to confirm that a significant difference does indeed exist. In both cases, the maximum entropy method of power spectrum estimation performs significantly better than the others, even though both it and the maximum likelihood method are based on the same initial linear prediction analysis of the signal. The performance of the maximum likelihood method is very nearly the same as that of the Blackman-Tukey method.
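To make the 'what' versus 'how' distinction concrete, the following is a minimal sketch of the three named estimators in their generic textbook forms, all computed from the same autocorrelation sequence. It is not the auditory front end used in the paper: the function names, the model order `p = 12`, the Bartlett lag window, the `(p + 1)` scaling of the maximum likelihood estimate, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def autocorr(x, p):
    """Biased autocorrelation estimates r[0..p]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])

def levinson(r):
    """Levinson-Durbin recursion: predictor polynomial a (a[0] = 1) and residual power."""
    p = len(r) - 1
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, p + 1):
        k = -np.dot(a[:m], r[m:0:-1]) / err          # reflection coefficient
        a[:m + 1] = a[:m + 1] + k * a[:m + 1][::-1]  # order update
        err *= 1.0 - k * k
    return a, err

def steering(freqs, p):
    """Complex exponentials exp(-j*w*k), shape (len(freqs), p + 1)."""
    return np.exp(-1j * np.outer(freqs, np.arange(p + 1)))

def mem_psd(r, freqs):
    """Maximum entropy (all-pole) estimate: residual power / |A(w)|^2."""
    a, err = levinson(r)
    return err / np.abs(steering(freqs, len(r) - 1) @ a) ** 2

def mlm_psd(r, freqs):
    """Maximum likelihood (Capon) estimate: (p + 1) / (e^H R^-1 e)."""
    p = len(r) - 1
    lags = np.abs(np.subtract.outer(np.arange(p + 1), np.arange(p + 1)))
    R = r[lags]                                      # Toeplitz autocorrelation matrix
    E = steering(freqs, p)
    quad = np.einsum("fk,fk->f", E.conj(), np.linalg.solve(R, E.T).T).real
    return (p + 1) / quad

def bt_psd(r, freqs):
    """Blackman-Tukey estimate: transform of the lag-windowed autocorrelation."""
    p = len(r) - 1
    k = np.arange(p + 1)
    c = (1.0 - k / (p + 1)) * r                      # Bartlett (triangular) lag window
    return c[0] + 2.0 * np.cos(np.outer(freqs, k[1:])) @ c[1:]

# Synthetic signal (illustrative only, not speech): sinusoid at 1 rad/sample in noise.
rng = np.random.default_rng(0)
n = np.arange(2000)
x = np.sin(1.0 * n) + 0.3 * rng.standard_normal(n.size)

r = autocorr(x, p=12)
freqs = np.linspace(0.0, np.pi, 512)
for name, psd in [("MEM", mem_psd), ("MLM", mlm_psd), ("BT", bt_psd)]:
    peak = freqs[np.argmax(psd(r, freqs))]
    print(f"{name}: peak at {peak:.3f} rad/sample")
```

All three functions take the same `r`, so any difference in their output comes purely from how the spectrum is estimated. With the scaling assumed here, the reciprocal of the maximum likelihood spectrum equals the average of the reciprocals of the maximum entropy spectra of orders 0 to `p`, so it tends to have lower resolution than the maximum entropy estimate at the same order.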
Bibliographic reference. Beet, S. W. / Baghai-Ravary, L. (1996): "Towards a better auditory representation for speech recognition", In ABSP-1996, 287-290.