Interspeech'2005 - Eurospeech
To model the perception of temporal structure in speech, we applied a comprehensive computational model of the human auditory periphery to detect changes in speech signals that potentially indicate the arrival of new events. In each tonotopic sub-band, an increase in activation level was counted toward the plausibility of a new event, while a decrease was ignored. The overall contour obtained by integrating the sub-band information exhibited sharper peaks and dips than the loudness contour. A quantitative evaluation on estimating the speaking rate of natural speech also showed that the event-plausibility model performs better than the loudness model.
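The core idea of counting only sub-band increases can be illustrated with a minimal sketch. It assumes the sub-band activation levels have already been computed by some auditory front end and are given frame by frame. The frame-to-frame difference, the unweighted sum across sub-bands, and the name `event_plausibility` are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def event_plausibility(activation: Sequence[Sequence[float]]) -> List[float]:
    """Return one plausibility value per frame.

    activation[t][k] is the activation level of sub-band k at frame t
    (assumed to come from an auditory model that is not shown here).
    """
    if not activation:
        return []
    contour = [0.0]  # the first frame has no predecessor to compare against
    for prev, curr in zip(activation, activation[1:]):
        # Count only increases in each sub-band; decreases contribute zero.
        # Summing across sub-bands is an assumed form of "integration".
        rise = sum(max(c - p, 0.0) for p, c in zip(prev, curr))
        contour.append(rise)
    return contour


# Toy input: 4 frames, 2 sub-bands (hypothetical values).
frames = [
    [0.0, 0.0],
    [1.0, 0.5],
    [0.5, 0.5],
    [0.5, 2.0],
]
contour = event_plausibility(frames)
print(contour)  # [0.0, 1.5, 0.0, 1.5]
```

In this toy input the drop in the first sub-band at the third frame leaves the contour at zero, whereas a loudness-like sum of activations would merely dip slightly. This is one way to see why a contour built from increases alone can show sharper peaks.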
Bibliographic reference. Tsuzaki, Minoru / Tanaka, Satomi / Kato, Hiroaki / Sagisaka, Yoshinori (2005): "Application of auditory image model for speech event detection", In INTERSPEECH-2005, 677-680.