To provide an appropriate model for the perception of the temporal structure of speech, we applied a comprehensive computational model of the human auditory periphery to detect changes in speech signals that potentially indicate the arrival of new events. In each tonotopic sub-band, an increase in activation level was counted toward the plausibility of a new event, while a decrease was ignored. The overall contour obtained by integrating the sub-band information exhibited sharper peaks and dips than the loudness contour. A quantitative evaluation of speaking-rate estimation for natural speech also demonstrated that the event-plausibility model performs better than the loudness model.
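The event-plausibility idea summarized above can be illustrated with a minimal sketch. It assumes the sub-band activation has already been computed by an auditory model (the peripheral model itself is not reproduced here) and is supplied as one list of frame values per tonotopic band. Frame-to-frame differencing, the simple local-maximum peak picker, the threshold, and the names `event_plausibility`, `count_peaks` and `speaking_rate` are all hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def event_plausibility(activation: Sequence[Sequence[float]]) -> List[float]:
    """Sum only the increases in activation across sub-bands.

    activation[b][t] is the (assumed, precomputed) activation of
    sub-band b at frame t. Decreases contribute nothing.
    """
    n_frames = len(activation[0]) if activation else 0
    contour = [0.0] * n_frames
    for band in activation:
        for t in range(1, n_frames):
            delta = band[t] - band[t - 1]
            if delta > 0:  # rises count toward a new event; falls are ignored
                contour[t] += delta
    return contour


def count_peaks(contour: Sequence[float], threshold: float) -> List[int]:
    """Return frame indices of local maxima above a (hypothetical) threshold."""
    peaks = []
    for t in range(1, len(contour) - 1):
        if (contour[t] > threshold
                and contour[t] >= contour[t - 1]
                and contour[t] > contour[t + 1]):
            peaks.append(t)
    return peaks


def speaking_rate(activation: Sequence[Sequence[float]],
                  frame_rate_hz: float, threshold: float) -> float:
    """Estimate events per second from the integrated contour."""
    contour = event_plausibility(activation)
    n_events = len(count_peaks(contour, threshold))
    duration_s = len(contour) / frame_rate_hz
    return n_events / duration_s


if __name__ == "__main__":
    # Two toy sub-bands over six frames (illustrative values only).
    bands = [
        [0.0, 1.0, 0.0, 0.0, 2.0, 0.0],
        [0.0, 0.5, 0.2, 0.0, 1.0, 0.5],
    ]
    contour = event_plausibility(bands)
    print(contour)                                   # [0.0, 1.5, 0.0, 0.0, 3.0, 0.0]
    print(count_peaks(contour, threshold=0.5))       # [1, 4]
    print(speaking_rate(bands, frame_rate_hz=6.0, threshold=0.5))  # 2.0
```

Because only rises are accumulated, each onset produces an isolated peak while the decay that follows contributes zero, which is one way to see why such a contour can show sharper peaks and dips than a loudness contour that tracks both rises and falls.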
Cite as: Tsuzaki, M., Tanaka, S., Kato, H., Sagisaka, Y. (2005) Application of auditory image model for speech event detection. Proc. Interspeech 2005, 677-680, doi: 10.21437/Interspeech.2005-195
@inproceedings{tsuzaki05_interspeech, author={Minoru Tsuzaki and Satomi Tanaka and Hiroaki Kato and Yoshinori Sagisaka}, title={{Application of auditory image model for speech event detection}}, year=2005, booktitle={Proc. Interspeech 2005}, pages={677--680}, doi={10.21437/Interspeech.2005-195} }