In this study we describe a model of speech perception in which neither speaking rate nor lower-level temporal cues are considered explicitly. Instead, newly encountered speech signals are encoded as sequences of detailed acoustic events specified in real time at salient landmarks and compared directly with previously heard patterns. When presented with obstruent-vowel sequences from the TIMIT database, the model performs similarly to humans: it relies on temporal information for consonant and vowel recognition, interprets this information in a rate-dependent manner when non-temporal cues are ambiguous, and is adversely affected by local rate variability. These results indicate that compensation for speaking rate in human perception may follow implicitly from even modest knowledge of the robust correlations between temporal and other properties of individual speech events and those of their surrounding contexts, and does not require special normalization processes.
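The core idea of exemplar-based recognition described above can be illustrated with a minimal sketch. The code below is a hypothetical toy, not the authors' implementation: each stored exemplar is a labeled sequence of feature vectors, one per acoustic landmark, and a new token is classified by finding the nearest stored exemplar overall. All feature values and labels are invented for illustration.

```python
import math

def distance(seq_a, seq_b):
    """Summed Euclidean distance between aligned landmark feature vectors."""
    return sum(math.dist(a, b) for a, b in zip(seq_a, seq_b))

def classify(token, exemplars):
    """Return the label of the stored exemplar closest to the new token."""
    return min(exemplars, key=lambda ex: distance(token, ex[1]))[0]

# Toy exemplars: (label, [features at burst landmark, features at vowel onset]).
# The two-dimensional features stand in for detailed acoustic measurements.
exemplars = [
    ("ba", [(0.2, 0.1), (0.8, 0.5)]),
    ("pa", [(0.6, 0.3), (0.9, 0.4)]),
]

# A new token closest to the stored "ba" pattern is labeled "ba".
print(classify([(0.25, 0.12), (0.78, 0.52)], exemplars))
```

Because temporal properties enter only as features of stored exemplars, rate-dependent behavior would emerge from the distribution of the exemplars themselves rather than from an explicit normalization step.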
Bibliographic reference. Wade, Travis / Möbius, Bernd (2007): "Speaking rate effects in a landmark-based phonetic exemplar model", In INTERSPEECH-2007, 402-405.