The features used in human phone recognition are generated along the auditory pathway by several transformations. In the first stages, `modulation features' are generated in laminae of neurons that form a three-dimensional, strongly quantized structure, where each point of the structure corresponds to a feature component. The first dimension corresponds to the different critical bands, which originate from bundles of inner hair cells. The second dimension corresponds to different locations in the acoustic field around the head. The third dimension is the modulation depth for different spectral and temporal modulation frequencies, for each critical band and location. This structure is repeated in the auditory cortex, where a transformation to `phone features' occurs. Because neurophysiological knowledge of these features is insufficient, we infer their nature indirectly from measurements of the accuracy of phone perception. We conclude that the phone features are statistically independent across adjacent phones. This implies that no acoustic context from neighboring phones is used to perceive phones. From these findings we speculate that a phone-oriented segment model is implemented in the auditory cortex. This model has the potential to correctly model the statistical dependencies of all phone features constituting an utterance.
Bibliographic reference. Höge, Harald (2015): "On the nature of the features generated in the human auditory pathway for phone recognition", In INTERSPEECH-2015, 1551-1555.