An Auditory Model of Speaker Size Perception for Voiced Speech Sounds

Toshio Irino, Eri Takimoto, Toshie Matsui, Roy D. Patterson


An auditory model was developed to explain the results of behavioral experiments on the perception of speaker size with voiced speech sounds. It is based on the dynamic, compressive gammachirp (dcGC) filterbank and a weighting function (SSI weight) derived from a theory of size-shape segregation in the auditory system. Voiced words with and without high-frequency emphasis (+6 dB/octave) were produced using a speech vocoder (STRAIGHT). The SSI weighting function reduces the effect of glottal pulse excitation in voiced speech, which in turn allows the model to explain the individual subject variability in the data.
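
To make the "+6 dB/octave" high-frequency emphasis concrete, the following is a minimal sketch of the gain implied by a constant-slope spectral tilt. It is not the STRAIGHT-based procedure used in the paper, which the abstract does not describe. The function names and the 1000 Hz reference frequency (the point of 0 dB gain) are assumptions chosen for illustration only.

```python
import math


def emphasis_gain_db(freq_hz, ref_hz=1000.0, slope_db_per_octave=6.0):
    """Gain in dB of a constant-slope spectral tilt.

    ref_hz is a hypothetical reference frequency at which the gain is 0 dB;
    the abstract does not state which reference the authors used.
    """
    if freq_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    # One octave is a doubling of frequency, so the number of octaves
    # above the reference is log2(freq / ref).
    return slope_db_per_octave * math.log2(freq_hz / ref_hz)


def emphasis_gain_linear(freq_hz, ref_hz=1000.0, slope_db_per_octave=6.0):
    """The same tilt expressed as a linear amplitude factor."""
    gain_db = emphasis_gain_db(freq_hz, ref_hz, slope_db_per_octave)
    return 10.0 ** (gain_db / 20.0)


if __name__ == "__main__":
    for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(f"{f:7.0f} Hz: {emphasis_gain_db(f):+5.1f} dB")
```

Under these assumptions, each doubling of frequency raises the level by 6 dB, which is close to a doubling of amplitude, so components above the reference are boosted relative to those below it.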


DOI: 10.21437/Interspeech.2017-196

Cite as: Irino, T., Takimoto, E., Matsui, T., Patterson, R.D. (2017) An Auditory Model of Speaker Size Perception for Voiced Speech Sounds. Proc. Interspeech 2017, 1153-1157, DOI: 10.21437/Interspeech.2017-196.


@inproceedings{Irino2017,
  author={Toshio Irino and Eri Takimoto and Toshie Matsui and Roy D. Patterson},
  title={An Auditory Model of Speaker Size Perception for Voiced Speech Sounds},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1153--1157},
  doi={10.21437/Interspeech.2017-196},
  url={http://dx.doi.org/10.21437/Interspeech.2017-196}
}