We investigated speech intelligibility of four-mora word sounds degraded with a system based on a high-quality vocoder, STRAIGHT, and warped-DCT. This system enables us to independently manipulate essential speech parameters for vocal tract filtering and glottal excitation. We report perceptual effects of: 1) 'temporal smearing' or reduced temporal modulation; 2) 'time-frequency smearing' or reduced resolution in both temporal modulation and spectral peak; and 3) 'source smearing' or reduced resolution of glottal pulses. By analyzing intelligibility scores from the various experiments, we quantitatively confirmed that there are linguistic dependencies of phonemes and morae within words.
Cite as: Irino, T., Satou, S., Nomura, S., Banno, H., Kawahara, H. (2005) Speech intelligibility derived from time-frequency and source smearing. Proc. Interspeech 2005, 1737-1740, doi: 10.21437/Interspeech.2005-287
@inproceedings{irino05_interspeech,
  author={Toshio Irino and Satoru Satou and Shunsuke Nomura and Hideki Banno and Hideki Kawahara},
  title={{Speech intelligibility derived from time-frequency and source smearing}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1737--1740},
  doi={10.21437/Interspeech.2005-287}
}