Listener Preference on the Local Criterion for Ideal Binary-Masked Speech

Zhuohuang Zhang, Yi Shen

The ideal binary mask (IBM) is a signal-processing technique that retains the time-frequency regions of a mixture of target speech and background noise where the local signal-to-noise ratio (SNR) exceeds a local criterion (LC) and removes all other regions. The intelligibility of IBM-processed speech is typically high and does not depend on the choice of LC over a wide range of LC values. The current study investigates listeners' preferences for the LC value of IBM-processed speech. Concatenated everyday sentences were mixed with three types of background noise (airplane noise, train noise, and multi-talker babble) and, following IBM processing, were presented continuously to the listeners. The IBM algorithm was implemented so that listeners could adjust the LC value in real time using a programmable knob. The listeners were instructed to adjust the LC value until the IBM-processed stimuli reached the most preferable quality. Across 20 listeners, large individual differences were observed in the preferred LC values. A cluster analysis showed that 11 of the 20 listeners exhibited consistent patterns of results. For this main cluster of listeners, the preferred LC value depended on the noise type, the overall SNR, and the difficulty of the target sentences.
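
The IBM definition above can be illustrated with a minimal sketch. It assumes the target and the noise are available separately as power values on a shared time-frequency grid; the abstract does not state which time-frequency decomposition the study used, so the grid itself is left abstract here. The function name `ideal_binary_mask`, the parameter `lc_db`, and the small constant `eps` are hypothetical names chosen for illustration, not taken from the paper.

```python
import numpy as np

def ideal_binary_mask(target_power, noise_power, lc_db=0.0, eps=1e-12):
    """Return a boolean mask that is True where local SNR (dB) exceeds lc_db.

    target_power, noise_power: arrays of equal shape holding the power of the
    target and the noise in each time-frequency unit (assumed known in advance,
    which is what makes the mask "ideal").
    eps: small constant (an assumption of this sketch) to avoid log of zero.
    """
    local_snr_db = 10.0 * np.log10((target_power + eps) / (noise_power + eps))
    return local_snr_db > lc_db

# Toy 2x2 grid: local SNRs are roughly +6, -3, -10, and +10 dB.
target_power = np.array([[4.0, 0.5], [0.1, 10.0]])
noise_power = np.ones((2, 2))

mask_lc0 = ideal_binary_mask(target_power, noise_power, lc_db=0.0)
mask_lc_minus5 = ideal_binary_mask(target_power, noise_power, lc_db=-5.0)

# Applying the mask keeps the retained units of the mixture and zeroes the rest.
mixture_power = target_power + noise_power
masked_lc0 = mixture_power * mask_lc0
```

Lowering `lc_db` retains more time-frequency units (here the unit near -3 dB is kept at an LC of -5 dB but removed at 0 dB), which is the parameter the listeners in the study adjusted.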

 DOI: 10.21437/Interspeech.2019-1369

Cite as: Zhang, Z., Shen, Y. (2019) Listener Preference on the Local Criterion for Ideal Binary-Masked Speech. Proc. Interspeech 2019, 1383-1387, DOI: 10.21437/Interspeech.2019-1369.

  author={Zhuohuang Zhang and Yi Shen},
  title={{Listener Preference on the Local Criterion for Ideal Binary-Masked Speech}},
  booktitle={Proc. Interspeech 2019},