Factorial Modeling for Effective Suppression of Directional Noise

Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven J. Rennie


The assumed scenario is transcription of a face-to-face conversation, such as one in the financial industry in which an agent and a customer talk across a desk, with microphones placed between the speakers. From the automatic speech recognition (ASR) perspective, one speaker is the target and the other is a directional noise source. When the number of microphones is small, we often accept microphone spacings larger than the spatial aliasing limit because wider spacing yields better beamformer performance. Unfortunately, such a configuration results in significant leakage of directional noise in certain frequency bands, where spatial aliasing makes the beamformer and post-filter inaccurate. We therefore introduce a factorial model that compensates only the degraded bands using information from the reliable bands, in a probabilistic framework integrating our proposed metrics and a speech model. In our experiments, the proposed method reduced the error rate from 29.8% to 24.9%.
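As background for the spatial aliasing limit mentioned in the abstract, here is a minimal illustrative sketch, assuming a two-microphone array and plane-wave propagation. The function name and the specific spacing are this sketch's own choices, not from the paper:

```python
# Illustrative sketch (not from the paper): the spatial aliasing limit
# of a two-microphone array. For spacing d and speed of sound c,
# frequencies above c / (2 * d) wrap around spatially, so the beamformer
# cannot resolve the arrival direction unambiguously in those bands.

def spatial_aliasing_limit(spacing_m, speed_of_sound=343.0):
    """Highest frequency (Hz) free of spatial aliasing for a given
    microphone spacing (in meters), at the given speed of sound (m/s)."""
    return speed_of_sound / (2.0 * spacing_m)

# A 10 cm spacing aliases above about 1.7 kHz; bands above that limit
# may leak directional noise, which is the degradation the paper's
# factorial model is designed to compensate.
print(spatial_aliasing_limit(0.10))  # 1715.0
```

Accepting a spacing above this limit, as the abstract notes, trades aliasing in the upper bands for better beamformer directivity overall.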


DOI: 10.21437/Interspeech.2017-852

Cite as: Ichikawa, O., Fukuda, T., Kurata, G., Rennie, S.J. (2017) Factorial Modeling for Effective Suppression of Directional Noise. Proc. Interspeech 2017, 389-393, DOI: 10.21437/Interspeech.2017-852.


@inproceedings{Ichikawa2017,
  author={Osamu Ichikawa and Takashi Fukuda and Gakuto Kurata and Steven J. Rennie},
  title={Factorial Modeling for Effective Suppression of Directional Noise},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={389--393},
  doi={10.21437/Interspeech.2017-852},
  url={http://dx.doi.org/10.21437/Interspeech.2017-852}
}