End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform

Hyeonseung Lee, Hyung Yong Kim, Woo Hyun Kang, Jeunghun Kim, Nam Soo Kim


This paper describes a novel waveform-level end-to-end model for multi-channel speech enhancement. The model first extracts sample-level speech embeddings using a channel-wise convolutional neural network (CNN) and compensates for time delays between the channels based on these embeddings, producing time-aligned multi-channel signals. The aligned signals are then fed into a multi-channel enhancement extension of Wave-U-Net, which directly outputs the estimated clean speech waveform. The whole model is trained to simultaneously minimize a modified mean squared error (MSE), a signal-to-distortion ratio (SDR) cost, and the senone cross-entropy of a back-end acoustic model. Evaluated on the CHiME-4 simulated set, the proposed system outperformed a state-of-the-art generalized eigenvalue (GEV) beamformer in terms of perceptual evaluation of speech quality (PESQ) and SDR, and showed competitive results in short-time objective intelligibility (STOI). Word error rates (WERs) of the system’s output on the simulated sets were comparable to those of a bidirectional long short-term memory (BLSTM) GEV beamformer. However, the system showed relatively high WERs on the real sets, achieving a relative error rate reduction (RERR) of 14.3% over the noisy signal on the real evaluation set.
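The abstract states that the model is trained to jointly minimize a modified MSE, an SDR cost, and a senone cross-entropy. A minimal sketch of how such a multi-task waveform loss might be combined is shown below; the equal weights, the negative-SDR formulation, and the function names are illustrative assumptions, not details taken from the paper.

```python
import math

def mse(clean, est):
    # Mean squared error between clean and estimated waveform samples.
    return sum((c - e) ** 2 for c, e in zip(clean, est)) / len(clean)

def neg_sdr(clean, est, eps=1e-8):
    # Negative signal-to-distortion ratio in dB (to be minimized).
    # SDR = 10 * log10(||s||^2 / ||s - s_hat||^2); eps avoids division by zero.
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - e) ** 2 for c, e in zip(clean, est))
    return -10.0 * math.log10(signal_power / (noise_power + eps) + eps)

def combined_loss(clean, est, senone_ce, w_mse=1.0, w_sdr=1.0, w_ce=1.0):
    # Weighted sum of the three objectives; the weights are assumptions.
    # senone_ce would come from a back-end acoustic model in the full system.
    return w_mse * mse(clean, est) + w_sdr * neg_sdr(clean, est) + w_ce * senone_ce
```

In a real implementation all three terms would be computed on mini-batches inside an autodiff framework so gradients flow from the acoustic model back through the enhancement network; this sketch only illustrates how the scalar objectives combine.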


DOI: 10.21437/Interspeech.2019-2397

Cite as: Lee, H., Kim, H.Y., Kang, W.H., Kim, J., Kim, N.S. (2019) End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform. Proc. Interspeech 2019, 4285-4289, DOI: 10.21437/Interspeech.2019-2397.


@inproceedings{Lee2019,
  author={Hyeonseung Lee and Hyung Yong Kim and Woo Hyun Kang and Jeunghun Kim and Nam Soo Kim},
  title={{End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4285--4289},
  doi={10.21437/Interspeech.2019-2397},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2397}
}