ISCA Archive Interspeech 2017

Concatenative Resynthesis Using Twin Networks

Soumi Maiti, Michael I. Mandel

Traditional noise reduction systems modify a noisy signal to make it more like the original clean signal. For speech, these methods suffer from two main problems: under-suppression of noise and over-suppression of target speech. Instead, synthesizing clean speech based on the noisy signal could produce outputs that are both noise-free and high quality. Our previous work introduced such a system using concatenative synthesis, but it required processing the clean speech at run time, which was slow and not scalable. To make such a system scalable, we propose here learning a similarity metric using two separate networks, one processing the clean segments offline and the other processing the noisy segments at run time. This system incorporates a ranking loss to optimize for the retrieval of appropriate clean speech segments. We compare this model against our original system on the CHiME2-GRID corpus, using ranking performance and subjective listening tests of the resyntheses.
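
The twin-network idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the architectures, the features, or the exact form of the ranking loss. Everything here is an assumption made for illustration, including the single linear layer standing in for each network, cosine similarity as the learned metric, the hinge-style ranking loss, and the names `embed`, `ranking_loss`, `retrieve`, `W_clean`, and `W_noisy`. The point it shows is the split the abstract describes: clean segments are embedded once, offline, and only the noisy segments are embedded at run time.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Toy stand-in for a network: one linear layer plus L2 normalization."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def ranking_loss(noisy_emb, clean_emb, margin=0.2):
    """Hinge-style ranking loss (an assumed form, not taken from the paper).

    Row i of noisy_emb is assumed to match row i of clean_emb. The matching
    clean segment should score at least `margin` above every other one.
    """
    sims = noisy_emb @ clean_emb.T             # (n, n) cosine similarities
    pos = np.diag(sims)[:, None]               # scores of the matching pairs
    viol = np.maximum(0.0, margin - pos + sims)
    np.fill_diagonal(viol, 0.0)                # a pair does not compete with itself
    n = sims.shape[0]
    return viol.sum() / (n * (n - 1))

def retrieve(noisy_emb, clean_index):
    """Return the index of the most similar clean segment for each noisy one."""
    return np.argmax(noisy_emb @ clean_index.T, axis=1)

# Hypothetical data: 8 segments, 20-dim features, 4-dim embeddings.
n, d_in, d_emb = 8, 20, 4
clean = rng.normal(size=(n, d_in))
noisy = clean + 0.1 * rng.normal(size=(n, d_in))

W_clean = rng.normal(size=(d_in, d_emb))       # "clean" network weights
W_noisy = W_clean.copy()                       # untrained "noisy" network stand-in

clean_index = embed(clean, W_clean)            # computed once, offline
noisy_emb = embed(noisy, W_noisy)              # computed at run time
loss = ranking_loss(noisy_emb, clean_index)
picks = retrieve(noisy_emb, clean_index)
```

In a trained system the two weight sets would be learned jointly by minimizing the ranking loss, so that run-time retrieval reduces to comparing noisy embeddings against the precomputed clean index.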

doi: 10.21437/Interspeech.2017-1653

Cite as: Maiti, S., Mandel, M.I. (2017) Concatenative Resynthesis Using Twin Networks. Proc. Interspeech 2017, 3647-3651, doi: 10.21437/Interspeech.2017-1653

  author={Soumi Maiti and Michael I. Mandel},
  title={{Concatenative Resynthesis Using Twin Networks}},
  booktitle={Proc. Interspeech 2017},