Large Vocabulary Concatenative Resynthesis

Soumi Maiti, Joey Ching, Michael Mandel


Traditional speech enhancement systems reduce noise by modifying the noisy signal, and they suffer from two problems: under-suppression of noise and over-suppression of speech. As an alternative, in this paper we use the recently introduced concatenative resynthesis approach, in which the noisy speech is replaced with a clean resynthesis of it. Such a system can produce output speech that is both noise-free and of high quality. This paper generalizes our previous small-vocabulary system to a large vocabulary. To do so, we employ efficient decoding techniques based on fast approximate nearest neighbor (ANN) algorithms. First, we apply ANN techniques to the original small-vocabulary task and obtain a 5x speedup. We then apply the techniques to the construction of a large-vocabulary concatenative resynthesis system and scale the system up to a 12x larger dictionary. We perform listening tests with five participants to measure the subjective quality and intelligibility of the output speech.
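The abstract does not say which ANN algorithm is used, so the following is only a minimal sketch of the general idea, assuming that each dictionary entry of clean speech is represented by a fixed-length vector and that decoding picks the entry most similar (by cosine) to a query computed from the noisy input. Random-hyperplane locality-sensitive hashing is swapped in here as one generic ANN technique; the class `RandomHyperplaneLSH`, the helpers `cosine` and `exact_nn`, and the parameters `n_bits` and `n_tables` are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def exact_nn(items, q):
    """Brute-force search: compare the query against every dictionary entry."""
    return max(range(len(items)), key=lambda i: cosine(items[i], q))


class RandomHyperplaneLSH:
    """Hypothetical ANN index that buckets vectors by the signs of random projections."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of n_bits random Gaussian hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []

    def _key(self, t, v):
        # The hash key is the tuple of sides of each hyperplane that v falls on.
        return tuple(
            sum(p * x for p, x in zip(plane, v)) >= 0.0 for plane in self.planes[t]
        )

    def add(self, v):
        idx = len(self.items)
        self.items.append(v)
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(idx)
        return idx

    def query(self, q):
        # Only entries sharing a bucket with q in some table are scored exactly.
        cands = set()
        for t, table in enumerate(self.tables):
            cands.update(table.get(self._key(t, q), []))
        if not cands:
            cands = range(len(self.items))  # fall back to exhaustive search
        return max(cands, key=lambda i: cosine(self.items[i], q))
```

A query that is a slightly perturbed copy of a dictionary entry (standing in for a noisy observation of a clean unit) should retrieve that entry while scoring only the few candidates in its buckets. Using several short hashes rather than one long one trades more candidate comparisons for higher recall; this sketch makes no claim to reproduce the speedups reported in the paper.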


DOI: 10.21437/Interspeech.2018-2383

Cite as: Maiti, S., Ching, J., Mandel, M. (2018) Large Vocabulary Concatenative Resynthesis. Proc. Interspeech 2018, 1190-1194, DOI: 10.21437/Interspeech.2018-2383.


@inproceedings{Maiti2018,
  author={Soumi Maiti and Joey Ching and Michael Mandel},
  title={Large Vocabulary Concatenative Resynthesis},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1190--1194},
  doi={10.21437/Interspeech.2018-2383},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2383}
}