ISCA Archive SSW 2023

Adaptive Duration Modification of Speech using Masked Convolutional Networks and Open-Loop Time Warping

Ravi Shankar, Archana Venkataraman

We propose a new method to adaptively modify the rhythm of a given speech signal. We train a masked convolutional encoder-decoder network to generate an attention map via a stochastic version of the mean absolute error loss function. Our model also predicts the length of the target speech signal using the encoder embeddings, which determines the number of time steps for the decoding operation. During testing, we use the learned attention map as a proxy for the frame-wise similarity matrix between the given input speech and an unknown target speech signal. In an open-loop fashion, we compute a warping path for rhythm modification. Our experiments demonstrate that this adaptive framework achieves performance similar to that of the fully supervised dynamic time warping algorithm on both voice conversion and emotion conversion tasks. We also show that the modified speech utterances achieve high user quality ratings, thus highlighting the practical utility of our method.
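As a rough illustration of the warping-path step, the sketch below runs a standard dynamic-programming search over a frame-wise similarity matrix. It is not the authors' implementation: the function name `warping_path`, the step pattern (diagonal, vertical, horizontal), and the choice to maximize cumulative similarity are all assumptions made for this example. In the setting described above, the matrix would be the learned attention map with shape (input frames, predicted target frames).

```python
import numpy as np


def warping_path(similarity):
    """Find a monotonic path through a (T_in, T_out) similarity matrix.

    Hypothetical helper: a standard DTW-style search that maximizes the
    cumulative similarity from cell (0, 0) to cell (T_in - 1, T_out - 1).
    """
    sim = np.asarray(similarity, dtype=float)
    n, m = sim.shape

    # acc[i, j] = best cumulative similarity of any path ending at (i, j)
    acc = np.full((n, m), -np.inf)
    acc[0, 0] = sim[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = -np.inf
            if i > 0 and j > 0:
                best = max(best, acc[i - 1, j - 1])
            if i > 0:
                best = max(best, acc[i - 1, j])
            if j > 0:
                best = max(best, acc[i, j - 1])
            acc[i, j] = sim[i, j] + best

    # Backtrack from the final cell, preferring the diagonal step on ties
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = max(candidates, key=lambda c: c[0])
        path.append((i, j))
    return path[::-1]
```

Each pair `(i, j)` in the returned path maps input frame `i` to output frame `j`; repeated `i` values stretch a frame in time and repeated `j` values compress several input frames into one.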


doi: 10.21437/SSW.2023-28

Cite as: Shankar, R., Venkataraman, A. (2023) Adaptive Duration Modification of Speech using Masked Convolutional Networks and Open-Loop Time Warping. Proc. 12th ISCA Speech Synthesis Workshop (SSW2023), 177-183, doi: 10.21437/SSW.2023-28

@inproceedings{shankar23_ssw,
  author={Ravi Shankar and Archana Venkataraman},
  title={{Adaptive Duration Modification of Speech using Masked Convolutional Networks and Open-Loop Time Warping}},
  year=2023,
  booktitle={Proc. 12th ISCA Speech Synthesis Workshop (SSW2023)},
  pages={177--183},
  doi={10.21437/SSW.2023-28}
}