ISCA Archive SPSC 2022

Cascade of phonetic speech recognition, speaker embeddings GAN and multispeaker speech synthesis for the VoicePrivacy 2022 Challenge

Sarina Meyer, Pascal Tilli, Florian Lux, Pavel Denisov, Julia Koch, Ngoc Thang Vu

Speaker anonymization is the task of modifying speech recordings to hide the identity of the original speaker by changing the voice in the audio. At the same time, the anonymized audio should remain usable for downstream tasks and thus retain other information from the original audio, such as the linguistic content. This typically creates a privacy-utility trade-off for anonymization techniques. In our submission to the VoicePrivacy 2022 Challenge, we aim to reduce this trade-off by creating a speech-to-speech pipeline that (a) eliminates all clues about speaker identity by reducing the audio to phonetic transcriptions, (b) generates a new, non-existent voice using a Generative Adversarial Network, leading to artificial yet natural-sounding and distinctive speakers, and (c) synthesizes an anonymous version of the original utterance based on the transcriptions, the anonymous speaker embedding, and the estimated pitch. According to the objective evaluation, this anonymization method leads to almost perfect privacy and voice distinctiveness, and clearly outperforms all baseline systems on these two metrics. For the speech recognition utility metric, we achieve similarly good results on LibriSpeech and much better ones on VCTK compared to the baselines and the original non-anonymized data. Only for pitch correlation do we just barely meet the required threshold, because our system does not use the original pitch trajectory for synthesis. Overall, our approach successfully hides the speaker identity while keeping the linguistic content, proving to be generally more effective than any of the baselines of the VoicePrivacy 2022 Challenge.
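
The data flow of the three stages (a) to (c) can be sketched roughly as follows. This is only an illustrative outline, not the submitted system: the class `AnonymizationCascade`, its field names, and all `toy_*` stand-ins are hypothetical, and the sketch assumes that the pitch is predicted from the phone sequence, which is one reading of the statement that the original pitch trajectory is not used.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

Audio = Sequence[float]  # hypothetical stand-in for a waveform


@dataclass
class AnonymizationCascade:
    """Hypothetical wiring of the three stages; every component is pluggable."""
    recognize_phones: Callable[[Audio], List[str]]
    sample_embedding: Callable[[], List[float]]
    predict_pitch: Callable[[List[str]], List[float]]
    synthesize: Callable[[List[str], List[float], List[float]], List[float]]

    def anonymize(self, audio: Audio) -> List[float]:
        # (a) Reduce the audio to a phonetic transcription.
        phones = self.recognize_phones(audio)
        # (b) Draw an artificial speaker embedding (a GAN generator in the paper).
        embedding = self.sample_embedding()
        # Assumption: pitch is estimated from the phones, not copied from the input.
        pitch = self.predict_pitch(phones)
        # (c) Synthesize from phones, embedding, and pitch only.
        return self.synthesize(phones, embedding, pitch)


# Toy stand-ins so the sketch runs; none of these are real models.
def toy_recognizer(audio: Audio) -> List[str]:
    return ["HH", "AH", "L", "OW"]


def make_toy_sampler(seed: int) -> Callable[[], List[float]]:
    rng = random.Random(seed)
    return lambda: [rng.gauss(0.0, 1.0) for _ in range(4)]


def toy_pitch(phones: List[str]) -> List[float]:
    return [120.0] * len(phones)


def toy_synth(phones: List[str], embedding: List[float],
              pitch: List[float]) -> List[float]:
    return [p + sum(embedding) for p in pitch]


cascade = AnonymizationCascade(toy_recognizer, make_toy_sampler(0),
                               toy_pitch, toy_synth)
anonymized = cascade.anonymize([0.1, 0.2, -0.3])
```

The point of the structure is that the synthesizer never receives the input audio itself, only the phone sequence, the sampled embedding, and the predicted pitch, so any speaker information not encoded in the phones cannot reach the output.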


Cite as: Meyer, S., Tilli, P., Lux, F., Denisov, P., Koch, J., Vu, N.T. (2022) Cascade of phonetic speech recognition, speaker embeddings GAN and multispeaker speech synthesis for the VoicePrivacy 2022 Challenge. Proc. 2nd Symposium on Security and Privacy in Speech Communication.

@inproceedings{meyer22_spsc,
  author={Sarina Meyer and Pascal Tilli and Florian Lux and Pavel Denisov and Julia Koch and Ngoc Thang Vu},
  title={{Cascade of phonetic speech recognition, speaker embeddings GAN and multispeaker speech synthesis for the VoicePrivacy 2022 Challenge}},
  year=2022,
  booktitle={Proc. 2nd Symposium on Security and Privacy in Speech Communication},
  pages={}
}