Speaker anonymization by pitch shifting based on time-scale modification

Candy Olivia Mawalim, Shogo Okada, Masashi Unoki

The increasing use of speech in digital technology raises privacy concerns because speech carries biometric information. Several methods of dealing with this issue have been proposed, including speaker anonymization or de-identification. Speaker anonymization aims to suppress personally identifiable information (PII) while preserving the other properties of speech, including linguistic information. In this study, we use time-scale modification (TSM), a speech signal processing technique, for speaker anonymization. Signal processing approaches are significantly less complex than the state-of-the-art x-vector-based speaker anonymization method because they do not require a training process. We propose anonymization methods using the two major categories of TSM: synchronous overlap-add (SOLA)-based algorithms and phase vocoder-based TSM (PV-TSM). To evaluate the proposed methods, we use the standard objective evaluation introduced in the VoicePrivacy challenge. The results show that our PV-TSM-based method balances the privacy and utility metrics better than the baseline systems, especially when evaluated with an automatic speaker verification (ASV) system on anonymized enrollment and anonymized trial data (a-a). Furthermore, our method outperforms the x-vector-based speaker anonymization method, which is limited by its complex training process, low privacy in the a-a scenario, and low voice distinctiveness.
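
To make the core idea concrete, the following is a minimal sketch of pitch shifting by TSM: the signal is first time-stretched by a factor and then resampled by the same factor, so the duration stays roughly constant while the pitch is scaled. It is not the authors' implementation. It uses plain overlap-add (OLA), the simplest relative of the SOLA family, with no waveform synchronization and no phase vocoder, so audible artifacts are to be expected. The function names, the frame and hop sizes, and the shift factor `beta` are all assumptions chosen for illustration.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_stretch(x, alpha, frame=512, hop_syn=128):
    """Time-stretch x by factor alpha (>1 gives a longer signal) with plain OLA.

    Assumption: no synchronization step, unlike SOLA-style algorithms.
    """
    hop_ana = hop_syn / alpha  # analysis hop; frames are re-spaced at hop_syn
    win = hann(frame)
    n_frames = max(1, int((len(x) - frame) / hop_ana) + 1)
    out_len = (n_frames - 1) * hop_syn + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len  # running sum of window weights for normalization
    for m in range(n_frames):
        a = int(round(m * hop_ana))  # read position in the input
        s = m * hop_syn              # write position in the output
        for i in range(frame):
            if a + i < len(x):
                out[s + i] += win[i] * x[a + i]
                norm[s + i] += win[i]
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def resample_linear(x, ratio):
    """Read x at `ratio` times the original speed using linear interpolation."""
    n_out = int(len(x) / ratio)
    last = len(x) - 1
    y = []
    for k in range(n_out):
        t = k * ratio
        i = min(int(t), last)
        frac = t - i
        nxt = x[min(i + 1, last)]
        y.append((1.0 - frac) * x[i] + frac * nxt)
    return y


def pitch_shift(x, beta, frame=512, hop_syn=128):
    """Scale pitch by beta (hypothetical parameter) at roughly constant duration."""
    stretched = ola_stretch(x, beta, frame, hop_syn)
    return resample_linear(stretched, beta)


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate for the toy signal
    tone = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(fs)]
    shifted = pitch_shift(tone, 1.25)
    print(len(tone), len(shifted))
```

In this sketch a `beta` above 1 raises the pitch and a `beta` below 1 lowers it. A SOLA-based or PV-TSM variant would replace only `ola_stretch`, leaving the resampling step unchanged.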

