Detecting manipulated and synthetic audio

Zhizheng Wu

In recent years, speech generation technology has advanced at an astonishing pace, thanks to the rapid development of deep learning. State-of-the-art speech synthesis can clone a speaker’s voice from only a few training samples and generate natural-sounding audio of things the speaker never said. The technology can be misused to create misinformation, which spreads farther, faster, and more broadly than the truth and erodes our trust in online information. It can also be misused to attack voice biometric systems. This talk will first present a high-level overview of approaches to manipulating and synthesizing audio. It will then highlight recent technical developments in detecting manipulated and synthetic audio. Finally, it will discuss some current challenges and needs from a user's point of view.

