ISCA Archive SPSC 2021

Visualizing Automatic Speech Recognition – Means for a Better Understanding?

Karla Markert, Romain Parracone, Mykhailo Kulakov, Philip Sperl, Ching-Yu Kao, Konstantin Böttinger

Automatic speech recognition (ASR) is becoming ever better at mimicking human speech processing. How ASR systems work, however, remains to a large extent obscured by the complex structure of the deep neural networks (DNNs) they are based on. In this paper, we show how so-called attribution methods, which we import from image recognition and suitably adapt to handle audio data, can help to clarify the working of ASR. Taking DeepSpeech, an end-to-end model for ASR, as a case study, we show how these techniques help to visualize which features of the input are the most influential in determining the output. We focus on three visualization techniques: Layer-wise Relevance Propagation (LRP), Saliency Maps, and Shapley Additive Explanations (SHAP). We compare these methods and discuss potential further applications, such as the detection of adversarial examples.
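To illustrate the general idea behind the attribution methods the abstract mentions, the sketch below computes a saliency map, i.e. the magnitude of the gradient of one output score with respect to each input sample, so that large values mark the most influential input positions. This is a minimal, self-contained toy: the "model" is a hypothetical random linear layer with a tanh nonlinearity standing in for a real ASR network such as DeepSpeech, and the closed-form gradient replaces framework autodiff.

```python
import numpy as np

# Hypothetical stand-in for an acoustic model: maps a 16-sample input
# "audio" frame to 4 per-class scores through a fixed random linear layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))

def model(x):
    # tanh keeps the toy model differentiable and bounded
    return np.tanh(W @ x)

def saliency(x, target):
    # Saliency map: |d score_target / d x_i| for every input sample i.
    # For this toy model the gradient is available in closed form:
    # d tanh(w . x) / dx = (1 - tanh(w . x)^2) * w
    pre = W[target] @ x
    grad = (1.0 - np.tanh(pre) ** 2) * W[target]
    return np.abs(grad)

x = rng.standard_normal(16)   # stand-in for one audio frame
s = saliency(x, target=2)
print(s.shape)                # one relevance value per input sample
```

In a real setting the gradient would be obtained by backpropagation through the full network, and the resulting per-sample (or per-spectrogram-bin) relevances would be plotted over the audio input; LRP and SHAP assign relevances by different rules but produce maps of the same shape.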


doi: 10.21437/SPSC.2021-4

Cite as: Markert, K., Parracone, R., Kulakov, M., Sperl, P., Kao, C.-Y., Böttinger, K. (2021) Visualizing Automatic Speech Recognition – Means for a Better Understanding? Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication, 14-20, doi: 10.21437/SPSC.2021-4

@inproceedings{markert21_spsc,
  author={Karla Markert and Romain Parracone and Mykhailo Kulakov and Philip Sperl and Ching-Yu Kao and Konstantin Böttinger},
  title={{Visualizing Automatic Speech Recognition – Means for a Better Understanding?}},
  year=2021,
  booktitle={Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication},
  pages={14--20},
  doi={10.21437/SPSC.2021-4}
}