ISCA Archive Interspeech 2021

Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need

Yan Huang, Guoli Ye, Jinyu Li, Yifan Gong

The conformer transducer achieves new state-of-the-art end-to-end (E2E) system performance and has become increasingly appealing for production. In this paper, we study how to effectively perform rapid speaker adaptation in a conformer transducer and how it compares with the RNN transducer. We hierarchically decompose the conformer transducer and compare adapting each component through fine-tuning. Among various interesting observations, there are three distinct findings: First, adapting the self-attention alone can achieve more than 80% of the gain of full-network adaptation. When the adaptation data is extremely scarce, attention is all you need to adapt. Second, within the self-attention, adapting the value projection outperforms adapting the key or the query projection. Lastly, bias adaptation, despite its compact parameter space, is surprisingly effective. We conduct experiments on a state-of-the-art conformer transducer for an email dictation task. With 3 to 5 minutes of source speech and 200 minutes of personalized TTS speech, the best-performing encoder and joint network adaptation yields 38.37% and 19.90% relative word error rate (WER) reduction. Combining attention and bias adaptation achieves 90% of this gain with a significantly smaller footprint. Further comparison with the RNN-T suggests the new state-of-the-art conformer transducer can benefit from personalization as much as, if not more than, the RNN-T.
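The adaptation strategies described in the abstract (fine-tuning only the self-attention value projection, or only the bias terms) amount to selectively unfreezing a subset of model parameters before fine-tuning on speaker data. The following is a minimal, hypothetical PyTorch sketch of that selection step; the module and parameter names (`q_proj`, `v_proj`, etc.) are illustrative stand-ins, not the paper's actual implementation.

```python
import torch.nn as nn

# Toy stand-in for one self-attention block of a conformer encoder.
# Parameter names here are assumptions for illustration only.
class ToySelfAttention(nn.Module):
    def __init__(self, d_model=8):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

def select_adaptation_params(model, adapt_value_proj=True, adapt_biases=True):
    """Freeze all parameters, then unfreeze only those chosen for
    rapid speaker adaptation: the value projection and/or bias terms."""
    for name, p in model.named_parameters():
        p.requires_grad = False
        if adapt_value_proj and name.startswith("v_proj"):
            p.requires_grad = True
        if adapt_biases and name.endswith("bias"):
            p.requires_grad = True
    # Return the names of the parameters the optimizer would update.
    return [n for n, p in model.named_parameters() if p.requires_grad]

model = ToySelfAttention()
trainable = select_adaptation_params(model)
print(trainable)
# Only v_proj weights and all biases remain trainable; the optimizer
# is then built from filter(lambda p: p.requires_grad, model.parameters()).
```

Because only a small fraction of the weights is updated, the per-speaker footprint is the size of the unfrozen subset, which is what makes the combined attention-plus-bias scheme compact relative to full-network adaptation.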


doi: 10.21437/Interspeech.2021-1884

Cite as: Huang, Y., Ye, G., Li, J., Gong, Y. (2021) Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need. Proc. Interspeech 2021, 1309-1313, doi: 10.21437/Interspeech.2021-1884

@inproceedings{huang21c_interspeech,
  author={Yan Huang and Guoli Ye and Jinyu Li and Yifan Gong},
  title={{Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need}},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={1309--1313},
  doi={10.21437/Interspeech.2021-1884}
}