ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks

Miguel Ángel del-Agua, Santiago Piqueras, Adrià Giménez, Alberto Sanchis, Jorge Civera, Alfons Juan


Confidence estimation for automatic speech recognition has recently been improved by using Recurrent Neural Networks (RNNs) and, separately, by speaker adaptation (based on Conditional Random Fields). In this work, we explore how to obtain further improvements by combining RNNs and speaker adaptation. In particular, we explore different speaker-dependent and speaker-independent data representations for Bidirectional Long Short-Term Memory RNNs of various topologies. Empirical tests on the LibriSpeech dataset show that the best results are achieved by the proposed combination of RNNs and speaker adaptation.


DOI: 10.21437/Interspeech.2016-1142

Cite as

del-Agua, M.Á., Piqueras, S., Giménez, A., Sanchis, A., Civera, J., Juan, A. (2016) ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks. Proc. Interspeech 2016, 3464-3468.

Bibtex
@inproceedings{del-Agua+2016,
author={Miguel Ángel del-Agua and Santiago Piqueras and Adrià Giménez and Alberto Sanchis and Jorge Civera and Alfons Juan},
title={ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1142},
url={http://dx.doi.org/10.21437/Interspeech.2016-1142},
pages={3464--3468}
}