INTERSPEECH 2004 - ICSLP
Application-specific acoustic models provide the best recognition accuracy, but they are expensive to train because they require the transcription of large amounts of in-domain speech. This paper focuses on acoustic model estimation given limited transcribed in-domain speech data and large amounts of transcribed out-of-domain data. First, we evaluate several combinations of known methods for optimizing the adaptation and training of acoustic models on the limited in-domain speech data. Then, we propose Gaussian sharing to combine in-domain models with out-of-domain models, and a data generation process to simulate the presence of more speakers in the in-domain data. In a spoken language dialog application, we contrast our methods against an upper accuracy bound of 69.1% (a model trained on a large amount of in-domain data) and a lower bound of 60.8% (no in-domain data). Using only 2 hours of in-domain speech for model estimation, we improve accuracy by 5.1% absolute (to 65.9%) over the lower bound.
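As a quick check of the figures above, the 5.1% gain is absolute (65.9 − 60.8 percentage points). Relative to the 8.3-point gap between the two bounds, it recovers roughly 61% of the accuracy attainable with abundant in-domain data. This gap-closure ratio is a derived quantity computed here for illustration; it is not reported in the abstract.

```latex
\begin{aligned}
\Delta_{\text{gain}} &= 65.9 - 60.8 = 5.1 \ \text{points} \\
\Delta_{\text{gap}}  &= 69.1 - 60.8 = 8.3 \ \text{points} \\
\frac{\Delta_{\text{gain}}}{\Delta_{\text{gap}}} &= \frac{5.1}{8.3} \approx 0.61 \\
\text{remaining gap} &= 69.1 - 65.9 = 3.2 \ \text{points}
\end{aligned}
```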
Bibliographic reference. Bocchieri, Enrico / Riley, Michael / Saraclar, Murat (2004): "Methods for task adaptation of acoustic models with limited transcribed in-domain data", In INTERSPEECH-2004, 2953-2956.