This work addresses the challenges of acoustic adaptation in the context of on-line children's speech recognition. When children's speech is decoded using acoustic models trained on adults' speech, recognition performance degrades severely on account of the extreme acoustic mismatch. Although a number of conventional adaptation techniques are available, they are found to introduce undesirable latency for an on-line task. To address this, we combine two low-complexity fast adaptation techniques, namely acoustic model interpolation and low-rank feature projection, and present two schemes for doing so. In the first approach, model interpolation is done using weights estimated in an unconstrained fashion. The second is a hybrid approach in which a set of mean supervectors is pre-estimated using suitable development data; these are then optimally scaled using the given test data. Although the unconstrained approach yields larger improvements over the baseline, it has higher complexity and memory requirements. In the hybrid approach, for interpolating M models, the number of parameters to be estimated and the memory requirements are reduced by a factor of (M - 1).
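As a rough illustration of the unconstrained model interpolation idea, the following minimal sketch forms an adapted mean supervector as a weighted sum of M model supervectors. The abstract does not state how the weights are estimated, so ordinary least squares against a target supervector is used here purely as a stand-in. The function names, array shapes, and the notion of a "target" supervector are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def interpolate_means(supervectors, weights):
    """Return the weighted sum of M mean supervectors.

    supervectors: array of shape (M, D), one supervector per row (assumed layout).
    weights: array of shape (M,), unconstrained (need not be positive or sum to 1).
    """
    supervectors = np.asarray(supervectors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ supervectors


def estimate_weights_least_squares(supervectors, target):
    """Estimate unconstrained weights w minimizing ||target - w @ supervectors||^2.

    Least squares is a stand-in criterion chosen for this sketch only.
    """
    S = np.asarray(supervectors, dtype=float)
    t = np.asarray(target, dtype=float)
    # Solve S.T @ w ~= t for w; S.T has shape (D, M).
    w, *_ = np.linalg.lstsq(S.T, t, rcond=None)
    return w
```

In this sketch the unconstrained scheme estimates one weight per model, so M weights in total; how the hybrid scheme arrives at its (M - 1)-fold reduction is not recoverable from the abstract and is not modeled here.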
Bibliographic reference. Shahnawazuddin, S. / Sinha, Rohit (2015): "Low-memory fast on-line adaptation for acoustically mismatched children's speech recognition", In INTERSPEECH-2015, 1630-1634.