7th International Conference on Spoken Language Processing
September 16-20, 2002
The mismatch between HMMs trained on clean speech and speech signals corrupted by background noise can be reduced by adapting the speech distributions with parallel model combination (PMC). Exact PMC has no closed-form expression, so simplifying assumptions must be made in any implementation. Under three such assumptions, namely log-normal, log-add and log-max, adaptation formulae for log-spectral parameters are presented, for both static and dynamic parameters.
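The three approximations named above can be illustrated with a minimal sketch for a single diagonal-covariance Gaussian in the log-spectral domain. The abstract does not give the formulae, so the code below uses the forms commonly stated in the PMC literature and may differ in detail from the paper's own derivation. All function names are hypothetical, the speech-to-noise gain is assumed to be 1, and the dynamic-parameter rule assumes stationary noise (zero noise delta) and follows from differentiating the log-add expression.

```python
import math

# Illustrative sketch only: textbook-style PMC approximations, not
# necessarily the exact formulae of the paper. Gain factor assumed to be 1.

def log_add(mu_s, mu_n):
    """Log-add: combine speech and noise log-spectral means, ignoring variances."""
    return [math.log(math.exp(s) + math.exp(n)) for s, n in zip(mu_s, mu_n)]

def log_max(mu_s, mu_n):
    """Log-max: per channel, keep whichever of speech or noise dominates."""
    return [max(s, n) for s, n in zip(mu_s, mu_n)]

def log_normal(mu_s, var_s, mu_n, var_n):
    """Log-normal: map to the linear domain, add, and map back (diagonal case)."""
    mu_hat, var_hat = [], []
    for ms, vs, mn, vn in zip(mu_s, var_s, mu_n, var_n):
        # Mean and variance of a log-normal variable in the linear domain.
        lin_mean_s = math.exp(ms + vs / 2.0)
        lin_mean_n = math.exp(mn + vn / 2.0)
        lin_var_s = lin_mean_s ** 2 * (math.exp(vs) - 1.0)
        lin_var_n = lin_mean_n ** 2 * (math.exp(vn) - 1.0)
        # Speech and noise are assumed independent and additive.
        m = lin_mean_s + lin_mean_n
        v = lin_var_s + lin_var_n
        # Treat the sum as log-normal again and return to the log domain.
        vh = math.log(v / (m * m) + 1.0)
        mu_hat.append(math.log(m) - vh / 2.0)
        var_hat.append(vh)
    return mu_hat, var_hat

def log_add_delta(mu_s, delta_s, mu_n):
    """Dynamic (delta) means under log-add, assuming the noise delta is zero.

    Chain rule on log(exp(s) + exp(n)) scales the speech delta by the
    fraction of linear-domain energy contributed by speech.
    """
    return [d * math.exp(s) / (math.exp(s) + math.exp(n))
            for s, d, n in zip(mu_s, delta_s, mu_n)]
```

In this sketch, log-max is the cheapest rule and log-normal the most expensive, since only the latter uses and updates the variances.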
The experimental evaluation uses the TI-DIGITS speech database corrupted with car noise at a 0 dB signal-to-noise ratio. Recognition performance is reported for each of the three approximations. Adapting both static and dynamic parameters is shown to give up to 30% lower word error rate (WER) than adapting static parameters alone.
The findings and results presented in the paper provide a basis for trade-offs between recognition accuracy and computational requirements.
Bibliographic reference. Gong, Yifan (2002): "A comparative study of approximations for parallel model combination of static and dynamic parameters", In ICSLP-2002, 1029-1032.