September 22-25, 1997
It is known that incorporating temporal information about state durations into the HMM can improve recognition performance. However, when a speech signal is contaminated by ambient noise, a state may still be occupied for too long or too short a time when a state sequence is decoded, even if state durations are modeled. This phenomenon severely reduces the effectiveness of state duration modeling techniques. To overcome this problem, a proportional alignment decoding (PAD) method combined with state duration statistics is proposed and shown experimentally to be effective when the speech signal is distorted by ambient noise. Instead of the Viterbi decoding algorithm, the PAD method is used for state decoding in the retraining phase of a conventional HMM and produces a new set of state duration statistics. This state duration alignment scheme is more effective at preventing a state from being occupied for too long or too short a time in the recognition phase.
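The abstract does not spell out how PAD assigns frames to states, so the following is only a minimal sketch of one plausible reading: frames of an utterance are mapped to the states of a left-to-right HMM so that each state's share of frames is proportional to its mean duration. The function name `proportional_alignment`, its parameters, and the midpoint rule used to place frames are assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_right
from itertools import accumulate


def proportional_alignment(num_frames, mean_durations):
    """Map each frame to a state index in a left-to-right model.

    Assumed reading of proportional alignment: state i receives a share of
    the frames proportional to mean_durations[i]. This is an illustrative
    sketch, not the authors' published algorithm.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not mean_durations or any(d <= 0 for d in mean_durations):
        raise ValueError("mean_durations must be non-empty and positive")

    # Cumulative durations mark the state boundaries on a duration axis.
    boundaries = list(accumulate(mean_durations))
    total = boundaries[-1]
    last_state = len(boundaries) - 1

    path = []
    for t in range(num_frames):
        # Place the midpoint of frame t on the duration axis.
        position = (t + 0.5) / num_frames * total
        # The first boundary strictly above the position gives the state.
        state = min(bisect_right(boundaries, position), last_state)
        path.append(state)
    return path


if __name__ == "__main__":
    # Hypothetical mean durations (in frames) for a three-state model.
    print(proportional_alignment(10, [2.0, 5.0, 3.0]))
```

Under this reading, the state sequence depends only on the utterance length and the duration statistics, not on frame likelihoods, so noisy frames cannot stretch or shrink a state's occupancy. A state with a very small mean duration can receive no frames for short utterances; a practical variant would need a rule for that case.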
Bibliographic reference. Hung, Wei-Wen / Wang, Hsiao-Chuan (1997): "HMM retraining based on state duration alignment for noisy speech recognition", In EUROSPEECH-1997, 1519-1522.