A Duration Modeling Technique with Incremental Speech Rate Normalization

Hiroshi Fujimura, Takashi Masuko, Mitsuyoshi Tachimori

Toshiba Corporation, Japan

This paper describes a novel technique for exploiting duration information in low-resource speech recognition systems. Using explicit duration models significantly increases computational cost because it enlarges the search space. To avoid this problem, most techniques that use duration information adopt two-pass and N-best rescoring approaches. In contrast, we propose an algorithm that uses word duration models with incremental speech rate normalization in a one-pass decoding approach. In the proposed technique, penalties are added only to the scores of words with outlier durations, and not all words need to have duration models. Experimental results show that the proposed technique reduces errors by up to 17% on in-car digit string tasks without a significant increase in computational cost.
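To make the idea of outlier-only duration penalties concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each modeled word has a Gaussian-style duration model (mean and standard deviation in frames), that speech rate is estimated incrementally as the ratio of observed to expected duration over the words decoded so far, and that a penalty applies only when the rate-normalized duration lies more than a threshold number of standard deviations from the mean. The names `DurationModel`, `RateNormalizedDurationPenalty`, `z_threshold`, and `weight` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DurationModel:
    """Hypothetical word duration model (an assumption, not from the paper)."""
    mean: float  # expected duration in frames at a nominal speech rate
    std: float   # standard deviation in frames


class RateNormalizedDurationPenalty:
    """Sketch: penalize only outlier word durations after rate normalization."""

    def __init__(self, models: Dict[str, DurationModel],
                 z_threshold: float = 2.0, weight: float = 1.0):
        self.models = models            # words without an entry are never penalized
        self.z_threshold = z_threshold  # assumed outlier boundary, in std units
        self.weight = weight            # assumed scale of the score penalty
        self.observed_total = 0.0       # frames observed for modeled words so far
        self.expected_total = 0.0       # sum of model means for those words

    def rate(self) -> float:
        """Incremental speech rate estimate: >1 means slower than nominal."""
        if self.expected_total == 0.0:
            return 1.0  # no evidence yet, assume the nominal rate
        return self.observed_total / self.expected_total

    def penalty(self, word: str, duration: float) -> float:
        """Return a non-negative penalty to subtract from the word's score."""
        model = self.models.get(word)
        if model is None:
            return 0.0  # not all words need a duration model
        normalized = duration / self.rate()
        z = abs(normalized - model.mean) / model.std
        excess = z - self.z_threshold
        return self.weight * excess if excess > 0 else 0.0

    def update(self, word: str, duration: float) -> None:
        """Fold a decoded word into the running speech rate estimate."""
        model = self.models.get(word)
        if model is None:
            return
        self.observed_total += duration
        self.expected_total += model.mean


models = {"one": DurationModel(30.0, 5.0), "two": DurationModel(20.0, 4.0)}
scorer = RateNormalizedDurationPenalty(models)
typical = scorer.penalty("one", 32.0)        # within the threshold: no penalty
outlier = scorer.penalty("one", 60.0)        # far outside the threshold: penalized
unmodeled = scorer.penalty("oh", 500.0)      # no duration model: no penalty
scorer.update("one", 45.0)                   # a slow speaker shifts the rate estimate
slow_ok = scorer.penalty("two", 30.0)        # long in absolute terms, typical once normalized
```

In a one-pass decoder, a running estimate of this kind would presumably be kept per hypothesis so that each partial path carries its own speech rate. Because the penalty is zero for in-range durations and for unmodeled words, the extra work per word end is a table lookup and a few arithmetic operations.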

