September 22-25, 1997
The context-dependent modeling technique is extended to include non-speech filler segments occurring between speech word units. In addition to the conventional context-dependent word or subword units, the proposed acoustic modeling provides an efficient way of accounting for the effects of the surrounding speech on the inter-word non-speech segments, especially for small-vocabulary recognition tasks. It is argued that a robust recognition scheme is obtained by explicitly modeling context-dependent inter-word filler acoustics in training while ignoring their explicit context dependencies during recognition testing. Results on a connected digit recognition task over the telephone network show the word error rate falling from 2.5% to 0.9%, i.e., a relative reduction of about 64%, with the improved model set.
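The quoted reduction is relative to the baseline error rate, not a difference in percentage points. A minimal sketch of that arithmetic follows; the function name `relative_error_reduction` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_error_reduction(baseline_pct: float, improved_pct: float) -> float:
    """Return the error-rate reduction as a percentage of the baseline rate."""
    if baseline_pct <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_pct - improved_pct) / baseline_pct


# Error rates reported in the abstract: 2.5% (baseline) and 0.9% (improved model set).
reduction = relative_error_reduction(2.5, 0.9)
print(f"about {reduction:.0f}% relative word error-rate reduction")
```

The absolute improvement is 1.6 percentage points; dividing by the 2.5% baseline gives the figure of about 64%.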
Bibliographic reference. Zeljkovic, Ilija / Narayanan, Shrikanth (1997): "Novel filler acoustic models for connected digit recognition", In EUROSPEECH-1997, 283-286.