5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Acoustic Observation Context Modeling in Segment Based Speech Recognition

Mate Szarvas (1), Shoichi Matsunaga (2)

(1) Technical University of Budapest, Hungary
(2) NTT Human Interface Laboratories, Japan

This paper describes a novel method for modeling the correlation between acoustic observations in contiguous speech segments. The basic idea is that the acoustic observations of a segment are conditioned not only on the phonetic context but also on the acoustic observation of the preceding segment. The correlation between consecutive acoustic observations is modeled by polynomial mean trajectory segment models. The method extends conventional segment modeling approaches in that it describes the correlation of acoustic observations not only inside segments but also between contiguous segments. It also generalizes phonetic context (e.g., triphone) modeling approaches because it can model acoustic context and phonetic context at the same time. In a speaker-independent phoneme classification test, the proposed method reduced the error rate by 7-9% compared with the traditional triphone segmental model system and by 31% compared with a similar triphone HMM system.
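
For readers unfamiliar with polynomial mean trajectory segment models, the following is a minimal sketch of the generic idea only: the mean of each frame in a segment is a low-order polynomial in normalized time, fitted by least squares. It is not the authors' implementation and does not include the conditioning on the preceding segment that the paper proposes. All names (design_matrix, fit_trajectory, segment_log_likelihood), the quadratic order, the diagonal-variance Gaussian, and the toy data are assumptions made for illustration.

```python
import numpy as np

def design_matrix(num_frames, order):
    """Rows are [1, t, t^2, ...] at normalized times t in [0, 1]."""
    t = np.linspace(0.0, 1.0, num_frames)
    return np.vander(t, order + 1, increasing=True)

def fit_trajectory(segment, order=2):
    """Least-squares polynomial coefficients, shape (order + 1, dim)."""
    Z = design_matrix(segment.shape[0], order)
    coeffs, *_ = np.linalg.lstsq(Z, segment, rcond=None)
    return coeffs

def segment_log_likelihood(segment, coeffs, var):
    """Log-likelihood of the frames, assuming independent Gaussian
    deviations (variance `var`) around the polynomial mean trajectory."""
    Z = design_matrix(segment.shape[0], coeffs.shape[0] - 1)
    resid = segment - Z @ coeffs
    return float(-0.5 * np.sum(resid**2 / var + np.log(2.0 * np.pi * var)))

# Toy segment (hypothetical data): 10 frames of 2-dimensional features
# that follow an exact quadratic trajectory in normalized time.
t = np.linspace(0.0, 1.0, 10)
segment = np.stack([1.0 + 2.0 * t - t**2, 0.5 * t], axis=1)

coeffs = fit_trajectory(segment, order=2)
log_lik = segment_log_likelihood(segment, coeffs, var=1.0)
```

In a classifier built along these lines, one such trajectory model would be trained per phonetic unit and a segment would be assigned to the unit whose model gives the highest log-likelihood; how the preceding segment's observation enters the model is described in the full paper, not here.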


Bibliographic reference.  Szarvas, Mate / Matsunaga, Shoichi (1998): "Acoustic observation context modeling in segment based speech recognition", In ICSLP-1998, paper 1098.