This paper describes a novel method for modeling the correlation between acoustic observations in contiguous speech segments. The basic idea is that acoustic observations are conditioned not only on the phonetic context but also on the acoustic observation of the preceding segment. The correlation between consecutive acoustic observations is modeled by polynomial mean trajectory segment models. The method extends conventional segment modeling approaches in that it describes the correlation of acoustic observations not only inside segments but also between contiguous segments. It also generalizes phonetic context (e.g., triphone) modeling approaches because it can model acoustic context and phonetic context at the same time. In a speaker-independent phoneme classification test, the proposed method yielded a 7-9% reduction in error rate compared with a traditional triphone segmental model system and a 31% reduction compared with a similar triphone HMM system.
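As a rough illustration of what a polynomial mean trajectory is, the sketch below fits a low-order polynomial over normalized time to the frames of a single segment, for one feature dimension, by ordinary least squares. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`solve_linear_system`, `fit_polynomial_trajectory`, `evaluate_trajectory`), the choice of normalized time in [0, 1], and the example frame values are all hypothetical, and the conditioning on the preceding segment described in the abstract is not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial_trajectory(frames, degree):
    """Least-squares polynomial mean trajectory for one feature dimension.

    frames: feature values of the frames in one segment (assumed layout).
    degree: polynomial order; must be smaller than len(frames),
            otherwise the normal equations are singular.
    Returns coefficients c[0..degree] of sum_k c[k] * t**k, t in [0, 1].
    """
    n = len(frames)
    # Map frame indices to normalized time so segments of different
    # lengths share one time axis (an assumption of this sketch).
    times = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    size = degree + 1
    # Normal equations: a[j][k] = sum_t t^(j+k), b[j] = sum_t y_t * t^j.
    a = [[sum(t ** (j + k) for t in times) for k in range(size)]
         for j in range(size)]
    b = [sum(y * t ** j for t, y in zip(times, frames)) for j in range(size)]
    return solve_linear_system(a, b)


def evaluate_trajectory(coeffs, t):
    """Value of the mean trajectory at normalized time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


# Hypothetical three-frame segment fitted with a quadratic trajectory.
segment_frames = [1.0, 2.0, 5.0]
coeffs = fit_polynomial_trajectory(segment_frames, degree=2)
midpoint_mean = evaluate_trajectory(coeffs, 0.5)
```

In a segment model of this kind, the fitted trajectory plays the role of a time-varying mean, and the residuals of the frames around it would feed the segment's likelihood; how the paper ties that mean to the preceding segment's observation is not specified in the abstract.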
Cite as: Szarvas, M., Matsunaga, S. (1998) Acoustic observation context modeling in segment based speech recognition. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 1098, doi: 10.21437/ICSLP.1998-187
@inproceedings{szarvas98_icslp,
  author={Mate Szarvas and Shoichi Matsunaga},
  title={{Acoustic observation context modeling in segment based speech recognition}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 1098},
  doi={10.21437/ICSLP.1998-187}
}