5th International Conference on Spoken Language Processing
This paper presents a method of constructing a statistical phonemic segment model (SPSM) for a speech recognition system based on speaker-independent, context-independent automatic phonemic segmentation. In our recent research, we proposed a phoneme recognition system that applies template matching to the same segmentation, and confirmed that a fixed-length time sequence of 5 frames of feature vectors, used as a template, represents the features of a phoneme effectively. In this work, to refine this mass of templates into a smarter model, we introduce a statistical method into the modeling. An SPSM connects 5 distributions in series, each of which is an N-mixture Gaussian density. In a closed Japanese spoken-word recognition experiment using 4920 VCV-balanced words spoken by 10 adult male speakers and containing 34430 phonemes in total, the phoneme recognition rate reached 90.23% with SPSM, compared with 80.39% with phoneme templates.
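The series structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal covariances, assumes that frame t of a 5-frame segment is scored only by distribution t, assumes frames are treated as independent, and picks the phoneme by the highest total log-likelihood. The function names, the toy parameters, and the phoneme labels "a" and "i" are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, mixture):
    """Log density of a Gaussian mixture given as a list of (weight, mean, var)."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in mixture]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def score_segment(segment, model):
    """Total log-likelihood: frame t is scored by distribution t (assumption)."""
    if len(segment) != len(model):
        raise ValueError("segment and model must have the same number of frames")
    return sum(log_gmm(frame, mixture) for frame, mixture in zip(segment, model))

def classify(segment, models):
    """Return the label of the model with the highest segment score."""
    return max(models, key=lambda name: score_segment(segment, models[name]))

def toy_model(center):
    """Hypothetical model: 5 distributions in series, each a 2-mixture density
    over 2-dimensional feature vectors."""
    return [
        [
            (0.5, [center, center], [1.0, 1.0]),
            (0.5, [center + 1.0, center], [1.0, 1.0]),
        ]
        for _ in range(5)
    ]

models = {"a": toy_model(0.0), "i": toy_model(5.0)}
segment = [[0.2, -0.1]] * 5  # one 5-frame segment of toy feature vectors
print(classify(segment, models))
```

In this sketch the mixture count N is 2 and the feature dimension is 2 purely for brevity. In practice the mixture parameters would be estimated from segmented training data rather than written by hand.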
Bibliographic reference. Aizawa, Katsura / Furuichi, Chieko (1998): "A statistical phonemic segment model for speech recognition based on automatic phonemic segmentation", In ICSLP-1998, paper 0544.