The goal of this work is to model phone-like units automatically from spoken word samples, without using any transcriptions other than the lexical identity of each word. To this end, we have proposed the "piecewise linear segment lattice (PLSL)" model for phoneme representation. The model is structured as a lattice of segments, each represented by the regression coefficients of the feature vectors within that segment. To organize phone models, operations including division, concatenation, blocking, and clustering are applied to the models. This paper mainly reports on blocking and clustering. Experimental results on an isolated word recognition task show that the recognition rate is significantly improved by blocking the segments and by clustering the segments within each block. Sufficient performance for the task is obtained with models consisting of at most 128 clusters of segment patterns.
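As a rough illustration of the segment representation described above, the sketch below fits a first-order (linear) regression to each feature dimension over the frames of one segment, then groups the resulting coefficient vectors into a fixed number of clusters. It is a sketch under stated assumptions, not the authors' method. The abstract does not give the regression order, the feature type, or the clustering algorithm, so the intercept-plus-slope fit on a centred time axis, plain k-means with first-k initialisation, and the function names `segment_regression`, `segment_pattern`, and `kmeans` are all hypothetical choices.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def segment_regression(frames: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Least-squares fit x_d(t) ~ a_d + b_d * t for every feature dimension d.

    ASSUMPTION: first-order regression over a centred time axis, so the
    intercept a_d equals the segment mean and b_d is the slope.
    """
    T = len(frames)
    if T < 2:
        raise ValueError("need at least two frames to fit a slope")
    D = len(frames[0])
    t_mean = (T - 1) / 2.0
    ts = [t - t_mean for t in range(T)]  # centred frame indices (sum to 0)
    denom = sum(t * t for t in ts)
    intercepts: Vector = []
    slopes: Vector = []
    for d in range(D):
        xs = [f[d] for f in frames]
        intercepts.append(sum(xs) / T)
        # With centred time, the slope is sum(t * x) / sum(t^2).
        slopes.append(sum(t * x for t, x in zip(ts, xs)) / denom)
    return intercepts, slopes


def segment_pattern(frames: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical segment pattern: intercepts followed by slopes."""
    a, b = segment_regression(frames)
    return a + b


def _sqdist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Plain k-means (ASSUMPTION: stand-in for the paper's clustering step).

    Centroids are initialised to the first k points; requires k <= len(points).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


if __name__ == "__main__":
    # A 5-frame, 2-dimensional segment whose features change linearly in time.
    frames = [[1.0 + 0.5 * t, 2.0 - t] for t in range(5)]
    print(segment_regression(frames))  # ([2.0, 0.0], [0.5, -1.0])
    print(kmeans([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], k=2)[1])
```

Centring the time axis keeps the intercept and slope uncorrelated, which makes the two halves of each pattern vector easier to compare across segments of different lengths. In the setting of the abstract, `k` would be at most 128 and the points would be the patterns of all segments within one block.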
Cite as: Kojima, H., Tanaka, K. (1998) Generalized phone modeling based on piecewise linear segment lattice. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0995, doi: 10.21437/ICSLP.1998-181
@inproceedings{kojima98_icslp, author={Hiroaki Kojima and Kazuyo Tanaka}, title={{Generalized phone modeling based on piecewise linear segment lattice}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0995}, doi={10.21437/ICSLP.1998-181} }