The aim of our research is to form phone-like models and a phoneme-like set from spoken word samples without using any transcriptions except for the lexical identification of each word in a vocabulary. This framework is motivated by two goals: 1) automatic design of optimal speech recognition units and phone-model structures, and 2) multilingual speech recognition based on language-independent intermediate phonetic codes. The procedure consists of two steps: 1) constructing a VQ codebook of sub-phonetic segments from speech samples, and 2) extracting phonological chunks from sequences of the codes. The segment model is represented by a "piecewise linear segment lattice" model, which is a lattice structure of segments, each of which is represented by the regression coefficients of the feature vectors within the segment. Phonological chunks are extracted using a criterion based on the Kullback-Leibler divergence between the distributions of individual VQ codes. A recognition rate of approximately 90% is obtained on a 1542-word task with 128 VQ codes.
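The two computations named in the abstract, representing a segment by the regression coefficients of its feature vectors and comparing VQ-code distributions with the Kullback-Leibler divergence, can be illustrated with a minimal Python sketch. It assumes each segment is a list of feature-vector frames fitted with a first-order (intercept and slope) regression over time normalized to [0, 1], that the segment vector is the concatenation of intercepts and slopes, and that distributions are given as discrete probability lists. The function names `segment_regression`, `quantize` and `kl_divergence` are hypothetical, and the sketch does not reproduce the lattice search or the authors' exact chunk-extraction criterion.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating the ideas in the abstract; not the authors' code.


def segment_regression(frames: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Least-squares fit x_d(t) ~ a_d + b_d * t for every feature dimension d.

    Assumption: t is the frame index normalized to [0, 1] within the segment.
    Returns (intercepts, slopes); a one-frame segment gets zero slopes.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("segment must contain at least one frame")
    dim = len(frames[0])
    ts = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    t_mean = sum(ts) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    intercepts, slopes = [], []
    for d in range(dim):
        xs = [frame[d] for frame in frames]
        x_mean = sum(xs) / n
        # Slope = covariance(t, x) / variance(t); zero when time does not vary.
        b = 0.0 if s_tt == 0 else sum((t - t_mean) * (x - x_mean) for t, x in zip(ts, xs)) / s_tt
        intercepts.append(x_mean - b * t_mean)
        slopes.append(b)
    return intercepts, slopes


def quantize(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the nearest codeword under squared Euclidean distance (assumed metric)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])),
    )


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D(p || q) in nats for discrete distributions; infinite if q is 0 where p > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    a, b = segment_regression([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
    code = quantize(a + b, [[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 0.0]])
    print(a, b, code, kl_divergence([1.0, 0.0], [0.5, 0.5]))
```

For example, a three-frame segment whose first feature rises linearly from 0 to 2 while the second stays at 1 yields intercepts [0, 1] and slopes [2, 0], and comparing the distribution [1, 0] against the uniform [0.5, 0.5] gives a divergence of ln 2.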
Cite as: Kojima, H., Tanaka, K. (2000) Extracting phonological chunks based on piecewise linear segment lattices. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 959-962, doi: 10.21437/ICSLP.2000-430
@inproceedings{kojima00_icslp, author={Hiroaki Kojima and Kazuyo Tanaka}, title={{Extracting phonological chunks based on piecewise linear segment lattices}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, pages={vol. 2, 959-962}, doi={10.21437/ICSLP.2000-430} }