This paper compares the performance of the demiphone (a context-dependent subword unit that models the left and right parts of a phoneme independently) with that of the triphone. Continuous-density hidden Markov models for both types of units are tested with the HTK software using decision-tree state clustering. The speech material is taken from the SpeechDat Spanish database, composed of continuous speech utterances recorded over the public telephone network. The training corpus is speaker and task independent. Two test sets are used: isolated words corresponding to speaker names, city names and phonetically rich words; and Spanish identification card numbers and dates. The main conclusion is that the demiphone simplifies the recognition system and yields better performance than the triphone. This result may be explained by the ability of the demiphone to provide an excellent tradeoff between detailed coarticulation modeling and proper parameter estimation.
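To make the difference between the two units concrete, the following minimal Python sketch expands a phone sequence into triphone and demiphone labels. It is an illustration only, not the authors' implementation: the `l-p+r` label style follows the usual HTK triphone convention, while the function names, the demiphone labels `l-p` and `p+r`, the `sil` boundary padding, and the example word are assumptions made here.

```python
# Illustrative sketch only: function names, demiphone label format and
# the "sil" boundary padding are assumptions, not taken from the paper.

def to_triphones(phones, boundary="sil"):
    """One unit per phone, conditioned on both neighbours (l-p+r)."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

def to_demiphones(phones, boundary="sil"):
    """Two units per phone: the left half depends only on the left
    neighbour (l-p), the right half only on the right neighbour (p+r)."""
    padded = [boundary] + list(phones) + [boundary]
    units = []
    for i in range(1, len(padded) - 1):
        units.append(f"{padded[i - 1]}-{padded[i]}")  # left demiphone
        units.append(f"{padded[i]}+{padded[i + 1]}")  # right demiphone
    return units

def max_inventory(n_phones):
    """Upper bounds on the number of distinct units for n_phones phones:
    n^3 triphones versus 2 * n^2 demiphones."""
    return n_phones ** 3, 2 * n_phones ** 2

if __name__ == "__main__":
    word = ["k", "a", "s", "a"]  # hypothetical example word
    print(to_triphones(word))
    print(to_demiphones(word))
    print(max_inventory(30))
```

Because each demiphone depends on a single neighbouring phone, the number of possible distinct units grows quadratically rather than cubically with the size of the phone set, so each unit tends to be seen more often in a given training corpus. These bounds are simple counting arguments, not figures reported in the paper.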
Cite as: Mariño, J.B., Paches-Leal, P., Nogueiras, A. (1998) The demiphone versus the triphone in a decision-tree state-tying framework. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0250, doi: 10.21437/ICSLP.1998-657
@inproceedings{marino98_icslp,
  author={José B. Mariño and Pau Paches-Leal and Albino Nogueiras},
  title={{The demiphone versus the triphone in a decision-tree state-tying framework}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0250},
  doi={10.21437/ICSLP.1998-657}
}