We describe a new approach to speaker-independent automatic phoneme alignment. Typical algorithms for this task rely solely on phoneme-to-frame similarity measures, which are maximised or minimised in some way. In addition to such similarity measures, we use phoneme duration hypotheses generated by the speech synthesis system HADIFIX. These duration hypotheses are difficult to use in algorithms based on dynamic programming, so we construct a cost function that combines phoneme-to-frame and segment-to-duration-hypothesis similarity measures and minimise it with a Genetic Algorithm. The results show that the accuracy of the automatically determined phoneme boundaries increases, especially for speakers not used in the training phase.
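To make the idea of a combined cost function concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes a precomputed dissimilarity table `frame_cost[p][t]` between phoneme `p` and frame `t`, duration hypotheses given in frames, an absolute-deviation duration penalty, and a deliberately simple genetic algorithm (truncation selection, one-point crossover, single-boundary mutation). All function names, the `weight` parameter, and the toy data are hypothetical; the abstract does not specify the actual measures or GA operators.

```python
import random


def alignment_cost(boundaries, frame_cost, durations, weight=1.0):
    """Cost of one segmentation (assumed form, not the paper's exact measure).

    boundaries[p] is the exclusive end frame of phoneme p; the last entry
    equals the number of frames. Lower cost means a better alignment.
    """
    cost, start = 0.0, 0
    for p, end in enumerate(boundaries):
        cost += sum(frame_cost[p][start:end])                 # phoneme-to-frame term
        cost += weight * abs((end - start) - durations[p])    # duration term
        start = end
    return cost


def random_boundaries(n_phonemes, n_frames, rng):
    """Random strictly increasing boundaries; every segment has >= 1 frame."""
    cuts = sorted(rng.sample(range(1, n_frames), n_phonemes - 1))
    return cuts + [n_frames]


def crossover(a, b, rng):
    """One-point crossover; falls back to parent a if the child is not monotonic."""
    if len(a) < 2:
        return list(a)
    k = rng.randrange(1, len(a))
    return a[:k] + b[k:] if a[k - 1] < b[k] else list(a)


def mutate(boundaries, rng):
    """Move one inner boundary to a random position between its neighbours."""
    child = list(boundaries)
    if len(child) < 2:
        return child
    i = rng.randrange(len(child) - 1)      # the final boundary stays fixed
    lo = child[i - 1] + 1 if i > 0 else 1
    hi = child[i + 1] - 1
    child[i] = rng.randint(lo, hi)
    return child


def minimise(frame_cost, durations, n_frames, pop_size=40, generations=200, seed=0):
    """Search for low-cost boundaries with a simple elitist genetic algorithm."""
    rng = random.Random(seed)

    def score(b):
        return alignment_cost(b, frame_cost, durations)

    pop = [random_boundaries(len(durations), n_frames, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]     # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = parents + children
    return min(pop, key=score)


# Toy data: 3 phonemes over 12 frames, each phoneme matching 4 consecutive frames.
toy_frame_cost = [[0.0 if t // 4 == p else 1.0 for t in range(12)] for p in range(3)]
toy_durations = [4, 4, 4]
best = minimise(toy_frame_cost, toy_durations, n_frames=12)
```

Because the duration term depends on the length of each whole segment rather than on individual frames, it fits naturally into a cost evaluated on a complete candidate segmentation, which is what a population-based search such as this one operates on.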
Cite as: Stöber, K., Hess, W. (1998) Additional use of phoneme duration hypotheses in automatic speech segmentation. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0239, doi: 10.21437/ICSLP.1998-601
@inproceedings{stober98_icslp,
  author={Karlheinz Stöber and Wolfgang Hess},
  title={{Additional use of phoneme duration hypotheses in automatic speech segmentation}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0239},
  doi={10.21437/ICSLP.1998-601}
}