ISCA Archive ICSLP 1998

Additional use of phoneme duration hypotheses in automatic speech segmentation

Karlheinz Stöber, Wolfgang Hess

We describe a new approach to speaker-independent automatic phoneme alignment. Typical algorithms for this task rely solely on phoneme-to-frame similarity measures, which are maximised or minimised over the possible alignments. In addition to such similarity measures, we use phoneme duration hypotheses generated by the speech synthesis system HADIFIX. Because these duration hypotheses are difficult to incorporate into algorithms based on dynamic programming, we define a cost function that combines phoneme-to-frame and segment-to-duration-hypothesis similarity measures, and we minimise this cost function with a Genetic Algorithm. The results show that the accuracy of the automatically determined phoneme boundaries increases. The improvement is most pronounced for speakers not used in the training phase.
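
The idea of a combined cost function minimised by a genetic search can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measures, the weighting, or the genetic operators, so everything below is an assumption made for illustration. In particular, `frame_cost[p][t]` is a hypothetical precomputed dissimilarity between phoneme `p` and frame `t`, `durations` stands in for the duration hypotheses (in frames), the duration term is a simple absolute difference, and the search uses truncation selection, one-point crossover and single-boundary mutation.

```python
import random

def alignment_cost(boundaries, frame_cost, durations, weight=1.0):
    """Assumed cost: frame dissimilarity plus weighted duration mismatch."""
    n_frames = len(frame_cost[0])
    edges = [0] + list(boundaries) + [n_frames]
    total = 0.0
    for p, (start, end) in enumerate(zip(edges, edges[1:])):
        total += sum(frame_cost[p][start:end])                # phoneme-to-frame term
        total += weight * abs((end - start) - durations[p])   # segment-to-duration term
    return total

def random_boundaries(rng, n_phonemes, n_frames):
    # Distinct interior boundaries, so every phoneme gets at least one frame.
    return sorted(rng.sample(range(1, n_frames), n_phonemes - 1))

def crossover(rng, a, b):
    # One-point crossover; fall back to parent a if the order would break.
    k = rng.randrange(len(a) + 1)
    child = a[:k] + b[k:]
    if all(x < y for x, y in zip(child, child[1:])):
        return child
    return list(a)

def mutate(rng, boundaries, n_frames):
    # Move one boundary to a random position between its neighbours.
    child = list(boundaries)
    i = rng.randrange(len(child))
    lo = child[i - 1] + 1 if i > 0 else 1
    hi = child[i + 1] - 1 if i + 1 < len(child) else n_frames - 1
    child[i] = rng.randint(lo, hi)
    return child

def segment(frame_cost, durations, weight=1.0, pop_size=40, generations=200, seed=0):
    """Search for low-cost boundaries (requires at least two phonemes)."""
    rng = random.Random(seed)
    n_phonemes, n_frames = len(frame_cost), len(frame_cost[0])

    def score(b):
        return alignment_cost(b, frame_cost, durations, weight)

    population = [random_boundaries(rng, n_phonemes, n_frames) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: pop_size // 2]   # the better half survives unchanged
        children = [
            mutate(rng, crossover(rng, rng.choice(parents), rng.choice(parents)), n_frames)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=score)

# Toy input: 3 phonemes over 12 frames, true boundaries at frames 4 and 9.
true_edges = [0, 4, 9, 12]
toy_cost = [[0.0 if true_edges[p] <= t < true_edges[p + 1] else 1.0 for t in range(12)]
            for p in range(3)]
toy_durations = [4, 5, 3]
best = segment(toy_cost, toy_durations)
```

On this toy input the cost function has its unique global minimum of zero at the boundaries `[4, 9]`, which the search would be expected to approach. The sketch also shows why the duration term is easy to include here: the cost is evaluated on a complete segmentation, so any function of segment lengths can be added without restructuring the search.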


doi: 10.21437/ICSLP.1998-601

Cite as: Stöber, K., Hess, W. (1998) Additional use of phoneme duration hypotheses in automatic speech segmentation. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0239, doi: 10.21437/ICSLP.1998-601

@inproceedings{stober98_icslp,
  author={Karlheinz Stöber and Wolfgang Hess},
  title={{Additional use of phoneme duration hypotheses in automatic speech segmentation}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0239},
  doi={10.21437/ICSLP.1998-601}
}