ISCA Archive ICSLP 1998

Phonetic alignment: speech synthesis based vs. hybrid HMM/ANN

Fabrice Malfrère, Olivier Deroo, Thierry Dutoit

In this paper we compare two methods for phonetically labeling a speech database. The first aligns the speech signal with a high-quality synthetic speech pattern; the second uses a hybrid HMM/ANN system. Both systems were evaluated on manually segmented French read utterances from a speaker not seen during the training of the HMM/ANN system. The study outlines the advantages and drawbacks of both methods. The synthesis-based system has the great advantage of requiring no training stage, whereas the classical HMM/ANN system readily accommodates multiple phonetic transcriptions. From this comparison we derive a method for automatically building phonetically labeled speech databases, in which the synthesis-based segmentation tool bootstraps the training of our hybrid HMM/ANN system. Such segmentation tools will be key to the development of improved speech synthesis and recognition systems.

doi: 10.21437/ICSLP.1998-595

Cite as: Malfrère, F., Deroo, O., Dutoit, T. (1998) Phonetic alignment: speech synthesis based vs. hybrid HMM/ANN. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0354, doi: 10.21437/ICSLP.1998-595

@inproceedings{malfrere98b_icslp,
  author={Fabrice Malfrère and Olivier Deroo and Thierry Dutoit},
  title={{Phonetic alignment: speech synthesis based vs. hybrid HMM/ANN}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0354},
  doi={10.21437/ICSLP.1998-595}
}