ISCA Archive Interspeech 2005

Refining phoneme segmentations using speaker-adaptive context dependent boundary models

Yong Zhao, Lijuan Wang, Min Chu, Frank K. Soong, Zhigang Cao

Consistent phoneme segmentation is essential for building high-quality Text-to-Speech (TTS) voice fonts. In this paper, we propose adapting an existing, well-trained Context Dependent Boundary Model (CDBM), which refines segment boundaries, to a new speaker using a limited amount of manually segmented data. Three adaptation approaches are studied: MLLR, MAP, and a combination of the two. The combined approach, MLLR+MAP, delivers the best boundary refinement performance. Compared with other boundary segmentation methods, the adapted CDBM yields better results, especially when adaptation data is limited. Given 400 manually segmented boundary tokens from about 20 sentences as a development set, the segmentation precision can reach 90%, measured as agreement with human-labeled boundaries within a tolerance of 20 ms.
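
The tolerance-based figure quoted above can be illustrated with a minimal sketch. It assumes that the reference and predicted boundaries are already paired one-to-one and expressed in milliseconds, and that a boundary exactly 20 ms away still counts as a match; the function name `boundary_accuracy` and the sample values are hypothetical and are not taken from the paper.

```python
def boundary_accuracy(reference_ms, predicted_ms, tolerance_ms=20):
    """Fraction of predicted boundaries within tolerance_ms of their reference.

    Assumes the two sequences are aligned one-to-one (same phoneme sequence),
    which is an illustrative simplification, not the paper's stated protocol.
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    # Count boundaries whose absolute deviation is within the tolerance.
    hits = sum(
        1
        for ref, pred in zip(reference_ms, predicted_ms)
        if abs(ref - pred) <= tolerance_ms
    )
    return hits / len(reference_ms)


# Hypothetical boundary times in milliseconds (made-up values).
reference = [100, 250, 400, 620]
predicted = [105, 275, 398, 600]
accuracy = boundary_accuracy(reference, predicted)
print(f"{accuracy:.0%} of boundaries within 20 ms")
```

In this made-up example the deviations are 5, 25, 2, and 20 ms, so three of the four boundaries fall within the tolerance.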


doi: 10.21437/Interspeech.2005-794

Cite as: Zhao, Y., Wang, L., Chu, M., Soong, F.K., Cao, Z. (2005) Refining phoneme segmentations using speaker-adaptive context dependent boundary models. Proc. Interspeech 2005, 2557-2560, doi: 10.21437/Interspeech.2005-794

@inproceedings{zhao05b_interspeech,
  author={Yong Zhao and Lijuan Wang and Min Chu and Frank K. Soong and Zhigang Cao},
  title={{Refining phoneme segmentations using speaker-adaptive context dependent boundary models}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2557--2560},
  doi={10.21437/Interspeech.2005-794}
}