ISCA Archive ICSLP 1998

Segmentation using a maximum entropy approach

Kishore Papineni, Satya Dharanipragada

Consider generating phonetic baseforms from orthographic spellings. The availability of a segmentation (grouping) of the characters can be exploited to achieve better phonetic translation. We are interested in building segmentation models without using explicit segmentation or alignment information during training. The heart of our segmentation algorithm is a conditional probabilistic model that predicts whether a word has fewer, the same number of, or more phones than characters. We use just this contraction-expansion information on whole words to train the model. The model has three components: a prior model, a set of features, and weights on the features. The features are selected and the weights assigned in a maximum entropy framework. Even though the model is trained on whole words, we effectively localize it on substrings to induce a segmentation of the word to be segmented. Segmentation is also aided by considering substrings in both the forward and backward directions.
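The conditional model described above can be sketched as a three-class maximum entropy classifier: the posterior over {fewer, equal, more} is the prior reweighted by exponentiated feature scores and renormalized. This is a minimal illustration only; the feature functions, prior, and weights below are hypothetical stand-ins, whereas in the paper features are selected and weights trained in the maximum entropy framework.

```python
import math

# Classes: does the word have fewer, the same number of, or more
# phones than characters?
CLASSES = ("fewer", "equal", "more")

# Hypothetical binary features on (word, class) pairs. Real features
# would be selected automatically during maximum entropy training.
def features(word, cls):
    """Return indices of the binary features that fire."""
    fired = []
    if "ph" in word and cls == "fewer":  # digraph 'ph' -> one phone /f/
        fired.append(0)
    if "th" in word and cls == "fewer":  # digraph 'th' -> one phone
        fired.append(1)
    if "x" in word and cls == "more":    # 'x' expands to /k s/
        fired.append(2)
    return fired

WEIGHTS = [1.2, 0.9, 1.5]                           # illustrative lambdas
PRIOR = {"fewer": 0.3, "equal": 0.5, "more": 0.2}   # illustrative prior

def maxent_posterior(word):
    """p(class | word) proportional to prior(class) * exp(sum_i lambda_i f_i)."""
    scores = {}
    for cls in CLASSES:
        s = sum(WEIGHTS[i] for i in features(word, cls))
        scores[cls] = PRIOR[cls] * math.exp(s)
    z = sum(scores.values())             # normalizing constant
    return {cls: v / z for cls, v in scores.items()}

print(maxent_posterior("phone"))  # 'ph' shifts mass toward "fewer"
```

Applying the same posterior to substrings of a word, scanned in both directions, is what lets the whole-word model induce a segmentation without ever seeing aligned training data.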

doi: 10.21437/ICSLP.1998-662

Cite as: Papineni, K., Dharanipragada, S. (1998) Segmentation using a maximum entropy approach. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0559, doi: 10.21437/ICSLP.1998-662

@inproceedings{papineni98_icslp,
  author={Kishore Papineni and Satya Dharanipragada},
  title={{Segmentation using a maximum entropy approach}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0559},
  doi={10.21437/ICSLP.1998-662}
}