ISCA Archive ICSLP 2000

A tool to build a treebank for conversational Chinese

Yves Lepage, Nicolas Auclerc, Satoshi Shirai

N-grams have been extensively used with phonemes or words as basic units in speech recognition. Recently, it has been proposed to use n-grams with phrase tree structures as units to increase speech recognition quality. In order to test this idea on Chinese, a treebank of Chinese hotel reservation conversation utterances is needed. Because no such treebank is yet available, we have to build it. We propose to view the process of building a treebank as a sequence of editing and search operations: input or copy a new utterance (edit a text); search for similar existing utterances to get their corresponding structures and adapt them to the new utterance; adapt the structure (edit a tree); search for similar structures to ensure consistency of representation and coding. This approach should have a beneficial "snowball" effect: the bigger the treebank, the faster and the more consistent its extension.
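The "search for similar existing utterances" step can be illustrated with a minimal sketch. It is not the tool described in the paper: the similarity measure (the character-level ratio from Python's standard `difflib`), the function names, the sample utterances, and the bracketed trees with their labels are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for whatever
    measure an actual treebank tool would use (assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def most_similar(new_utterance, treebank):
    """Return the (utterance, tree) pair whose utterance is closest to
    new_utterance, or None if the treebank is still empty."""
    if not treebank:
        return None
    return max(treebank, key=lambda entry: similarity(new_utterance, entry[0]))


# Hypothetical treebank entries: the trees and labels are invented examples.
treebank = [
    ("我想订一个房间", "(S (NP 我) (VP 想 (VP 订 (NP 一个 房间))))"),
    ("请问有空房间吗", "(S (ADV 请问) (VP 有 (NP 空 房间)) (PRT 吗))"),
]

# The retrieved tree serves as a starting point to be edited by hand.
utterance, tree = most_similar("我想订两个房间", treebank)
print(utterance)
print(tree)
```

The retrieved structure is only a draft for the annotator to adapt. As the treebank grows, a close match becomes more likely, which is the snowball effect the abstract refers to.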


Cite as: Lepage, Y., Auclerc, N., Shirai, S. (2000) A tool to build a treebank for conversational Chinese. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 985-988

@inproceedings{lepage00_icslp,
  author={Yves Lepage and Nicolas Auclerc and Satoshi Shirai},
  title={{A tool to build a treebank for conversational Chinese}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={3},
  pages={985--988}
}