N-grams have been used extensively in speech recognition, with phonemes or words as basic units. Recently, it has been proposed to use n-grams with phrase tree structures as units to improve speech recognition quality. To test this idea on Chinese, a treebank of Chinese hotel reservation conversation utterances is needed. Because no such treebank is available yet, we have to build one. We propose to view the process of building a treebank as a sequence of editing and search operations: input or copy a new utterance (edit a text); search for similar existing utterances to retrieve their corresponding structures as a starting point for the new utterance; adapt the retrieved structure (edit a tree); search for similar structures to ensure consistency of representation and coding. This approach will have a beneficial "snowball" effect: the bigger the treebank, the faster and the more consistent its extension.
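The "search for similar existing utterances" step can be illustrated with a minimal sketch. The abstract does not say which similarity measure the tool uses, so the token-level ratio from Python's standard `difflib.SequenceMatcher` is an assumption made here for illustration only. The toy treebank, its bracketed structures, and the names `TREEBANK`, `similarity`, and `most_similar` are all hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical toy treebank: segmented utterance -> bracketed structure.
# The entries are illustrative only, not data from the actual treebank.
TREEBANK = {
    "我 想 订 一 个 房间": "(S (NP 我) (VP 想 (VP 订 (NP 一 个 房间))))",
    "请问 有 空 房间 吗": "(S 请问 (VP 有 (NP 空 房间)) 吗)",
}


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1] (assumed measure, not the paper's)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def most_similar(new_utterance: str, treebank: dict, k: int = 1) -> list:
    """Return the k (utterance, structure) pairs closest to new_utterance."""
    ranked = sorted(
        treebank.items(),
        key=lambda item: similarity(new_utterance, item[0]),
        reverse=True,
    )
    return ranked[:k]


# Retrieve a candidate structure to adapt for a new utterance.
new = "我 想 订 两 个 房间"
closest_utterance, candidate_tree = most_similar(new, TREEBANK)[0]
print(closest_utterance)
print(candidate_tree)
```

The retrieved structure would then be edited by hand to fit the new utterance. As the treebank grows, a close match becomes more likely, which is the "snowball" effect described above.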
Cite as: Lepage, Y., Auclerc, N., Shirai, S. (2000) A tool to build a treebank for conversational Chinese. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 985-988, doi: 10.21437/ICSLP.2000-700
@inproceedings{lepage00_icslp,
  author={Yves Lepage and Nicolas Auclerc and Satoshi Shirai},
  title={{A tool to build a treebank for conversational Chinese}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 3, 985-988},
  doi={10.21437/ICSLP.2000-700}
}