In this paper we present a general model for text analysis of Asian languages (Chinese and Japanese), that is, a method for mapping strings of characters to strings of identified, trivially pronounceable words. This work is based on the English Non-Standard Word analysis model, suitably augmented to deal with both the lack of spaces between words in Japanese and Chinese and the issue of homographs. Results are presented for the sub-components of the process.
Cite as: Olinsky, C., Black, A.W. (2000) Non-standard word and homograph resolution for Asian language text analysis. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 733-736, doi: 10.21437/ICSLP.2000-182
@inproceedings{olinsky00_icslp,
  author={Craig Olinsky and Alan W. Black},
  title={{Non-standard word and homograph resolution for Asian language text analysis}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={1},
  pages={733--736},
  doi={10.21437/ICSLP.2000-182}
}