Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Non-Standard Word and Homograph Resolution for Asian Language Text Analysis

Craig Olinsky, Alan W. Black

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

In this paper we present a general model for text analysis of Asian languages (Chinese and Japanese), that is, a method for mapping strings of characters to strings of identified, trivially pronounceable words. This work is based on the English Non-Standard Word analysis model, suitably augmented to handle both the lack of spaces between words in Japanese and Chinese and the resolution of homographs. Results are presented for the sub-components of the process.
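
The abstract does not say how word boundaries are found, so the following is only a minimal sketch of the general idea of mapping an unsegmented character string to a string of words. It uses greedy forward longest-match segmentation, a common baseline that is swapped in here for illustration and is not claimed to be the authors' method. The lexicon entries, the `segment` function, and the `max_len` parameter are all hypothetical.

```python
# Toy lexicon: hypothetical entries chosen only for illustration,
# not taken from the paper.
LEXICON = {"北京", "大学", "北京大学", "学生"}


def segment(text, lexicon=LEXICON, max_len=4):
    """Greedy forward longest-match segmentation (illustrative baseline only)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token,
            # so the loop is guaranteed to advance.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(segment("北京大学学生"))
```

A segmenter of this kind only proposes word boundaries. Classifying non-standard tokens and choosing between homograph readings, which the abstract names as separate concerns, would need further components that are not sketched here.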


Bibliographic reference. Olinsky, Craig / Black, Alan W. (2000): "Non-standard word and homograph resolution for Asian language text analysis", In ICSLP-2000, vol.1, 733-736.