Basis Technology announced the release of its Chinese Morphological Analyzer (CMA), an accurate segmentation engine for search and retrieval of Chinese text. CMA combines dictionary-retrieval technology with syntactic analysis to segment text in both the simplified and traditional written forms of the language. It includes a comprehensive online Chinese dictionary of nearly 1,000,000 headwords, incorporating phonological, lexical and grammatical attributes. CMA's algorithms use this information, together with contextual clues and syntactic analysis, to segment and disambiguate Chinese text.

Morphological analysis of Chinese is an inherently complex problem. Chinese is written without explicit word boundaries, and it lacks the script-transition cues found in Japanese and Korean text. Furthermore, each character can potentially function as an isolated word (free morpheme) or as part of a compound word (bound morpheme) or phrase. For these reasons, maintaining online forms and databases and supporting effective search and retrieval are particularly difficult for Chinese text.

CMA is written in portable ANSI C++ and has a pure Unicode (UCS-2) internal architecture that handles every major Chinese encoding format. It is licensed as a royalty-free software development kit (SDK) or as a source code distribution on all major platforms, including Win32, Solaris, HP-UX, Digital UNIX and Linux. www.basistech.com
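
To illustrate why the absence of word boundaries makes segmentation hard, here is a minimal sketch of greedy forward maximum matching, a simple textbook dictionary-lookup technique. It is not CMA's algorithm, which is not described in detail here; the function name `segment`, the `max_len` parameter, and the four-entry `TOY_DICTIONARY` are all hypothetical and exist only for this illustration.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching (illustrative only, not CMA's method).

    At each position, take the longest dictionary entry that matches;
    fall back to a single character when nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary: "research", "graduate student", "life", "origin".
TOY_DICTIONARY = {"研究", "研究生", "生命", "起源"}

# Intended reading: 研究 / 生命 / 起源 ("research the origin of life").
print(segment("研究生命起源", TOY_DICTIONARY))
# prints ['研究生', '命', '起源']
```

The greedy lookup picks 研究生 ("graduate student") because it is the longest match, which strands 命 as a lone character and breaks the intended reading. Dictionary lookup alone cannot resolve this kind of overlap, which is why contextual and syntactic information is needed for disambiguation.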