Unicode Extends Chinese, Japanese, and Korean (CJK) Character Database

The Unicode Consortium has announced a significant update to the Unicode Standard, Version 4.0.1. No new characters are added in this release; the total number of characters remains 96,382, covering the world’s scripts and collections of symbols. However, the information in the Unicode Character Database has been refined to improve the quality of text processing in all of the world’s languages.

This version of the Unicode Character Database includes the first major update in two years to the CJK database, Unihan. The Unihan Database provides character properties, definitions, pronunciations, mappings, and other information for the CJK characters in the standard, that is, the characters used in particular for Chinese, Japanese, and Korean. This update includes thousands of additions and corrections, including major new correlations with traditional Chinese and Japanese dictionary sources.

This version also significantly improves the interchange of text in languages such as Arabic, Hebrew, Urdu, and Pashto. In addition, it clarifies the implementation of languages such as Bengali and the relationship between base letters and accent marks. Full technical details regarding the Unicode Standard, Version 4.0.1 are published online at www.unicode.org.