I recently had the pleasure of interviewing Arle Lommel, LISA OSCAR Standards Chair, to discuss the importance of Unicode’s Common Locale Data Repository (CLDR) project. The CLDR collects and provides the data needed to support globalization, such as date/time formats, numeric formatting, translated language and country names, and time zone information.
LC: What is the CLDR?
AL: The Common Locale Data Repository is a free resource, developed and maintained by volunteers and coordinated and administered by the Unicode Consortium. Its goal is to gather basic linguistic information for various “locales,” essentially combinations of a language and a location, like French in Switzerland.
LC: What does the resource encompass?
AL: CLDR gathers things like lists of language and country names, date formats, time zone names, and so forth. This knowledge is critical when developing projects for the markets represented by specific locales. By drilling down past the language level to the market level, CLDR data is designed to be relevant to a specific area of the world. Think of the difference between U.S. and British English, for example. You would clearly have a problem if British spellings were used in a U.S. project, or if prices appeared as “£10.54” instead of “$10.54.” Problems like these are very common when product developers don’t think through the implications of their design decisions.
LC: What other issues does CLDR address?
AL: Other problems addressed by CLDR include the numeric form of dates, where something like “04.05.06” could mean “April 5, 2006,” “May 4, 2006,” or even “May 6, 2004,” depending on where you live. Clearly you have to know what people expect.
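To make that ambiguity concrete, here is a minimal Python sketch that parses the same string under three field orders. It is not part of the interview and does not use CLDR itself: the `FIELD_ORDERS` table and the `interpret` helper are hypothetical, hard-coded for illustration. A real application would take each locale’s date pattern from CLDR data (for example through a CLDR-based library such as ICU) rather than hard-coding it.

```python
from datetime import date, datetime

# Hypothetical field orders, hard-coded for illustration only.
# These are NOT CLDR data; real code would look up the locale's pattern.
FIELD_ORDERS = {
    "month.day.year": "%m.%d.%y",
    "day.month.year": "%d.%m.%y",
    "year.month.day": "%y.%m.%d",
}


def interpret(text: str) -> dict[str, date]:
    """Parse one numeric date string under each assumed field order."""
    results = {}
    for name, pattern in FIELD_ORDERS.items():
        # strptime's %y maps two-digit years 00-68 to 2000-2068.
        results[name] = datetime.strptime(text, pattern).date()
    return results


if __name__ == "__main__":
    for order, parsed in interpret("04.05.06").items():
        print(f"{order:15} -> {parsed:%B} {parsed.day}, {parsed.year}")
```

Run as a script, it reports April 5, 2006, May 4, 2006, and May 6, 2004 for the three orders: the same six digits, three different dates. That is why the expected order has to come from locale data rather than guesswork.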
LC: What is the advantage of using CLDR?
AL: It makes resources available to anyone, at no cost. Without something like the CLDR, one would need to research each market’s conventions, pay to translate things like country names into each language, and so forth. Activities such as these can add significantly to the cost of a project. The CLDR provides this data for free and offers the critical advantage of consistency.
LC: Why should content creators care about the CLDR?
AL: At LISA we have heard time and again that failing to consider international issues from a project’s earliest phases doubles the cost of a project and makes it take twice as long. While many of these issues relate to decisions made by programmers, some relate to the work of technical authors and other content creators. For instance, while it’s unlikely that a technical writer will need to use a CLDR list of language names in Finnish directly, that writer might design an online form in which users choose the language they would like to be contacted in. If there is not enough room to display a language name because it is longer in Finnish (a common problem when going from English to Finnish), the end user may have difficulty. The content author could have prevented this problem if he or she had been given the resources to test the design early on. The CLDR makes available the information authors need to prevent basic problems for users around the world.
LC: How can professionals contribute to the CLDR?
AL: Right now the biggest need of the CLDR is for native (or very good) speakers of languages other than English to (1) supply missing data and (2) verify that existing data points are correct. Because the CLDR is volunteer-driven, people at all levels of skill can contribute as much or as little as they want, and Unicode welcomes this participation. Just as important is for people to know about and use the CLDR. In my experience, even the savviest developers often don’t know about the CLDR and what it contains, so they spend time and money recreating a resource they could have for free.
LC: How is LISA supporting CLDR?
AL: We are committed to supporting Unicode and the CLDR, so we have launched an initiative under which people who sign up with LISA to contribute to the CLDR, and who spend ten or more hours working on the project, are eligible to receive a one-year individual LISA membership as a token of our appreciation. So if any readers have the language or locale skills needed to supply data missing from the CLDR or to review existing data, they can contact me to get started.