Curated for content, computing, and digital experience professionals

Month: April 2008 (Page 2 of 4)

Multilingual Social Computing: Questioning the Wisdom of the Crowds

The holy grail in translation is the speed versus quality dilemma. That creates controversy. Here’s what we’ve noted after posting our Multilingual Social Networking Alert citing Facebook’s crowdsourcing effort:

No doubt that these references are the tip of an iceberg. How to say “poke” in different languages is clearly not the only conversation going on. And BTW, here’s Facebook’s Translation Application.

Google Executive to Provide Opening Keynote Address on Search Quality at Upcoming Gilbane San Francisco Conference

The Gilbane Group and Lighthouse Seminars announced that Udi Manber, a Google Vice President of Engineering, will kick-off the annual Gilbane San Francisco conference on June 18th at 8:30am with a discussion on Google’s search quality and continued innovation. Now in its fourth year, the conference has rapidly gained a reputation as a forum for bringing together vendor-neutral industry experts that share and debate the latest information technology experiences, research, trends and insights. The conference takes place June 18-20 at the Westin Market Hotel in San Francisco. Gilbane San Francisco helps attendees move beyond the mainstream content technologies they are familiar with, to enhanced “2.0” versions, which can open up new business opportunities, keep customers engaged, and improve internal communication and collaboration. The 2008 event will have its usual collection of information and content technology experts, including practitioners, technologists, business strategists, consultants, and the leading analysts from a variety of market and technology research firms. Topics to be covered in-depth at Gilbane San Francisco include– Web Content Management (WCM); Enterprise Search, Text Analytics, Semantic Technologies; Collaboration, Enterprise Wikis & Blogs; “Enterprise 2.0” Technologies & Social Media; Content Globalization & Localization; XML Content Strategies; Enterprise Content Management (ECM); Enterprise Rights Management (ERM); and Publishing Technology & Best Practices. Details on the Google keynote session as well as other keynotes and conference breakout sessions can be found at

Resources & Opportunity: W3C’s ITS Interest Group

Cross-post from the Globalization blog.
At the end of March, the W3C announced the launch of the Internationalization Tag Set (ITS) Interest Group (IG) as a forum to foster a community of users that promotes the tag set’s adoption and further development. Like Unicode’s CLDR initiative, the emphasis on community interaction and collaboration underscores the ever-increasing, Web-driven impact of cooperative spirit.

As the Web nears its 20th birthday, we would imagine efforts such as ITS IG continue to be music to the ears of its inventor and W3C founder, Tim Berners-Lee. This particular interest group is certainly not the first nor the last of the educational and outreach efforts the W3C has launched since 1994.

It is also not the first nor the last of the activities from W3C’s Internationalization (I18n) Activity, known worldwide as simply I18n. The mission? “To ensure that W3C’s formats and protocols are usable worldwide in all languages and in all writing systems.” The goals? Ensure universal access, support the internationalization and localization of documents, and help reduce the time and cost associated with internationalization and localization projects. Consistent and admirable objectives, described eloquently by Richard Ishida, Activity Lead for the I18n Core Working Group in his article, It’s All About Customer Focus.

I18n accomplishments include a treasure trove of information from specifications and recommendations to educational materials to the newest initiative, hosting the Planet I18n Blog aggregator. Worth checking out; give yourself time to stay a while.

Webinar: Analytics-Driven Web Content

Thursday, May 8, 1:00 pm ET
Customers expect more than a one-size-fits-all web experience. They want “my-size-fits-me” content every time they interact with your company. Or they don’t come back.
In this webinar, marketing managers learn the latest approaches to using knowledge about visitors and behaviors to drive dynamic content delivery. Tony White, Gilbane’s lead analyst for web content management, and Brett Zucker, CTO for Bridgeline Software, discuss emerging technologies for serving up analytics-driven content that attracts customers, engenders loyalty, and improves site ROI. The webinar is sponsored by Bridgeline Software.
Registration is open.

Only Humans can Ensure the Value of Search in Your Enterprise

While considering what is most important in selecting the search tools for any given enterprise application, I took a few minutes off to look at the New York Times. This article, He Wrote 200,000 Books (but Computers Did Some of the Work), by Noam Cohen, gave me an idea about how to compare Internet search with enterprise search.

A staple of librarians’ reference and research arsenal has been a category of reference material called “bibliographies of bibliographies.” These works, specific to a subject domain, are aimed at a usually scholarly audience to bring a vast amount of content into focus for the researcher. Judging from the article, that is what Mr. Parker’s artificial intelligence is doing for the average person who needs general information about a topic. According to at least one reader, the results are hardly scholarly.

This article points out several things about computerized searching:

  • It does a very good job of finding a lot of information easily.
  • Generalized Internet searching retrieves only publicly accessible, free-for-consumption, content.
  • Publicly available content is not universally vetted for accuracy, authoritativeness, trustworthiness, or comprehensiveness, even though it may be all of these things.
  • Vast amounts of accurate, authoritative, trustworthy and comprehensive content does exist in electronic formats that search algorithms used by Mr. Parker or the rest of us on the Internet will never see. That is because it is behind-the-firewall or accessible only through permission (e.g. subscription, need-to-know). None of his published books will serve up that content.

Another concept that librarians and scholars understand is that of primary source material. It is original content, developed (written, recorded) by human beings as a result of thought, new analysis of existing content, bench science, or engineering. It is often judged, vetted, approved or otherwise deemed worthy of the primary source label by peers in the workplace, professional societies or professional publishers of scholarly journals. It is often the substance of what get republished as secondary and tertiary sources (e.g. review articles, bibliographies, books).

We all need secondary and tertiary sources to do our work, learn new things, and understand our work and our world better. However, advances in technology, business operations, and innovation depend on sharing primary source material in thoughtfully constructed domains in our enterprises of business, healthcare, or non-profits. Patient’s laboratory or mechanical device test data that spark creation of primary source content need surrounding context to be properly understood and assessed for value and relevancy.

To be valuable enterprise search needs to deliver context, relevance, opportunities for analysis and evaluation, and retrieval modes that give the best results for any user seeking valid content. There is a lot that computerized enterprise search can do to facilitate this type of research but that is not the whole story. There must still be real people who select the most appropriate search product for that enterprise and that defined business case. They must also decide content to be indexed by the search engine based on its value, what can be secured with proper authentication, how it should be categorized appropriately, and so on. To throw a computer search application at any retrieval need without human oversight is a waste of capital. It will result in disappointment, cynicism and skepticism about the value of automating search because the resulting output will be no better than Mr. Parker’s books.

Free Globalization Intelligence: Unicode’s CLDR Project

I recently had the pleasure of interviewing Arle Lommel, LISA OSCAR Standards Chair, to discuss the importance of Unicode’s Common Locale Data Repository (CLDR) project, which collects and provides data such as date/time formats, numeric formatting, translated language and country names, and time zone information that is needed to support globalization.

LC: What is the CLDR?
AL: The Common Locale Data Repository is a volunteer-developed and maintained resource coordinated and administered by the Unicode Consortium that is available for free. Its goal is to gather basic linguistic information for various “locales,” essentially combinations of a language and a location, like French in Switzerland.
LC: What does the resource encompass?
AL: CLDR gathers things like lists of language and country names, date formats, time zone names, and so forth. This is critical knowledge to know when developing projects for the markets represented by specific locales. By drilling down past the language level to look at the market level, CLDR data is designed to be relevant for a specific area of the world. Think of the difference between U.S. and British English, for example. You would clearly have a problem if British spellings were used in a U.S. project or prices appeared like “£10.54” instead of “$10.54.” Problems like these are very common when product developers don’t think through what the implications of their design decisions will be.
LC: What other issues does CLDR address?
AL: Other problems addressed by CLDR include the numeric form of dates, where something like “04.05.06” could mean “April 5, 2006,” “May 4, 2006,” or even “May 6, 2004,” depending on where you live. Clearly you have to know what people expect.
LC: What is the advantage of using CLDR?
AL: It makes resources available to anyone, at no cost. Without something like the CLDR, one would need to investigate all of market issues, pay to translate things like country names into each language, and so forth. Activities such as this can add significantly to the cost of a project. The CLDR provides them for free and provides the critical advantage of consistency.
LC: Why should content creators care about the CLDR?
AL: At LISA we have heard time and again that not taking international issues into consideration from a project’s earliest phases doubles the cost of a project and makes it take twice as long. While many issues relate to decisions made by programmers, some of the issues do relate to the job of technical authors and other content creators. While it’s unlikely that a technical writer will need to use a CLDR list of language names in Finnish directly, for instance, the content creator might design an online form in which a user fills out what language he or she would like to be contacted in. If there is insufficient room to display the language name because it is longer in Finnish (a common problem when going from English to Finnish), the end user may have difficulty, something that could have been prevented by the content author if he or she had been given the resources to test the design early on. The CLDR makes the information available that allows authors to prevent basic problems that create issues for users around the world.
LC: How can professionals contribute to the CLDR?
AL: Right now the biggest need of the CLDR is for native (or very good) speakers of non-English languages to (1) supply missing data, and (2) verify that existing data points are correct. Because the CLDR is volunteer driven, people of all levels of competence and ability are able to contribute as much or as little as they want. Unicode welcomes this participation. The real need is for people to know about and use the CLDR. In my experience even the savviest of developers often don’t know about the CLDR and what it contains, so they spend time and money on recreating a resource that they could have for free.
LC: How is LISA supporting CLDR?
AL: We are committed to supporting Unicode and the CLDR, so we have launched an initiative where people who sign up with LISA to contribute to the CLDR and who spend ten or more hours working on the project are eligible to receive individual LISA membership for a year as a token of our appreciation for their contribution. So if any readers have the needed language/locale skills to supply data missing from the CLDR or to review existing data, they can contact me to get started.

« Older posts Newer posts »

© 2022 The Gilbane Advisor

Theme by Anders NorenUp ↑