EPiServer announced multiple new features for its content management system, EPiServer CMS 5 R2, including solutions for mobility and the iPhone. EPiServer has worked with two partners, Mobiletech A/S and Mobizoft AB, to provide a mobile experience to site visitors, including mobile rendering, video conversion, and payments. iPhone support is available as open source templates that enable a site to be viewed from an iPhone. Images can now be prepared directly in EPiServer CMS, so web editors no longer need to work on them in another application before placing them on a web page. New dynamic content features enable external data that appears in many places on a website, such as financial or legal text, to be updated throughout the site. Page Type Converter makes it easier to merge pages of different types and to convert pages from one page type to another. Five standard reports are now available: non-published pages, published pages, modified pages, expiring/expired pages, and an overview of simple addresses. External data, such as an archive of articles at a media company, can be integrated and displayed in a website using EPiServer CMS; the data will appear as a native EPiServer CMS page. This enables structured data stored in another document management system to be converted to a web page in EPiServer and viewed. EPiServer CMS now supports Oracle; Windows Server 2003 and 2008, as well as XP and Vista; Visual Studio 2008 and 2008 Express; and ASP.NET 3.5 SP1 or later. http://www.EPiServer.com/
I am surprised how often various content organizing mechanisms on the Web are compared to the Dewey Decimal System. As a former librarian, I am disheartened to be reminded how often students were lectured on the Dewey Decimal system, apparently to the exclusion of learning about subject categorization schemes. The two complemented each other, but that seems to be a secret known only to librarians.
I’ll try to share a clearer view of the model and explain why new systems of organizing content in enterprise search are quite different from the decimal model.
Classification is a good generic term for defining physical organizing systems. Unique animals and plants are distinguished by a single classification in the biological naming system. So too are books in a library. There are two principal classification systems for arranging books on the shelf in Western libraries: Dewey Decimal and Library of Congress (LC). Each uses coding (numeric for Dewey Decimal and alpha-numeric for Library of Congress) to establish where a book belongs logically on a shelf, relative to other books in the collection, according to the book’s most prominent content topic. A book on nutrition for better health might be given a classification number for some aspect of nutrition or one for a health topic, but a human being has to make a judgment about which topic the book is most “about,” because the book can only live in one section of the collection. It is worth mentioning that the Dewey and LC systems are both hierarchical, but with different priorities. (For example, Dewey puts broad topics like Religion and Philosophy and Psychology at the top levels, while LC puts those two topics together and includes more scientific and technical topics, like Agriculture and Military Science, at the top of its list.)
So why classify books to sit on the shelf in topic order, when it takes a lot of labor to shift the collection around to make room for new books? It is for the benefit of the users: to enable “browsing” through the collection, although it may be hard to accept that the term browsing was a staple of library science decades before the internet. Library leaders established eons ago the need for a system of physical organization to help readers peruse the book collection by topic, leading from the general to the specific.
You might ask what kind of help that was for finding the book on nutrition that was classified under “health science.” This is where another system, largely hidden from the public or often made annoyingly inaccessible, comes in. It is a system of categorization in which any content, book or otherwise, can be assigned an unlimited number of categories. Wandering through the stacks, one would never suspect this secret way of finding a nugget in a book about your favorite hobby if that book was classified to live elsewhere. The standard lists of terms for further describing books by multiple headings are called “subject headings,” and you had to use a library catalog to find them. Unfortunately, they contain mysterious conventions called “sub-divisions,” designed to pre-coordinate any topic with other generic topics (e.g. Handbooks, etc. and United States). Today we would call these generic subdivision terms facets: one reflects the kind of book and the other reveals the geographical scope covered by the book.
With the marvel of the Web page, hyperlinking, and “clicking through” hierarchical lists of topics, we can click a mouse to narrow a search for handbooks on nutrition in the United States for better health, beginning at any facet or topic, and still come up with the book that meets all four criteria. We no longer have to be constrained by the Dewey model of browsing the physical location of our favorite topics, probably missing a lot of good stuff. But then we never did: the subject card catalog gave us a tool for finding more than we would by classification code alone. Even that, though, was a lot more tedious than navigating easily through a hierarchy of subject headings, narrowing the results by facets on a browser tab, and narrowing them further by yet another topical term until we find just the right piece of content.
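To make the contrast concrete, here is a minimal sketch of faceted narrowing in Java. The catalog records, field names, and facet values are all hypothetical; the point is only that facet filters can be applied in any order and still converge on the item that meets every criterion, unlike a single classification code that fixes a book in one place.

```java
import java.util.*;
import java.util.stream.*;

public class FacetedNarrowing {
    // Hypothetical catalog record: several subject headings plus two facets.
    record CatalogRecord(String title, Set<String> subjects, String kind, String geography) {}

    public static void main(String[] args) {
        List<CatalogRecord> catalog = List.of(
            new CatalogRecord("Nutrition Handbook", Set.of("Nutrition", "Health"), "Handbooks", "United States"),
            new CatalogRecord("World Nutrition Survey", Set.of("Nutrition"), "Surveys", "Global"),
            new CatalogRecord("US Fitness Guide", Set.of("Health"), "Handbooks", "United States"));

        // Narrow by topic, purpose, kind, and geography. The order of the filters
        // does not matter, which is what distinguishes facets from shelf order.
        List<CatalogRecord> hits = catalog.stream()
            .filter(r -> r.subjects().contains("Nutrition"))
            .filter(r -> r.subjects().contains("Health"))
            .filter(r -> r.kind().equals("Handbooks"))
            .filter(r -> r.geography().equals("United States"))
            .collect(Collectors.toList());

        hits.forEach(r -> System.out.println(r.title()));  // prints: Nutrition Handbook
    }
}
```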
Taking the next leap, we have natural language processing (NLP) that will answer the question, “Where do I find handbooks on nutrition in the United States for better health?” That is the Holy Grail for search technology, and a long way from Mr. Dewey’s idea of browsing the collection.
Socialtext released Socialtext 3.0, a trio of applications including Socialtext People and Socialtext Dashboard, as well as a major upgrade to its Socialtext Workspace enterprise wiki offering. These products are built on a modular and integrated platform that delivers connected collaboration with context to individuals, workgroups, organizations, and extranet communities. People are able to discover, create, and utilize social networks, collaborate in shared workspaces, and work productively with personalized widget-based dashboards. The company also announced Socialtext Signals, a Twitter-style microblogging interface that goes beyond simple “tweets” by integrating both automated and manual updates with social networking context, expanding the company’s business communications offerings for the enterprise. As with its proven Workspace wiki and weblog product, Socialtext will make all of its offerings available on a hosted (ASP) basis as well as on an on-premise appliance. The entire Socialtext 3.0 trio of products is available immediately on the hosted service and will be made available to appliance customers starting in October 2008. Socialtext 3.0 profile integration with LDAP or Microsoft Active Directory systems enables rapid population. REST APIs for workspace and profile content are now complemented with a widget architecture and user interface for the creation of enterprise mashups. Productized connectors are available for Microsoft SharePoint and IBM Lotus Connections. You can experience this new release immediately in a free trial at http://socialtext.com/
In our Multilingual Communications as a Business Imperative report, we noted that machine translation (MT) has long been the target of “don’t let this happen to you” jokes throughout the globalization industry. Unpredictable results and poor quality allowed humor to become the focus of MT discussions, making widespread adoption risky at best.
On the other hand, we also noted that scientists, researchers, and technologists have been determined to unlock MT’s potential since the 1950s to solve the same core challenges the industry struggles with today: cost savings, speed, and linguist augmentation. Although the infamous report on Languages and Machines from the Automatic Language Processing Advisory Committee (ALPAC), published in 1966, discussed these challenges in some depth (albeit from a U.S. perspective), it sent a resounding message that “there is no emergency in the field of translation.” Research funding suffered; researcher Margaret King described the impact as effectively “killing machine translation research in the States.”
Borrowing from S.E. Hinton, that was then, this is now. Technology advancements and pure computing power have made machine translation not only viable, but also potentially game-changing. A global economy, the volume and velocity of content required to run a global business, and customer expectations are steadily shifting enterprise postures from “not an option” to “help me understand where MT fits.” Case in point: participants in our study identified MT as one of the top three valuable technologies for the future.
There’s lots of game-changing news for our readers to digest.
- An excellent place to start is with our colleagues at Multilingual Magazine, who dedicated the April-May issue to this very subject. Don Osborn over at the Multidisciplinary Perspectives blog provides an excellent summary, posing the question: “Is there a paradigm shift on machine translation?”
- Language Weaver predicts a potential $67.5 billion market for digital translation, fueled by MT. CEO Mark Tapling explains why.
- SYSTRAN, one of the earliest MT software developers, provides research and education here.
- And finally (for today), there’s no way to deny the Google impact — here’s their FAQ about the beta version of Google Translate. TAUS weighs in on the subject here.
Mary and I will be at Localization World Madison to provide practical advice and best practices for making the enterprise business case for multilingual communications investments as part of a Global Content Value Chain. But we’re also looking forward to the session focused on MT potential, issues, and vendor approaches. The full grid is here. Join us!
The Content Management Professionals Association (CM Pros) will once again be holding their annual Fall Summit in conjunction with Gilbane Boston in December. There are details over on our Events blog which I won’t duplicate here; even better, go right to the source at http://summit.cmprofessionals.org/. If you are a member we hope to see you, and if you are not you can find out about joining on the CM Pros site at http://cmprofessionals.org/
11:00am PT / 2:00pm ET
Organizations are faced with critical knowledge management issues including knowledge capture, IP retention, search and discovery, and fostering innovation. The failure to properly address these issues results in companies wasting millions of dollars through inefficient information discovery and poor collaboration techniques. Today’s knowledge management systems must blend social media technologies with enterprise search, access, and discovery tools to give users a 360-degree view of their information assets. This blend is the foundation for new generation knowledge management.
Join Senior Analyst Leonor Ciarlone and Phil Green, CTO at Inmagic, for a discussion moderated by Andy Moore, Publisher of KMWorld Magazine, covering perspectives from Gilbane’s report on Collaboration and Social Media 2008, the power of Social Knowledge Networks, and an introduction to Inmagic® Presto.
Space is limited, register here!
MuleSource announced a collaboration with Intel Corporation to deliver a new offering that provides off-the-shelf integration between Mule and the Intel XML Software Suite. Called the Mule Xpack for Intel XML Software Suite, the new offering is a set of instructions and Mule extensions that help to improve XML processing performance for SOA deployments. Taking a new approach to accelerating XML traffic, MuleSource teamed with Intel to bring the Intel XML Software Suite to the Mule ESB, enhancing and offloading XML processing. The Mule Xpack provides Mule integration support for the Intel XML Software Suite, which can be used to support three categories of XML operations:
- XML parsing: reads XML documents and makes the data available for manipulation and processing by applications and programming languages.
- XSLT transformation: facilitates efficient XML transformations into a variety of formats and can be applied to a full range of XML documents.
- XPath evaluation: evaluates an XML Path (XPath) expression over an XML document DOM tree or a derived instance of a source and returns a node, node set, string, number, or Boolean value.
Intel XML Software Suite is a software library providing APIs for C++ and Java on Linux and Windows operating systems, delivering performance for XML processing on industry-standard servers and application environments. Designed to take advantage of the Intel Core microarchitecture, Intel XML Software Suite provides thread-safe and efficient memory utilization, scalable stream-to-stream processing, and large XML file processing capabilities. http://www.muleforge.org/
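For readers who want to see what those three operation categories look like in practice, here is a minimal, self-contained sketch using the standard Java (JAXP) APIs shipped with the JDK. The sample document and class name are invented for illustration, and this is ordinary JDK code rather than the Mule Xpack or Intel-specific APIs, which target accelerating exactly this kind of processing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlOperationsDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<order><item sku=\"42\"><qty>3</qty></item></order>";

        // 1. XML parsing: read the document into a DOM tree for manipulation.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // 2. XPath evaluation: pull a single value out of the DOM tree.
        String qty = (String) XPathFactory.newInstance().newXPath()
                .evaluate("/order/item/qty/text()", doc, XPathConstants.STRING);
        System.out.println("Quantity: " + qty);  // Quantity: 3

        // 3. XSLT transformation: the identity transform here simply serializes the
        //    DOM back to text; a real stylesheet would reshape the document.
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        System.out.println(out);
    }
}
```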
The term taxonomy crept into the search lexicon by stealth and is now firmly entrenched. The very early search engines, circa 1972-73, presented searchers with the retrieval option of selecting content using controlled vocabularies from a standardized thesaurus of terminology in a particular discipline. With no neat graphical navigation tools, searches were crafted on a typewriter-like device, painfully typed in an arcane syntax. A stray hyphen, period or space would render the query un-computable, so after deciphering the error message, the searcher would try again. Each minute and each result cost money, so errors were a real expense.
We entered the Web search era bundling content into a directory structure, like the “Yellow Pages,” or organizing query results into “folders” labeled with broad topics. The controlled vocabulary that represented directory topics or folder labels became known as a taxonomic structure, with the early ones at Northern Light and Yahoo crafted by experts with knowledge of the rules of controlled vocabulary, thesaurus development, and maintenance. Google derailed that search model with its simple “search box” requiring only a word or phrase to grab heaps of results. Today we are in a new era. Some people like searching by typing keywords in a box, while others prefer the suggestions of a directory or tree structure. Building taxonomic structures is now serious business well beyond e-commerce sites, particularly for search within enterprises, where many employees prefer to navigate through the terminology to browse and discover the full scope of what is there.
Navigation is only one purpose for which taxonomies are used in search. Depending on the application domain, the richness of the subject matter, and the scope and depth of topics, these lists can become quite large and complex. The more cross-references (e.g. cell phones USE wireless phones) are embedded in the list, the more likely it is that the searcher’s preferred term will be present. There is a diminishing return, however: if the user has to navigate to a system’s preferred term too often, the entire process of searching becomes unwieldy and is abandoned. On the other hand, if the system automates the smooth transition from one term to another, the richness and complexity of a taxonomy can be an asset.
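As a minimal sketch of what that automated transition can look like, a search front end can silently map whatever term the searcher typed to the taxonomy’s preferred term before consulting the index. The term list and class name below are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class UseReferenceExpansion {
    // Hypothetical USE cross-references: entered term -> preferred term.
    static final Map<String, String> USE_REFERENCES = Map.of(
            "cell phones", "wireless phones",
            "mobiles", "wireless phones",
            "autos", "automobiles");

    static String preferredTerm(String enteredTerm) {
        // Fall back to the entered term when no cross-reference exists.
        return USE_REFERENCES.getOrDefault(enteredTerm.toLowerCase(), enteredTerm);
    }

    public static void main(String[] args) {
        for (String query : List.of("cell phones", "wireless phones", "tablets")) {
            System.out.printf("searcher typed '%s' -> index queried for '%s'%n",
                    query, preferredTerm(query));
        }
    }
}
```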
In more sophisticated applications of taxonomies, the thesaurus model of relationships becomes a necessity. When a search engine has embedded algorithms that can interpret explicit term relationships, it indexes content according to a taxonomy and all its cross-references; the taxonomy here informs the index engine. This use requires substantially more maintenance and governance, of a much more granular nature, than taxonomy for navigation does. To work well, a large corpus of terminology needs to be built so that what the content says and means matches what the searcher expects to find in the results. If a search returns unsatisfactory results because of a poor taxonomy, trust in the search system fails rapidly and the benefits of whatever effort was put into building the taxonomy are lost.
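A companion sketch, again with hypothetical terms and relationships, shows the index-time side: each term found in a document is expanded to its preferred form and a broader term when the document is indexed, so a later query on any of those terms retrieves the document.

```java
import java.util.*;

public class TaxonomyIndexing {
    // Hypothetical taxonomy relationships used at index time.
    static final Map<String, String> PREFERRED = Map.of("cell phones", "wireless phones");
    static final Map<String, String> BROADER   = Map.of("wireless phones", "telecommunications equipment");

    public static void main(String[] args) {
        Map<String, Set<String>> invertedIndex = new HashMap<>();
        indexDocument(invertedIndex, "doc-17", List.of("cell phones", "batteries"));

        // A query on the entered term, its preferred form, or the broader term finds the document.
        System.out.println(invertedIndex.get("wireless phones"));              // [doc-17]
        System.out.println(invertedIndex.get("telecommunications equipment")); // [doc-17]
    }

    static void indexDocument(Map<String, Set<String>> index, String docId, List<String> terms) {
        for (String term : terms) {
            String preferred = PREFERRED.getOrDefault(term, term);
            index.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
            index.computeIfAbsent(preferred, k -> new HashSet<>()).add(docId);
            String broader = BROADER.get(preferred);
            if (broader != null) {
                index.computeIfAbsent(broader, k -> new HashSet<>()).add(docId);
            }
        }
    }
}
```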
I bring this up because the intent behind a taxonomy is the first thing to establish in deciding whether to start building one. Either model is an ongoing commitment, but the latter is a much larger investment in sophisticated human resources. The conditions that must be met for any taxonomy to succeed must be articulated when selling the project and its value proposition.