Curated content for content, computing, and digital experience professionals

Day: July 31, 2007

DITA and Dynamic Content Delivery

Have you ever waded through a massive technical manual, desperately searching for the section that actually applied to you? Or have you found yourself performing one search after another, collecting one by one the pieces of the answer you need from a mass of documents and web pages? Both are examples of the limitations of static publishing; that is, the limitations of publishing to a wide audience when people’s needs and wants are not all the same. Unfortunately, this classic “one size fits all” approach can end up fitting no one at all.

In the days when print publishing was our only option, and we thought only in terms of producing books, we really had no choice but to mass-distribute information and hope it met most people’s needs. But today, with Web-based technology and new XML standards like DITA, we have other choices.

DITA (Darwin Information Typing Architecture) is the hottest thing to have hit the technical publishing world in a long time. With its topic-based approach to authoring, DITA frees us from the need to think in terms of “books”, and lets us focus on the underlying information. With DITA’s modular, reusable information elements, we can not only publish across different formats and media – but also flexibly recombine information in almost any way we like.

Initial DITA implementations have focused primarily on publishing to pre-defined PDF, HTML and Help formats – that is, on static publishing. But the real promise of DITA lies in supporting dynamic, personalized content delivery. This alternative publishing model – which I’ll call dynamic content delivery – involves “pulling” rather than “pushing” content, based on the needs of each individual user.
In this self-service approach to publishing, end users can assemble their own “books” using two kinds of interfaces (or a hybrid of the two):

  • Information Shopping Cart – in which the user browses or searches to choose the content (DITA Topics) that she considers relevant, and then places this information in a shopping cart. When done “shopping”, she can organize her document’s table of contents, select a stylesheet, and automatically publish the result to HTML or PDF.
    This approach is appropriate when users are relatively knowledgeable about the content, and where the structure of their output documents can be safely left up to them. Examples include engineering research, e-learning systems, and customer self-service applications.
  • Personalization Wizard – in which the user answers a number of pre-set questions in a wizard-like interface, and the appropriate content is automatically extracted to produce a final document in HTML or PDF. This approach is appropriate for applications that need to produce a personalized but highly standard manual, such as a product installation guide or regulated policy manual. In this scenario, the document structure and stylesheet are typically preset.

In a hybrid interface, we could use a personalization wizard to dynamically assemble required material in a fixed table of contents – but then use the information shopping cart approach to allow the user to add supplementary material. Or, depending on the application, we might do the same thing but assemble the initial table of contents as a suggestion or starting point only. The first method might be appropriate for a user manual; the second might be better for custom textbooks.
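Conceptually, the table of contents assembled by either interface can be represented as a DITA map. The sketch below shows what a hybrid result might look like; the topic filenames and the map title are illustrative, not taken from any particular implementation:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map title="Custom Installation Guide">
  <!-- Fixed section assembled by the personalization wizard -->
  <topicref href="install_overview.dita"/>
  <topicref href="install_windows.dita"/>
  <!-- Supplementary topic the user added via the shopping cart -->
  <topicref href="troubleshooting_network.dita"/>
</map>
```

Because the map is just a list of topic references, the wizard and the shopping cart can both contribute entries to the same structure before it is published to HTML or PDF.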

Dynamic content delivery is made possible by the kind of topic-based authoring embraced by DITA. A topic is a piece of content that covers a specific subject, has an identifiable purpose, and can stand on its own (i.e., does not require a specific context in order to make sense). Topics don’t start with “as stated above” or end with “as further described below,” and they don’t implicitly refer to other information that isn’t contained within them. In a word, topics are fully reusable, in the sense that they can be used in any context where the information provided by the topic is needed.
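To make this concrete, a minimal DITA concept topic might look like the following. The element names come from the standard DITA concept topic type; the id, title, and content are invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="battery_replacement">
  <title>Replacing the battery</title>
  <shortdesc>How to remove and replace the battery safely.</shortdesc>
  <conbody>
    <p>The battery is located behind the rear access panel
       and can be replaced without tools.</p>
  </conbody>
</concept>
```

Note that the topic makes no reference to surrounding chapters or page numbers, so it can be pulled into any assembled document where battery replacement is relevant.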

The extraction and assembly of relevant topics is made possible by another relatively new standard called XQuery, which can find the right information based on user profiles, filter the results accordingly, and automatically transform them into output formats like HTML or PDF. Of course, this approach is only feasible if the XQuery engine is extremely fast – which led us to build our own dynamic content delivery solution around Mark Logic, an XQuery-based content delivery platform optimized for real-time search and transformation.

The dynamic content delivery approach is an answer to the hunger for relevant, personalized information that pervades today’s organizations. Avoiding the pitfalls of the classic “one size fits all” publishing of the past, it instead allows a highly personalized and relevant interaction with “an audience of one.” I invite you to read more about this in a whitepaper I wrote that is available on our website.

The Marginal Influence of E-commerce Search and Taxonomies on Enterprise Search Technologies

As we gear up for Gilbane Boston 2007, the number of possible topics to include in the tracks related to search seems boundless. The search business is in a transitional state, but in spite of the disarray it remains pivotal in its impact on business and current culture. The sessions will reflect the diversity in the market.

One trend is quite clear: the amount of money and effort being expended on Web search and site search for commercial Web sites makes it the winner in the “search technology” revenue war, with annual revenues measuring well into the billions of dollars. On the other hand, a recent Gartner study put 2006 revenues for enterprise search below $400M. This figure comes from an excellent article, Enterprise Search: Seek and Maybe You’ll Find, by Ben DuPont in Intelligent Enterprise. Check it out.

The distinctions between search on the Web and search within the enterprise are numerous, but here are two. First, Internet Web search revenue is all about marketing. Yes, we use it to discover, learn, find facts, and become more informed. But when companies supply search technology to expose you to their content on the Internet, they do so to facilitate commerce. If it falls into the hands of organizations with other intent, such as libraries or government agencies, so be it.

As we all know, when we are at work, seeking to discover, learn, or find facts to do our jobs better, we need a different kind of search. Thus, we seek a clear search winner built just for our enterprise, with all of its idiosyncrasies. The problem is that content inside the enterprise does not look like the rest of the world’s content as it is aggregated for commercial consumption. Enterprises are unique and sometimes operate chaotically, or, at best, with nuanced views of what information is most important.

The second distinction relates to taxonomies, and the increase in their development and use. I’ve seen a dramatic increase in job postings for “taxonomists” and have managed several projects for enterprises over the years to build these controlled lists of terms for categorizing content. What is noteworthy about recent job opportunities is that most seem to be for customer-facing Web sites. Historically, organizations with substantial internal content (e.g. research reports, patents, laboratory findings, business documents) hired professionals to categorize materials for a narrowly defined audience of specialists. The terminology was often highly specialized and could number in the hundreds or thousands of terms, even for a relatively small enterprise. This is no longer a common practice.

Slow financial growth in enterprise search markets is no surprise. Like many tools designed and marketed for departments not directly tied to revenue generation, search goes begging for solid vertical markets. Search’s companion technologies are also struggling to find a lucrative toehold for use within the organization. Content management systems integrated with rich and efficient taxonomy building and maintenance functions are hard to find.

I am confident that tools in CMS products for building and maintaining complex taxonomies will not improve until enterprises find a solid business reason to put professional human resources into doing content management, taxonomy development, search, and text analytics on their most important knowledge assets. This is a tough business proposition compared to the revenues being driven on the Internet. What businesses need to keep in mind is that without the ability to leverage their internal knowledge content assets better, smarter and faster, there won’t be innovative products in the pipeline to generate commerce. Losing track of your valuable intellectual resources is not a good long term strategy. Once you begin committing to solid content resource management strategies, enterprise technology products will improve to meet your needs.