Recently in DITA - Darwin Information Typing Architecture Category

Have you ever waded through a massive technical manual, desperately searching for the section that actually applied to you? Or have you found yourself performing one search after another, collecting one-by-one the pieces of the answer you need from a mass of documents and web pages? These are all examples of the limitations of static publishing; that is, the limitations of publishing to a wide audience when people’s needs and wants are not all the same. Unfortunately, this classic “one size fits all” approach can end up fitting no one at all.

In the days when print publishing was our only option, and we thought only in terms of producing books, we really had no choice but to mass-distribute information and hope it met most people’s needs. But today, with Web-based technology and new XML standards like DITA, we have other choices.

DITA (Darwin Information Typing Architecture) is the hottest thing to have hit the technical publishing world in a long time. With its topic-based approach to authoring, DITA frees us from the need to think in terms of “books”, and lets us focus on the underlying information. With DITA’s modular, reusable information elements, we can not only publish across different formats and media – but also flexibly recombine information in almost any way we like.

Initial DITA implementations have focused primarily on publishing to pre-defined PDF, HTML and Help formats – that is, on static publishing. But the real promise of DITA lies in supporting dynamic, personalized content delivery. This alternative publishing model – which I’ll call dynamic content delivery – involves “pulling” rather than “pushing” content, based on the needs of each individual user.

In this self-service approach to publishing, end users can assemble their own “books” using two kinds of interfaces (or a hybrid of the two):

Information Shopping Cart – in which the user browses or searches to choose the content (DITA Topics) that she considers relevant, and then places this information in a shopping cart. When done “shopping”, she can organize her document’s table of contents, select a stylesheet, and automatically publish the result to HTML or PDF.

This approach is appropriate when users are relatively knowledgeable about the content, and where the structure of their output documents can be safely left up to them. Examples include engineering research, e-learning systems, and customer self-service applications.

Personalization Wizard – in which the user answers a number of pre-set questions in a wizard-like interface, and the appropriate content is automatically extracted to produce a final document in HTML or PDF.

This approach is appropriate for applications that need to produce a personalized but highly standard manual, such as a product installation guide or regulated policy manual. In this scenario, the document structure and stylesheet are typically preset.


In a hybrid interface, we could use a personalization wizard to dynamically assemble required material in a fixed table of contents – but then use the information shopping cart approach to allow the user to add supplementary material. Or, depending on the application, we might do the same thing but assemble the initial table of contents as a suggestion or starting point only. The first method might be appropriate for a user manual; the second might be better for custom textbooks.
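As a rough illustration of the hybrid approach, the sketch below assembles a DITA map from topics a wizard marked as required plus topics the user added from a cart. The function name `build_map` and the file names are hypothetical, and the output omits the DOCTYPE declaration a real map would carry; treat it as a sketch under those assumptions rather than a description of any actual product.

```python
import xml.etree.ElementTree as ET


def build_map(title, required, supplementary=()):
    """Return a DITA map (as a string) listing required topics first,
    followed by any supplementary topics the user added from the cart."""
    root = ET.Element("map", title=title)
    seen = set()
    for href in list(required) + list(supplementary):
        if href in seen:  # ignore topics that were selected twice
            continue
        seen.add(href)
        ET.SubElement(root, "topicref", href=href)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical topic files, for illustration only.
    print(build_map(
        "My Installation Guide",
        required=["install_linux.dita", "license.dita"],
        supplementary=["troubleshooting.dita", "license.dita"],
    ))
```

For the "suggestion only" variant, the same function could be called with an empty `required` list and the wizard's picks passed as editable cart contents.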

Dynamic content delivery is made possible by the kind of topic-based authoring embraced by DITA. A topic is a piece of content that covers a specific subject, has an identifiable purpose, and can stand on its own (i.e., does not require a specific context in order to make sense). Topics don’t start with “as stated above” or end with “as further described below,” and they don’t implicitly refer to other information that isn’t contained within them. In a word, topics are fully reusable, in the sense that they can be used in any context where the information provided by the topic is needed.

The extraction and assembly of relevant topics is made possible by another relatively new standard called XQuery, which can find the right information based on user profiles, filter the results accordingly, and automatically transform them into output formats like HTML or PDF. Of course, this approach is only feasible if the XQuery engine is extremely fast, which led us to build our own dynamic content delivery solution around Mark Logic, an XQuery-based content delivery platform optimized for real-time search and transformation.
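The find, filter, and transform steps can be sketched in a few lines. The example below uses plain Python as a stand-in for XQuery, with a hypothetical in-memory list of topics instead of a content repository; the function names (`matches`, `to_html`, `deliver`) and the sample topics are assumptions made for illustration. It relies on DITA's `audience` and `platform` profiling attributes, whose values are space-separated lists.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical in-memory "repository"; a real system would query a content store.
TOPICS = [
    '<topic id="t1" audience="admin" platform="linux"><title>Install on Linux</title>'
    '<body><p>Run the installer as root.</p></body></topic>',
    '<topic id="t2" audience="admin" platform="windows"><title>Install on Windows</title>'
    '<body><p>Run setup.exe.</p></body></topic>',
    '<topic id="t3"><title>License terms</title>'
    '<body><p>Applies to all users.</p></body></topic>',
]


def matches(topic, profile):
    """A topic matches unless one of its profiling attributes excludes the user."""
    for attr, wanted in profile.items():
        values = topic.get(attr)
        # An absent attribute means the topic applies to everyone.
        if values is not None and wanted not in values.split():
            return False
    return True


def to_html(topic):
    """Transform one topic into a minimal HTML fragment (plain-text paragraphs only)."""
    title = escape(topic.findtext("title", default=""))
    paras = "".join(f"<p>{escape(p.text or '')}</p>" for p in topic.iter("p"))
    return f"<section><h1>{title}</h1>{paras}</section>"


def deliver(profile, keyword=""):
    """Find topics containing the keyword, filter them by profile, render as HTML."""
    parsed = [ET.fromstring(t) for t in TOPICS]
    found = [t for t in parsed if keyword.lower() in "".join(t.itertext()).lower()]
    return "".join(to_html(t) for t in found if matches(t, profile))
```

Calling `deliver({"audience": "admin", "platform": "linux"})` returns the Linux installation topic and the unprofiled license topic, and leaves out the Windows one. A production system would push all three steps into the query engine rather than parsing every topic on each request.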

The dynamic content delivery approach is an answer to the hunger for relevant, personalized information that pervades today’s organizations. Avoiding the pitfalls of the classic “one size fits all” publishing of the past, it instead allows a highly personalized and relevant interaction with “an audience of one.” I invite you to read more about this in a whitepaper I wrote that is available on our website (www.FlatironsSolutions.com).

The Future of DITA


DITA (which stands for “Darwin Information Typing Architecture”) is the hottest new technology in the technical publishing market. While still early in its adoption cycle, it has the potential to become the future de facto standard not only for technical publishing, but for all serious content management and dynamic publishing applications. Whether this happens, however, will depend on the vision and creativity of the DITA standards committee, DITA vendors and DITA consultants.

While IBM originally designed DITA for technical documentation, its benefits are potentially transferable to encyclopedias, journal articles, mutual fund prospectuses, insurance policies, retail catalogs, and many, many other applications. But will it really be flexible enough to meet these other needs?

At Flatirons Solutions we’ve been testing the boundaries of DITA’s extensibility, taking DITA out of its comfort zone and thereby creating some interesting proof points for its flexibility. So far, the results are very positive. Four specific applications illustrate this:

* User personalized documentation – designed to support a variety of enterprise content libraries out of a single set of specializations, this application involved the use of 15 conditional processing attributes to drive dynamic production of personalized documents. An initial DocBook-based prototype was later re-designed for DITA.

* Scholarly research database – this solution involved marrying DITA with the venerable Text Encoding Initiative (TEI), a nearly 20-year-old scholarly markup standard originally written in SGML. DITA was used to split the historical material into searchable topics; TEI provided the rigorous scholarly markup and annotations.

* Dynamic web publishing – designed for a large brokerage and business services firm, this application combines a single-source DITA-based authoring environment with an optimized dynamic processing pipeline that produces highly-personalized Web pages.

* Commercial publishing – we are currently exploring the use of DITA for encyclopedia, journal, and textbook publishing, for clients who have traditionally focused on print, but who are now also moving to increasingly sophisticated electronic products.

Of course, in pushing the boundaries we’ve also found issues. A classic example is the restriction in DITA’s “task” specialization that each step in a procedure must begin with a simple declarative statement. To keep the procedure as readable as possible, a step cannot begin with a statement that includes a list, multiple paragraphs, a table, or a note. But what do you do if your content breaks these rules? DITA’s answer is that you rewrite your content.
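To get a feel for how much legacy content would trip over this rule, a quick scan can help. The sketch below flags steps that do not open with a `<cmd>` element, or whose `<cmd>` contains block-level markup. It mirrors the constraint as described above; the exact content model depends on the DITA version and DTD in use, so this is a rough triage tool and not a substitute for real validation. The function name, the `BLOCK_ELEMENTS` set, and the sample task are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Block-level elements this sketch treats as not allowed inside <cmd> (an assumption).
BLOCK_ELEMENTS = {"p", "ul", "ol", "table", "note"}

# Hypothetical legacy task used only to exercise the check.
LEGACY_TASK = """
<task id="backup">
  <title>Back up the database</title>
  <taskbody>
    <steps>
      <step><cmd>Stop the server.</cmd></step>
      <step><note>Backups can take an hour.</note><cmd>Run the backup.</cmd></step>
      <step><cmd>Check these files: <ul><li>data.db</li></ul></cmd></step>
    </steps>
  </taskbody>
</task>
"""


def find_step_problems(task_xml):
    """Return one human-readable message per step that breaks the rule."""
    problems = []
    task = ET.fromstring(task_xml)
    for number, step in enumerate(task.iter("step"), start=1):
        children = list(step)
        if not children or children[0].tag != "cmd":
            problems.append(f"step {number}: does not begin with <cmd>")
            continue
        # Collect any block-level elements nested anywhere inside <cmd>.
        nested = sorted({el.tag for el in children[0].iter()} & BLOCK_ELEMENTS)
        if nested:
            problems.append(f"step {number}: <cmd> contains {', '.join(nested)}")
    return problems


if __name__ == "__main__":
    for problem in find_step_problems(LEGACY_TASK):
        print(problem)
```

Run over a sample of the legacy library, a count of flagged steps gives a first estimate of the rewriting effort before committing to the standard “task” model or a custom specialization.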

Rewriting content is not unreasonable if you accept that you’re moving to DITA in order to adopt industry best practices. However, what if you don’t agree that DITA’s built-in “best practices” are the only way to write good content? Or what if you have 500,000 pages of legacy content, all of which need to be rewritten before they can conform to DITA? Would you still consider it practical?

You can solve this by making up your own “task” specialization, bypassing the constraints of the built-in “task” model. That’s an advantage of DITA. But if you do that, you’re taking a risk that you won’t be able to leverage future vendor product features based on the standard “task” specialization. And in other cases, such as limitations in handling print publishing, workarounds can be harder to find.

DITA 1.1 has made great progress toward resolving some of these issues. To be truly extensible, however, I believe that future versions of DITA will need to:

* Add more “out-of-the-box” specialization types which DITA vendors can build into their tools (for example, generic types for commercial publishing).
* Further generalize the existing “out-of-the-box” specialization types (for example, allowing more flexibility in procedure steps).
* Better handle packaging of content into published books, rather than focusing primarily on Web and Help output and then adapting that model for books.
* Simplify the means to incorporate reusable content, handle “variables” within text, and link to related content.

At conferences I’ve heard it suggested that if people don’t want to obey DITA’s particular set of rules, they should consider using another standard. I’ve even heard people say that DITA doesn’t need to focus on book publishing because print is “old school.” In my opinion, this kind of parochial thinking needs to be seriously reconsidered.

Today, DITA stands at the crossroads. If it can be aggressively generalized and extended to meet the needs of commercial publishers, catalog and promotional content, and financial services and other vertical industry applications, then it has the chance to be “the” standard in XML-based dynamic publishing. If this doesn’t happen, DITA runs the risk of being relegated to a relatively elite technical publishing standard that’s only useful if you meet its particular set of assumptions and rules.

As an industry, which way will we go?

