Curated for content, computing, data, information, and digital experience professionals

Category: Web technologies & information standards

Here we include topics related to information exchange standards, markup languages, supporting technologies, and industry applications.

XML and Office 2.0

With Carl’s recent post on SaaS, and John Newton’s “Content Management 2.0” discussion, I thought I’d throw this into the mix… recently there has also been a flurry of activity around a concept called “Office 2.0” – another offshoot of the term “Web 2.0” – in which all traditional office applications can be replaced by online services accessible through a generic web browser.

What’s making this possible is a set of new technologies including AJAX, RSS and web services, a set of actual applications such as Google’s Gmail and ZOHO’s “online” word processor, and a great deal of unbridled enthusiasm.

Since Office 2.0 is particularly aimed at applications that affect business and larger enterprises, I’d like to take a quick look at how well it fits the needs of such enterprises, and then suggest how it might be extended to better meet these needs.

But first, I’d like to point out that it’s easy to get caught up in the details of technologies like AJAX and RSS, and miss the bigger picture. I would propose that the real excitement is in the vision enabled by the technology, as opposed to the technology itself. To not see this leads to the inevitable “religious wars” around specific tools, which we of course want to avoid…

To put this in perspective, Office 2.0 reminds me of what happened with CD-ROM twenty years ago. I still vividly recall a colleague of mine proudly announcing that he was going to the world’s first international CD-ROM conference, which he described as the “Woodstock” of the computer industry. He simply couldn’t contain his excitement about this pivotal event. But then, I remember him suddenly changing his facial expression, looking at me wryly and saying, “well of course, CD-ROM is actually only a storage medium…can you imagine me being excited about going to a floppy disk conference?”

Twenty years later, we might well ask the same thing. CD-ROM has become about as mundane as floppy disks were then. But at the time, CD-ROM represented much more than a new storage medium. Instead, it symbolized the sudden freedom to access and search information – right from your own desktop – that would otherwise be virtually inaccessible. It was, in fact, the first glimpse of the kind of mass interconnectivity that the World Wide Web would later provide.

Office 2.0 is much like that – it represents freedom from the tyranny of desktop applications and proprietary data locked up on individual computers. It heralds a new age of unfettered collaboration and information sharing within enterprises.

So what are the key things that are exciting about Office 2.0, and do its maxims and rules actually fit larger enterprises? I think the answer is a tentative “yes” – at least at a conceptual level. And at least so long as the Office 2.0 folks are willing to make a few compromises and entertain some crucial extensions.

To explore this further, let’s go through the official Office 2.0 rules one by one…

#1 – No client application other than a web browser. Actually, this is the holy grail of nearly all corporate IT departments, because one of the biggest headaches in IT is trying to keep all the client applications up to date on individual computers. In practice, we’d have to accommodate situations where a high-speed Internet connection is not available, but I would grant that this is increasingly the exception.

#2 – No files on your personal computer. In principle, this is the entire thrust of enterprise content management initiatives: taking information that’s buried on people’s “C:” drives and getting it into a managed and accessible central repository. So far, so good.

#3 – No dependence on any particular vendor. This is another mantra of corporate IT, expressing itself in the current fervor over Software as a Service and Service-Oriented Architectures, ideally with plug-and-play vendor apps encapsulated in generic web services interfaces.

#4 – Collaboration through document sharing and publishing. Again, this is a winner with big enterprises. In fact, this is most of what my company, Flatirons Solutions, does for a living. And from the overall perspective of Web 2.0, I might add that wikis and blogs are an increasingly popular way to share ideas and knowledge within larger organizations, supplementing the sharing and publishing of documents.

#5 – Syndication in addition to peer-to-peer collaboration. This is another focus of enterprise content management, allowing people to subscribe to documents or content that has changed or is newly-published. And RSS syndication is increasingly one of the key channels to which we find ourselves publishing content.

#6 – Seamless data import/export across services. This is a fundamental objective of all enterprise content management initiatives, but now comes the rub. The current Office 2.0 vision thinks of sharing in terms of “interchangeable” formats like .DOC, HTML and PDF. But .DOC is a common but still proprietary vendor format, and HTML and PDF are really only sharable at the visible level. In other words, HTML and PDF let you display and print each other’s information, but not actually interchange the underlying source data and information in a way a computer can process and transform.

Proprietary word processing seems less proprietary when it’s on the Web, but if you really want interchangeability between services, you need to be using a vendor-, format-, and media-neutral standard like XML. XML does not assume a particular vendor, nor does it assume web or print as the output medium. Instead, it encodes the information itself in a completely neutral form, from which media-specific formats like HTML and PDF can be derived.
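To make the idea of deriving output formats from a neutral source concrete, here is a minimal sketch in Python using only the standard library. The element names (`memo`, `title`, `para`) and the two renditions are my own invention for illustration, not part of any Office 2.0 service or standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical neutral source; the element names are invented for this sketch.
SOURCE = """<memo>
  <title>Quarterly Update</title>
  <para>Revenue grew in the third quarter.</para>
  <para>Headcount was flat.</para>
</memo>"""


def to_html(root: ET.Element) -> str:
    """Derive a Web rendition from the neutral source."""
    body = ET.Element("body")
    ET.SubElement(body, "h1").text = root.findtext("title")
    for para in root.findall("para"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(body, encoding="unicode")


def to_plain_text(root: ET.Element) -> str:
    """Derive a print-oriented plain-text rendition from the same source."""
    title = root.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [para.text for para in root.findall("para")]
    return "\n".join(lines)


memo = ET.fromstring(SOURCE)
html = to_html(memo)        # one rendition for the browser
text = to_plain_text(memo)  # another rendition from the same source
```

The source says nothing about headings, fonts, or pages. Each output function makes its own presentation decisions, which is why the same content can feed both Web and print.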

In the work we do with large enterprises, XML also provides the key to sharing information at a much deeper level than “documents.” When we look at the set of documents that people need to share and publish, we see that there is often a tremendous amount of redundancy. If this overlapping information is authored and maintained independently, there are huge problems with inconsistency, and a lot of unnecessary time and cost maintaining and reconciling the multiple versions.

XML allows source information to be “chunked up” into the underlying building blocks, and from there flexibly mixed-and-matched to create the full array of print and Web-based documents. Individuals can collaborate on the source building blocks – without needing to assume a particular assembled document or output medium – and then combine the building blocks of interest into the documents they produce. Furthermore, if these reusable building blocks are structured as standalone “topics”, they can be directly published and syndicated outside the context of a higher-level document or web page. We call this “single source” publishing – because underlying content is maintained once, and then reused many times.
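As a rough illustration of single-source assembly, the following Python sketch keeps three standalone topics in one place and builds two different documents from them. The topic IDs, element names, and the `assemble` helper are all hypothetical; a real system would store topics in a repository rather than a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic library; IDs and element names are invented for this sketch.
TOPICS = {
    "warranty": "<topic id='warranty'><title>Warranty</title><p>Covered for two years.</p></topic>",
    "install": "<topic id='install'><title>Installation</title><p>Attach the bracket.</p></topic>",
    "specs": "<topic id='specs'><title>Specifications</title><p>Weight: 2 kg.</p></topic>",
}

# Each deliverable is simply an ordered list of topic IDs.
DOCUMENTS = {
    "user-guide": ["install", "warranty"],
    "data-sheet": ["specs", "warranty"],
}


def assemble(doc_name: str) -> ET.Element:
    """Build one document by pulling its topics from the shared library."""
    root = ET.Element("document", {"name": doc_name})
    for topic_id in DOCUMENTS[doc_name]:
        root.append(ET.fromstring(TOPICS[topic_id]))
    return root


def titles(doc: ET.Element) -> list:
    """List the topic titles in document order."""
    return [topic.findtext("title") for topic in doc.findall("topic")]


guide = assemble("user-guide")
sheet = assemble("data-sheet")
```

The warranty topic appears in both documents but is maintained once. Changing its text in the library changes every document that includes it the next time that document is assembled.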

So, is Office 2.0 the right idea for larger enterprises? Perhaps, in principle…but to make it really work we need to merge its vision with the significant work already going on in single-source XML-based publishing. Then we’d have the potential for a real winner.

The Future of DITA

DITA (which stands for “Darwin Information Typing Architecture”) is the hottest new technology in the technical publishing market. While still early in its adoption cycle, it has the potential to become the future de facto standard for not only technical publishing, but for all serious content management and dynamic publishing applications. Whether this happens, however, will depend on the vision and creativity of the DITA standards committee, DITA vendors and DITA consultants.

While IBM originally designed DITA for technical documentation, its benefits are potentially transferable to encyclopedias, journal articles, mutual fund prospectuses, insurance policies, retail catalogs, and many, many other applications. But will it really be flexible enough to meet these other needs?

At Flatirons Solutions we’ve been testing the boundaries of DITA’s extensibility, taking DITA out of its comfort zone and thereby creating some interesting proof points for its flexibility. So far, the results are very positive. Four specific applications illustrate this:

  • User personalized documentation – designed to support a variety of enterprise content libraries out of a single set of specializations, this application involved the use of 15 conditional processing attributes to drive dynamic production of personalized documents. An initial DocBook-based prototype was later re-designed for DITA.
  • Scholarly research database – this solution involved marrying DITA with the venerable Text Encoding Initiative (TEI), a nearly 20-year-old scholarly markup standard originally written in SGML. DITA was used to split the historical material into searchable topics; TEI provided the rigorous scholarly markup and annotations.
  • Dynamic web publishing – designed for a large brokerage and business services firm, this application combines a single-source DITA-based authoring environment with an optimized dynamic processing pipeline that produces highly-personalized Web pages.
  • Commercial publishing – we are currently exploring the use of DITA for encyclopedia, journal, and textbook publishing, for clients who have traditionally focused on print, but who are now also moving to increasingly sophisticated electronic products.
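For readers unfamiliar with conditional processing, the first application above can be sketched in a few lines of Python. The sketch filters on `audience`, one of DITA's standard conditional attributes, with space-separated values. The `filter_by` function and the sample topic are my own simplification; they are not how our application or any DITA toolchain actually implements filtering.

```python
import xml.etree.ElementTree as ET

# Sample topic for illustration; the content is invented.
SOURCE = """<topic id="backup">
  <title>Backing up</title>
  <body>
    <p>Back up your data weekly.</p>
    <p audience="admin">Schedule backups with the server console.</p>
    <p audience="novice expert">Use the Backup button on the toolbar.</p>
  </body>
</topic>"""


def filter_by(root: ET.Element, attr: str, wanted: str) -> ET.Element:
    """Drop every element whose `attr` is set but does not list `wanted`.

    Elements without the attribute are unconditional and always kept.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            values = child.get(attr)
            if values is not None and wanted not in values.split():
                parent.remove(child)
    return root


novice_view = filter_by(ET.fromstring(SOURCE), "audience", "novice")
admin_view = filter_by(ET.fromstring(SOURCE), "audience", "admin")
novice_paras = [p.text for p in novice_view.iter("p")]
admin_paras = [p.text for p in admin_view.iter("p")]
```

With 15 such attributes in play, the number of possible personalized variants grows quickly, which is why this kind of filtering has to be driven by data rather than by hand-maintained copies.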

Of course, in pushing the boundaries we’ve also found issues. A classic example is the restriction in DITA’s “task” specialization that each step in a procedure must begin with a simple declarative statement. To keep the step as readable as possible, it cannot begin with a statement that includes a list, multiple paragraphs, a table, or a note. But what do you do if your content breaks these rules? DITA’s answer is that you rewrite your content.

Rewriting content is not unreasonable if you accept that you’re moving to DITA in order to adopt industry best practices. However, what if you don’t agree that DITA’s built-in “best practices” are the only way to write good content? Or what if you have 500,000 pages of legacy content, all of which need to be rewritten before they can conform to DITA? Would you still consider it practical?
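One way to size such a rewrite is to scan legacy procedures for steps that would break the rule. The Python sketch below approximates the constraint as "the first child of each `step` must be a `cmd`, and that `cmd` must not contain block-level elements." The list of block elements and the `nonconforming_steps` helper are assumptions for illustration; the authoritative rule is the content model in the DITA DTDs, and a validating parser is the proper tool for real checks.

```python
import xml.etree.ElementTree as ET

# Assumed set of block-level elements that may not appear inside <cmd>.
BLOCK_ELEMENTS = {"p", "ul", "ol", "table", "note"}

# Invented legacy content; steps 2 and 3 deliberately break the rule.
LEGACY = """<task id="replace-filter">
  <title>Replacing the filter</title>
  <taskbody>
    <steps>
      <step><cmd>Open the housing.</cmd></step>
      <step><note>Wear gloves.</note><cmd>Remove the old filter.</cmd></step>
      <step><cmd>Check the following:<ul><li>Seal</li><li>Clips</li></ul></cmd></step>
    </steps>
  </taskbody>
</task>"""


def nonconforming_steps(root: ET.Element) -> list:
    """Return 1-based positions of steps that break the simplified rule."""
    problems = []
    for position, step in enumerate(root.iter("step"), start=1):
        children = list(step)
        first = children[0] if children else None
        if first is None or first.tag != "cmd":
            # The step does not open with a command at all.
            problems.append(position)
        elif any(el.tag in BLOCK_ELEMENTS for el in first.iter() if el is not first):
            # The command itself wraps a list, paragraph, table, or note.
            problems.append(position)
    return problems


violations = nonconforming_steps(ET.fromstring(LEGACY))  # [2, 3]
```

Run across a legacy corpus, a count like this gives a first estimate of how much content would need rewriting versus how much would convert cleanly.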

You can solve this by making up your own “task” specialization, bypassing the constraints of the built-in “task” model. That’s an advantage of DITA. But if you do that, you’re taking a risk that you won’t be able to leverage future vendor product features based on the standard “task” specialization. And in other cases, such as limitations in handling print publishing, workarounds can be harder to find.

DITA 1.1 has made great progress toward resolving some of these issues. To be truly extensible, however, I believe that future versions of DITA will need to:

  • Add more “out-of-the-box” specialization types which DITA vendors can build into their tools (for example, generic types for commercial publishing).
  • Further generalize the existing “out-of-the-box” specialization types (for example, allowing more flexibility in procedure steps).
  • Better handle packaging of content into published books, rather than focusing primarily on Web and Help output and then adapting that model for books.
  • Simplify the means to incorporate reusable content, handle “variables” within text, and link to related content.

At conferences I’ve heard it suggested that if people don’t want to obey DITA’s particular set of rules, they should consider using another standard. I’ve even heard people say that DITA doesn’t need to focus on book publishing because print is “old school.” In my opinion, this kind of parochial thinking needs to be seriously reconsidered.

Today, DITA stands at the crossroads. If it can be aggressively generalized and extended to meet the needs of commercial publishers, catalog and promotional content, and financial services and other vertical industry applications, then it has the chance to be “the” standard in XML-based dynamic publishing. If this doesn’t happen, DITA runs the risk of being relegated to a relatively elite technical publishing standard that’s only useful if you meet its particular set of assumptions and rules.

As an industry, which way will we go?

Look ahead a bit: here, here, and here.

Microsoft Announces Support for ODF Translation tools

I could have sworn they already announced this, but in any case it was inevitable. The whole controversy is now simply not all that interesting. IT organizations need to understand the translation issues, but choosing one format over another is just not that big a deal. Many organizations have more complex issues to deal with, like integrating XML content from custom applications or other enterprise apps that don’t map to either ODF or Open XML directly. We have lots more background on this.

WinFS and Project Orange at Tech-Ed in Boston

Since our conference on Content Technologies for Government is in Washington this week, I probably will not get to Tech-Ed, which is at our new convention center here in Boston, even though it is less than two blocks away. But if I had the time, I would be there scouting out the new WinFS beta and the intriguing Project Orange (which may be relevant to the previous post on Viper). Mary Jo Foley has a list of the top 10 things to watch for there. She and others have pointed to this post for some clues on Project Orange.

Microsoft’s XPS to compete with Adobe’s PDF

As this news item reminded us today, vendors are gearing up for the launch of Vista and Office 12. We are already seeing vendors announcing support for both in various ways, but this will continue to build to a deluge of announcements over the next 6 months. XPS (XML Paper Specification) is one of the new pieces of Vista and Office 12 that bears paying attention to. While it is not likely to displace Adobe’s PDF (certainly not in the near term at least), it will certainly be used instead of PDF for certain applications. What those applications will be is something worth thinking about. There is more info on XPS from Microsoft here, including links to the specification, developer blogs etc.


© 2025 The Gilbane Advisor
