The Gilbane Report: Volume 11, Number 9
What's Next for XML and Enterprise Content Management?
December 2003
Since the annual XML Conference and Expo (www.xmlconference.org/xmlusa) takes place in November or December, we usually take the opportunity to end the calendar year with an update on what is going on with XML. This gathering remains the main XML event of the year, and is the best place to gauge what standards development activities are hot, what new approaches developers are adopting, and what businesses are doing with XML technologies.
Rather than review the event or specific announcements, we thought it would be more valuable to provide some insight into the XML trends we see having the most relevance for our readers in the coming year and beyond. XML is a fundamental technology that cuts across most, if not all, information technology applications, and fortunately, the silly debates about whether XML was for data or for content are well behind us. This bifurcation was only due to the primitive view of information processing software we were saddled with until recently. Nowadays, we can talk about XML's role in managing 'enterprise content', and it is widely understood that 'enterprise content' includes all kinds of data types and involves integration with many enterprise applications. As you will see from Bill's article this month, there are many areas where there are major changes ahead for IT strategies due to the continued evolution of both XML technology and related standards, and in our understanding of how they can be most effectively employed in our business applications.
What's Next for XML and Enterprise Content Management?
With the recent XML 2003 conference in Philadelphia, it is a good time to review XML and its role in enterprise content management (ECM). The show itself is always an excellent mix of the practical, the emerging, and the theoretical. The Gilbane Report's role in this year's show was to present a pre-conference tutorial on the state of the art in XML content management. This article takes a broader look at XML and ECM, not just looking at XML under management but at XML in all of its roles in content management: how XML is used for content, for metadata, for integration, and for application design and deployment. This article also looks at both the state of the art and some emerging trends.
There are, of course, many things to say about XML and ECM. As Yogi Berra might say, XML isn't ubiquitous yet, but it sure is everywhere. A better baseball analogy might be to say, just as every baseball player can catch and throw, every ECM application can at least ingest and produce XML. That is, even if the internals of the ECM application manage non-XML or even proprietary data formats, the application needs to be able to take in and produce XML to share with other applications and processes. This XML-enabled integration with other applications is precisely the reason that ECM is a cornerstone technology. Long-time readers will recognize how much this differs from the many years that XML's predecessor, SGML, languished in relative obscurity. Even the organizations that were successful in implementing SGML-based publishing solutions found that these applications almost always stood as islands of automation. (And, of course, the great irony was that SGML was designed to overcome these very same islands of automation.)
Of course, XML itself is a (relatively) simple thing: a meta-language that allows developers to create specific markup languages for documents and data; thus the many vertical and specialized XML vocabularies, such as XBRL (eXtensible Business Reporting Language), DocBook for technical documentation, and XrML for digital rights management. Such vocabularies are important, and XML's ability to create valid, self-describing data is central to its usefulness. But XML has gone where SGML only dreamed of going because of the many supporting standards, technologies, and open-source and vendor offerings that have emerged in support of XML. This list begins with obvious things like XSLT and XPath, but continues down a long list to include RDF, RSS, native XML repositories such as Tamino, open source and standards-based technologies such as Cocoon and SVG, and vendor offerings such as Microsoft's InfoPath, Adobe's XMP, and on and on.
Against this backdrop, this article will look at the emerging XML trends and technologies that we feel are most relevant to ECM in the coming year. Some of these will be long-time and familiar topics that are evolving even as we write, and others will be new and emerging technologies that promise to have significant impact on ECM. In the course of this article, we corresponded with a number of people in the field who offered some of their thoughts as well.
XML and ECM: The Big Picture
XML has two general and important roles in content management: for the encoding and management of the content itself (including the related metadata) and for the integration of the many component and related technologies that comprise and are related to content management. Lauren Wood wrote about this in a Gilbane Report (Volume 10, Number 8) published to coincide with XML 2002, and the big picture hasn't changed much. Indeed, the trends have become more pronounced and important. Thus, where Lauren wrote about XML and the plumbing of content management, we can point to another year and further growth in the use of XML for purposes such as data and application integration, syndication, and messaging. 2003 was also the year that XML and service-oriented architectures became a more viable and prevalent option.
Indeed, the use of XML for plumbing has grown even faster than the use of XML for the encoding and management of the content itself. A number of factors contribute to the relatively slow growth of XML for content representation:
- XML content still represents relatively specialized silos of information within an organization, for example, technical documentation and product catalog data.
- Some of these silos may have historically been tagged in SGML and may still be working well, and organizations may not have yet felt compelled to convert the content to XML.
- Early content management applications focused on Web content, often article length and shorter, and typically stored in relational databases. Again, the conversion to XML has not been a crying need.
- Perhaps most importantly, the growth in content under management has been widespread and heterogeneous. All types of content are now under management, and XML has thus far proven to just be part of the mix.
We think that this trend is about to change. If the last few years represented a period of time when more content was born digital, the next several years will represent a time when more content will be born XML.
Some Trends in XML and Content Management
The following is by no means an exhaustive list of trends in XML, but it does point to several of the trends that are having the most impact specifically on content management. Several of these are long-time trends that continue to take shape as the use of XML for content management matures and evolves.
More Options for XML Content Creation
Microsoft plans to have something to say about this, of course. The newest version of Office gives us XML as a storage format for everyday tools like Word and Excel. It remains to be seen, of course, as to how much structured editing of XML will happen in a Microsoft Word environment, but it will be more than none. Given the staggering number of Word licenses and documents out there, it is safe to conclude that some amount of XML will emerge from the authoring done in Microsoft Word.
Moreover, there is a general trend to bring more structured authoring to the table. Products such as Ektron's EwebEditPro+XML and ArborText's Contribute bring XML authoring to browser-based forms, and HTML forms are giving way to XML-aware eForms technologies such as Microsoft's InfoPath and Adobe's Forms Designer. Add these tools (and Microsoft Office) to the list of dedicated XML authoring tools and you give organizations many more options for enabling business users to create structured content in the normal course of their work. Again, many of these approaches are relatively new, but these are positive developments.
As several correspondents noted, the mere fact that XML content can fall out of commercial off-the-shelf authoring tools is significant. The document server of the near future will contain, for example, more useful unstructured content such as Word documents containing XML and PDF documents with XML metadata.
More Options for XML Storage
Dedicated technology for storing and managing XML content has become both accepted and widespread in the recent past. Technologies such as Software AG's Tamino and Ixiasoft's TextML Server (among others) help organizations deal with the explosive growth in XML content and data, and along with it the need to provide developers with persistent access to the XML. At the same time, the dominant relational database vendors such as Oracle, IBM, and Microsoft are adding more specific and robust support for XML storage. Much as the rise in XML authoring options gives organizations more capabilities to extend structured authoring to more users, the growth in XML storage and management technologies gives developers more options for managing XML at all stages of the content management and data integration process.
Does this greater availability of XML storage mean all content will end up in XML? In a word, no. While more content will be born XML, there is far too much legacy data in unstructured and semi-structured form for it to all be converted to XML efficiently and meaningfully. Moreover, unstructured and relational data will continue to be created, in abundance, without a compelling business reason to make it all XML.
More Options for XML Transformation
XSLT is no longer a new thing. Nor is XSL-FO. Both transformation technologies are finding more uses every day, as developers are using XML for more ad hoc data integration, reporting, and publishing activities. XSLT is especially prevalent among developers, and development platforms such as .NET and J2EE depend on the ongoing use and transformation of XML between processes and applications. Notably, Microsoft's new InfoPath initiative uses XSLT as the core technology for rendering eForms, favoring the more general transformation technology over a more specific rendering approach such as XForms.
XSLT is far enough along in its implementation to consider it a cornerstone technology for XML. Indeed, many development tools fully support XSLT with features such as code generation and validation. (These kinds of features are becoming more common for schema generation as well.) XSL-FO is not as widespread yet, but is growing in use. Part of the issue with XSL-FO is in determining precisely what problem it will solve. Will it be the primary means for rendering XML? Will it mainly be for output to print and viewable pages? And if it is for rendering pages, will it tackle all types of page rendering? Even complex pages such as those found in technical documents, journals, and catalogs? The next year will tell a lot, especially if more commercial and open source products emerge that use XSL-FO.
In addition to these standards, there are a number of commercial tools on the market that bring a great deal of added value to the content conversion problem, even as they rely heavily on core technologies such as XSLT. CambridgeDocs, for example, is one company that is focusing on the conversion problem as a key to ongoing use and management of XML in content management applications.
More Options for XML Data Modeling
Just as XSLT is no longer a new thing, neither are XML schemas. The early question seemed to be, when will schemas replace DTDs? Years later, many people are asking the same question. DTDs have proven to be a resilient technology. In some cases, this happens when an established application is working well and there is no overriding reason to change. In some cases, though, especially in document and publishing applications, people are still writing DTDs, or choosing existing DTDs for new applications.
This will change over time, as DTDs will continue to give way to schemas, despite the remaining issue as to which schema standard (W3C XML Schema vs. RELAX NG) will prevail. Schemas are simply a better technology, providing developers with stronger data typing and more ability to provide programmatic control and validation over the data. As one correspondent noted, there are simply too many smart people working on schemas at both the theoretical and practical level. All that brainpower is already resulting in better tools: autogeneration of XML schemas from example data and from relational databases, for example, and the additional validation that can be done with Schematron, to name a few.
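To make the data-typing point concrete, here is a minimal sketch in Python of the kind of constraint a DTD cannot express but a schema language can: that an element's text must be a decimal number, an integer, or an ISO date. It is a hand-rolled illustration, not an XML Schema or RELAX NG validator, and the element names (price, quantity, released) and the RULES table are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical typing rules: element name -> function that parses its text.
# A DTD can only say these elements contain character data; a schema
# language can additionally state what kind of value the text must be.
RULES = {
    "price": Decimal,
    "quantity": int,
    "released": date.fromisoformat,
}

def check_types(xml_text, rules=RULES):
    """Return a list of messages for elements whose text fails its typing rule."""
    errors = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        parser = rules.get(elem.tag)
        if parser is None:
            continue  # no typing rule for this element
        value = (elem.text or "").strip()
        try:
            parser(value)
        except (ValueError, InvalidOperation):
            errors.append(f"<{elem.tag}> has invalid value {value!r}")
    return errors

GOOD = "<item><price>19.95</price><released>2003-12-01</released><quantity>3</quantity></item>"
BAD = "<item><price>cheap</price><released>December</released><quantity>3</quantity></item>"

print(check_types(GOOD))
print(check_types(BAD))
```

Both sample documents are well-formed and would satisfy a DTD that declares these elements as character data; only the typed check separates them. In practice a schema processor would do this work declaratively rather than in application code.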
More Open and Better Means of Content Assembly
The fundamental value of XML in content management has always been its ability to support repurposing and reuse of content. Assuming you are managing the XML-tagged content as some group of logical, reusable components, you are then able to reuse and republish the content in many forms and in many contexts. The actual assembly of the content from these components, though, has often been the function of the specific content management application you are using. Many systems, for example, have a build list metaphor that allows content to be assembled for Web or print publication. In many cases, this build list has been proprietary to the CMS. It would make more sense for content assembly and publishing to be based on open technology such as XML itself and XSLT. We heard from several correspondents who are using open source tools and XML-aware repositories to manage their content, and are beginning to use XML and XSLT for the document assembly process.
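As a rough sketch of what an open, XML-based build list might look like, the Python example below assembles a document from reusable components named in a small XML build list. It uses plain Python and the standard library in place of XSLT, and everything in it (the build and include elements, the ref attribute, the in-memory COMPONENTS store) is a hypothetical stand-in, not the format of any particular CMS or repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical component store (id -> XML fragment), standing in for a repository.
COMPONENTS = {
    "intro": "<section id='intro'><title>Introduction</title><p>Welcome.</p></section>",
    "install": "<section id='install'><title>Installation</title><p>Run the installer.</p></section>",
    "legal": "<section id='legal'><title>Legal Notice</title><p>All rights reserved.</p></section>",
}

# Hypothetical build list, expressed as XML rather than in a proprietary format.
BUILD_LIST = """
<build title="User Guide">
  <include ref="intro"/>
  <include ref="install"/>
  <include ref="legal"/>
</build>
"""

def assemble(build_list_xml, components):
    """Return a <document> element built from the components the build list names."""
    build = ET.fromstring(build_list_xml)
    document = ET.Element("document", {"title": build.get("title", "")})
    for include in build.findall("include"):
        ref = include.get("ref")
        if ref not in components:
            raise KeyError(f"build list references unknown component: {ref}")
        # Parse a fresh copy so the same component can be reused in other builds.
        document.append(ET.fromstring(components[ref]))
    return document

guide = assemble(BUILD_LIST, COMPONENTS)
print(ET.tostring(guide, encoding="unicode"))
```

Because the build list is itself XML, a different publication (say, a quick-start guide that omits the legal notice) is just another small build document over the same components, and the same assembly step could equally be written as an XSLT stylesheet.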
DRM, and Improved Naming and Linking of Objects
Digital Rights Management has been slowly re-emerging and gaining traction over the last two years. DRM has always been this great and interesting idea: persistent protection of content objects in both commercial publishing applications and enterprise ones. Unfortunately, the early vendor offerings were tied too closely with dot.com and e-book models of distribution. All the while, the much more interesting DRM problems have existed for major enterprise applications such as intellectual property management, confidential communications, and business and government intelligence. As a result, literally dozens of companies have come and gone, and DRM is still a largely unrealized application.
Yet certain companies and approaches have hung in there, notably Microsoft, but also smaller focused vendors such as Authentica and ContentGuard. ContentGuard, with backing from Microsoft, has been advancing an XML-centric approach to DRM called XrML, the Extensible Rights Markup Language. XrML has become the basis of a number of broad DRM initiatives, including those being advanced by MPEG, IEEE, and others.
Related to DRM are some ongoing efforts to formalize and improve on the persistent naming of objects. The Digital Object Identifier (DOI) is gaining traction in commercial publishing circles, especially among journal publishers, though we don't see much of this yet in enterprise applications. The combination of permanent identifiers, DRM, and native XML storage will be especially powerful over time.
More Use of Vertical Tag Sets
One of the reasons XML has been so successful in data integration has been the ready adoption of standardized tag sets by vendors and developers, for example, in arenas such as E-Commerce with ebXML and Electronic Data Interchange with EDI-XML. Standardized tag sets for content are not as widespread or prevalent, unless you count something like DocBook, which is in use but is by no means a dominant mechanism. There have also been some efforts to standardize Web content around XHTML, which strikes me as both obvious and perhaps too easy an answer. That is to say, it makes perfect sense to render Web sites in XHTML, but does it also make sense to manage the content as XHTML?
One promising area is the adoption of specialized, detailed tag sets for important, long-living documents. Vertical areas such as financial, legal, and medical come to mind. Indeed, there are efforts to develop and use standardized tag sets for clinical trial data, court records, state and federal legislation, and certain types of scientific and medical content. The key, of course, will be having a critical mass of content actually reside in such tag sets.
An initiative such as the eXtensible Business Reporting Language (XBRL) could have significant impact. XBRL picks up where the established EDGAR reporting system leaves off, providing a means for companies to submit highly detailed and structured financial reports that can then be machine read and processed. The FDIC has begun using XBRL to have banks report performance and results, and the SEC now accepts (but has not mandated) quarterly financial reports from publicly traded companies. The value of XML, and a particular application such as XBRL, is to provide data that is unambiguously encoded and can be easily isolated and manipulated by other programs and processes. In the case of XBRL, financial models, accounting programs and other tools can use the encoded data to automate vital analyses that formerly required lots of time and effort to complete.
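The Python sketch below illustrates what "machine read and processed" means in practice: a program reads named financial facts out of an XML report and computes derived figures with no human re-keying. The report format here is deliberately simplified and hypothetical; real XBRL instances use namespaces, contexts, units, and taxonomies that this example does not model, and the entity and figures are made up for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Simplified, hypothetical report. This is NOT real XBRL syntax; it only
# mimics the idea of unambiguously named, typed financial facts.
REPORT = """
<report entity="Example Bank" period="2003-Q3">
  <fact name="TotalAssets" unit="USD">1250000.00</fact>
  <fact name="TotalLiabilities" unit="USD">1000000.00</fact>
  <fact name="NetIncome" unit="USD">25000.00</fact>
</report>
"""

def read_facts(xml_text):
    """Return a dict mapping each fact name to its value as a Decimal."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): Decimal(f.text.strip()) for f in root.findall("fact")}

facts = read_facts(REPORT)

# Derived figures an analyst would otherwise compute by hand.
equity = facts["TotalAssets"] - facts["TotalLiabilities"]
return_on_assets = facts["NetIncome"] / facts["TotalAssets"]

print("Equity:", equity)
print("Return on assets:", return_on_assets)
```

Because each value is isolated and labeled, the same few lines work across any number of filings that share the tag set, which is where the automation gain comes from.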
This kind of specialized content, encoded in XML, has even greater value and impact when one considers other technology such as Web services. Consider the power of large storehouses of specialized content, encoded in XML, and available via open protocols and processing standards.
Conclusions
While none of the trends in XML and content management are particularly flashy or ground-shaking, the clear trend is toward more ubiquitous use of XML for both content encoding and for the plumbing side of the content management problem. Taken together, the increased support in XML content authoring, storage, and transformation alone will provide substantial growth for the industry. Yet the greater impact and growth will come from some of the related trends, especially those that will lead to more use of XML-encoded content in service-oriented architectures. Over time, such approaches will give organizations many and varied options for application development and improved efficiencies for internal and external users alike.
Bill Trippe, bill@gilbane.com