Recently in XQuery Category

As part of next week's Gilbane Boston Conference, the XML practice will be delivering a pre-conference workshop, "Managing Smart Content: How to Deploy XML Technologies across Your Organization." The instructors will be Geoff Bock, Dale Waldt, Bill Trippe, Barry Schaeffer and Neal Hannon--a group of experts that represents decades of technical and management experience on XML initiatives.

A tip of the virtual hat to Senior Analyst Geoff Bock for organizing this.

If you are grappling with Web 2.0 applications as part of your corporate strategy, keep in mind that Web 3.0 may be just around the corner. Some folks say a key feature of Web 3.0 is the emergence of the Semantic Web where information on Web pages includes markup that tells you what the data is, not just how to format it using HTML (HyperText Markup Language). What is the Semantic Web? According to Wikipedia:

"Humans are capable of using the Web to carry out tasks such as finding the Finnish word for "monkey", reserving a library book, and searching for a low price on a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web." (http://en.wikipedia.org/wiki/Semantic_Web).

To make this work, the W3C (World Wide Web Consortium) has developed standards such as RDF (Resource Description Framework, a model for describing resources and their properties) and SPARQL (SPARQL Protocol and RDF Query Language, http://www.w3.org/TR/rdf-sparql-query/) that extend the semantics that can be applied to Web-delivered content.
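To give a feel for the idea, here is a minimal sketch in Python. It is a toy, not real RDF or SPARQL: statements are modeled as (subject, predicate, object) tuples, and a wildcard match stands in loosely for a SPARQL triple pattern. The `ex:` names, the sample data, and the `match` helper are all assumptions made up for illustration.

```python
# Toy illustration only: RDF-style statements as (subject, predicate, object)
# tuples. The "ex:" identifiers and values are hypothetical sample data.
TRIPLES = [
    ("ex:book42", "ex:title", "Example Book"),
    ("ex:book42", "ex:price", "29.95"),
    ("ex:dvd7", "ex:title", "Example Film"),
    ("ex:dvd7", "ex:price", "9.99"),
]


def match(pattern, triples=TRIPLES):
    """Return the triples that fit a pattern.

    None acts as a wildcard, loosely like a variable in a SPARQL
    triple pattern. This is a hypothetical helper, not a SPARQL engine.
    """
    return [
        triple
        for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# "Find everything that has a price", regardless of what kind of thing it is.
prices = match((None, "ex:price", None))

# "What is the title of ex:dvd7?"
dvd_title = match(("ex:dvd7", "ex:title", None))
```

The point of the sketch is that the predicate (`ex:price`) says what the data is, so a program can find prices without knowing how any page chose to format them.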

We have been doing semantic data since the beginning of SGML, and later with XML, just not always exposing these semantics to the Web. So, if we know how to apply semantic markup to content, how come we don't see a lot of semantic markup on the Web today? I think what is needed is a method for expressing and understanding the intended semantics that goes beyond what current standards allow.

A W3C XML schema is a set of rules that describes the relationships between content elements. It can be written in a way that is very generic or format oriented (e.g., HTML) or very structure oriented (e.g., DocBook, DITA). Maybe we should explore how to go even further and make our markup languages very semantically oriented by defining elements such as <weight> and <postal_code>.
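As a rough sketch of what semantically oriented elements buy a consuming program, the Python snippet below parses a small fragment using only the standard library. The <shipment> wrapper, the `unit` attribute, and the sample values are assumptions invented for this example; only <weight> and <postal_code> come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <shipment> and the unit attribute are made up
# for illustration; the values are sample data.
DOC = """
<shipment>
  <weight unit="kg">12.5</weight>
  <postal_code>02139</postal_code>
</shipment>
""".strip()

root = ET.fromstring(DOC)

# Because the element names describe the data, a program can ask for
# "the weight" directly instead of guessing from layout or position.
weight = root.find("weight")
weight_value = float(weight.text)
weight_unit = weight.get("unit")
postal_code = root.findtext("postal_code")
```

Note that the program still has to be told, out of band, that <weight> means a shipping weight and that `kg` means kilograms. The names help, but they do not carry their own definitions.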

Consider, though, that the schema in use can tell us the names of semantically defined elements, but not necessarily their meaning. I can tell you something about a piece of data by using the <income> tag, but how, in a schema, can I tell you it is a net <income> calculated using the guidelines of the US Internal Revenue Service, and therefore suitable for eFiling my tax return? For that matter, one system might use the element type name <net_income> while another might use <inc>. Obviously an industry standard like XBRL (eXtensible Business Reporting Language) can help standardize vocabularies for element type names, but this cannot be the whole solution or XBRL use would be more widespread. (Note: no criticism of XBRL is intended; I am just using it as an example of how difficult the problem is.)

Also, consider the tools in use to consume Web content. Only in recent years have browsers added XML processing support in the form of the ability to read DTDs and transform content using XSLT. Even so, this merely allows you to read, validate and format non-HTML markup, not truly understand the content's meaning. And if everyone uses their own schemas to define the data they publish on the Web, we could end up with a veritable "Tower of Babel" with many similar, but not fully interoperable, data models.

The Semantic Web may someday provide seamless integration and interpretation of heterogeneous data. Tools such as RDF/SPARQL, as well as microformats (embedding small, specialized, predefined element fragments in a standard format such as HTML), metadata, syndication tools and formats, industry vocabularies, powerful processing tools like XQuery, and other specifications can improve our ability to treat heterogeneous markup as if it were more homogeneous. But even these approaches address only part of the bigger problem. How will we know that elements labeled <net_income> and <inc> are the same and should be handled as such? How do we express these semantic definitions in a processable form? How do we know they are identical or at least close enough to be treated as essentially the same thing?
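To make the <net_income> versus <inc> question concrete, here is a small Python sketch of the usual workaround: a hand-maintained lookup that maps each local element name to a shared concept. Everything in it beyond the two element names is an assumption for illustration, including the `CONCEPTS` table, the `NetIncome` label, the `extract` helper, and the sample documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-maintained mapping from local element names to a
# shared concept label. Someone has to decide these are equivalent.
CONCEPTS = {
    "net_income": "NetIncome",
    "inc": "NetIncome",
}


def extract(xml_text):
    """Collect values for known concepts from one document (sketch only)."""
    found = {}
    for element in ET.fromstring(xml_text).iter():
        concept = CONCEPTS.get(element.tag)
        if concept is not None:
            found[concept] = float(element.text)
    return found


# Two sample documents from two imaginary systems.
from_system_a = extract("<return><net_income>52000</net_income></return>")
from_system_b = extract("<rec><inc>52000</inc></rec>")
```

After the mapping is applied, both documents yield the same `NetIncome` entry, so downstream code can treat them alike. The sketch also shows the limit of the approach: the table merely asserts that the two names mean the same thing. It cannot check whether both figures were calculated under the same rules, which is exactly the part that remains unsolved.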

Defining semantics effectively and broadly is a conundrum faced by many industry standard schema developers and system integrators working with XML content. I think the Semantic Web will require more than schemas and XML-aware search tools to reach its full potential for intelligent data and the applications that process it. What is probably needed is a concerted effort to build semantic data and the tools that can process it, including browsing, data storage, search, and classification tools. There is some interesting work being done in the Technical Architecture Group (TAG) at the W3C to address these issues as part of Tim Berners-Lee's vision of the Semantic Web (see http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2009-01-07.html/ for a recent paper on the subject).

Meanwhile, we have Web 2.0 social networking tools to keep us busy and amused while we wait.

We are wrapping up our project with JustSystems. In total, we created three papers, three companion webinars, and the interactive ROI blueprint. I will also be producing a podcast shortly with Bruce Sharpe, JustSystems' Founding Technologist. You can download the papers and view the recorded webinars here (registration required).

A tip of the hat to Gilbane colleagues Geoffrey Bock, Mary Laplante, and Dale Waldt who did most of the work. It was a big project!

Speaking of Resources


There's an interesting discussion about XML repositories going on over at the XQuery Talk mailing list at Stylus Studio's website. Also, if you are interested in XML repositories, the best publicly available deep-dive is over at Ron Bourret's site.

Here and There


Well, as the name of the blog suggests, the focus is on both technology and strategy. We have been at this long enough to come to the stunning conclusion that technology adoptions without well-thought-out and sound business strategies are doomed to failure. (Now you know why we get paid the big bucks!) While obvious, the conclusion is also true. I have had the luxury of consulting with several clients over a long period of time. I like to think they are successful because they listen to me (and they do), but the bigger reason they are successful is that they use my input to inform well-thought-out business strategies. Sometimes these are operational (they want to save money, improve efficiency), but more often they are about the top line. They want to drive more revenue.

For commercial publishers this means bringing more product to market more quickly, customizing products, and developing derivative products (think of offerings like SafariU). For enterprises, this is also tied to bringing more product to market more quickly; think of a company like Autodesk using XML-based publishing and globalization to bring more products to more markets simultaneously. These are world-class projects based on XML that are bringing incredible value to their organizations, but--even more significantly--these are not the only efforts of their kind. Whereas in the early days of SGML the community could count projects of this type in perhaps the low double digits, I long ago gave up trying to remember or catalog how many of these projects are out there backed by XML technology.

But not every project is as successful as the ones I have cited. Indeed, these stand out as case studies in best practices. Projects do fail and projects do falter, and I will reveal my bias here in saying that, especially in the last few years, few projects fail because of the chosen technology. (All complex systems require significant customization! Who would have thunk it?) Much more often they fail because of problems with project management, insufficient staffing, and shifting plans and execution when the inevitable problems arise. And these kinds of failures come right back to a failure in strategy, or at least a failure in realistically planning for a complex undertaking that is critical to organizational success.

So content strategies are indeed a critical half of this practice, but technology is the other half. Here as well a comparison to SGML is in order. Whereas in the days of SGML there were few vendors at the table, now there are literally scores. Even more importantly, none of the major vendors are missing, and one can make the argument that the major vendors are--or soon will be--the dominant players in the market. XML is central to the product and development platform strategies of Microsoft, Oracle, IBM, Sun, Adobe, and EMC. One can only speculate at the level of R&D dedicated to XML at these companies, but it is safe to say it is a lot. Just as impressive is the community of developers who work in XML daily. Most programmers working in contemporary languages like Java and C# use XML for all kinds of routine tasks, and XML data mapping and modeling tools are built into Visual Studio and many other development tools.

To be more specific, we intend to cover what we categorize as the range of XML products of most interest to business and IT professionals responsible for content management initiatives. These include:


  • XML Repositories

  • XML Content Management Platforms

  • XML Editors

  • XML Transformation and Publishing Tools

  • XML Utilities, Middleware, and IDEs

  • XML Forms

Among other things, we will be developing an online directory of these product lines (more on that in a future post). I have been informally cataloging the companies over the last few weeks and I already have 60 to 70 companies without trying very hard. We expect to interact with these vendors, get details of the products and product roadmaps, and also work with them when appropriate on product strategy and projects like white papers and case studies.

So that's the news so far from here. Do get in touch if you have any questions, ideas, or complaints!
