Traditionally, publishing is a pushy process. When I have something to say, I write it down. Perhaps I revise it, check with colleagues, and verify my facts with appropriate authorities. Then I publish it, and move on to the next thing – without directly interacting with my audience and stakeholders. Whether I distribute the content electronically or in a hard copy format, I leave it to my readers to determine the value of whatever I publish.
However, as we describe in our recently completed report Smart Content in the Enterprise, XML applications can transform this conventional publishing paradigm. By smart content, we mean content that is granular at the appropriate level, semantically rich, useful across applications, and meaningful for collaborative interaction.
From a business perspective, smart content adds value to published information in new and compelling ways. Let’s consider the experiences of NetApp and Warrior Gateway, two of the organizations featured in our report.
NetApp
As a provider of storage and data management solutions, NetApp has invested a lot of time and effort embracing DITA and restructuring its technical documentation. By systematically tagging and managing content components, and by focusing on the underlying content development processes, writers and editors can keep up with the pace of product releases.
But there is more to this publishing process orientation. Beyond simply producing product information faster and cheaper, NetApp is poised to make publishing better. The company can now easily support its reseller partners by providing them with the DITA tagged content that they can directly incorporate into their own OEM solutions. Resellers' customers get just the information they need, directly from the source. With its XML application, NetApp incorporates its partners and stakeholders into its information value chain.
Warrior Gateway
As a content aggregator, Warrior Gateway collects, organizes, enriches, and redistributes content about a wide range of health, welfare, and veteran-related services to soldiers, veterans, and their families. Rather than simply compiling an online catalog of service providers’ listings, Warrior Gateway restructures the content that government, military, and local organizations produce, and enriches it by adding veteran-related categories and other information. Furthermore, Warrior Gateway adds a social dimension by encouraging contributions from veterans and family members.
Once stored within the XML application powering Warrior Gateway, the content is easily reorganized and reclassified to provide the veterans’ perspective about areas of interest and importance. Volunteers working with Warrior Gateway can add new categories when necessary. Service providers can claim their profiles and improve their own data. Even public users can contribute content to the gateway, a crowdsourcing strategy for efficiently collecting feedback from users. With contributions from multiple stakeholders, the published listings can be enriched over time without requiring a large internal staff to add the extra information.
Capturing New Business Value
There’s a lot more detail about how the XML applications work in our case studies – I recommend that you check them out.
What I find intriguing is the range of promising and potentially profitable business models engendered by smart content. Enterprise publishers have new options and can go beyond simply pushing content through a publishing process. Now they can build on their investments, and capture the pull of content value.
With the rise of Web 2.0 and 3.0, growing Internet traffic, social networking and a host of other technologically driven applications and appetites, government at all levels is confronting the burgeoning changes in its role and participation in the society around it.
An important part of this process is the separation of the paths down which technology is taking society at large from the paths government should and should not follow in performing its essential functions. Experience has shown that not every tool, functionality and resource available to and used by citizens should become part of the governance process. The quandary is deciding up front which is which. This quandary can be seen in the very definition of government being used to describe the future: "connected government", "open government", "participatory democracy", "transparent government" are just some of the terms being used to describe what their users think government should be.
The core challenge, it would seem, is to develop an approach that makes government at once more effective in discharging its myriad day to day duties, more open and responsive to the honestly held beliefs and concerns of its citizens, yet still fully capable of discharging its constitutional responsibilities without infringing on or abrogating the rights of its citizens. History shows that this:
- Will not be an easy process
- Will not lend itself to a solution based solely on available technology
- Is likely to be tried unsuccessfully (or disastrously) more than once before we get it right.
This would seem to dictate that, whatever the technological imperatives, government should be changed carefully, in small steps and with well-considered fallbacks from the paths that turn out to be ineffective or dangerous to our liberties. One way to do this, for instance, would be to focus on those government functions we know are broken and understand how to fix (yes, there are such things). Then we could focus on applying new technology in areas where the target is familiar, the outcome is more easily measured and the impact is less likely to spin out of control.
O'Reilly's Tools of Change conference in New York City this week was highly successful, both inside and outside the walls of the Marriott Marquis. The sessions were energetic, well-attended, and--on the whole--full of excellent insight and ideas about the digital trends taking a firm hold of nearly all sectors of the publishing business. Outside the walls, especially on Twitter, online communities were humming with news and commentary on the conference. (You almost could have followed the entire conference just by following the #toc hashtag on Twitter and accessing the online copies of the presentations.)
But if you had done that, you would have missed the fun of being there. There were some superb keynotes and some excellent general sessions. Notable among the keynotes were Tim O'Reilly himself, Neelan Choksi from Lexcycle (Stanza), and Cory Doctorow. The general sessions covered a fairly broad spectrum of topics but were heavy on eBooks and community. Because of my own and my clients' interests, I spent most of my time in the eBook sessions. The session eBooks I: Business Models and Strategy was content-rich. To begin with, you heard straight from senior people at major publishers with significant eBook efforts (Kenneth Brooks from Cengage Learning, Leslie Hulse from Harper Collins Publishers, and Cynthia Cleto from Springer Science+Business Media). Along with their insight, the speakers--and moderator Michael Smith from IDPF--assembled an incredibly valuable wiki of eBook business and technical material to back up their talk. I also really enjoyed a talk from Gavin Bell of Nature, The Long Tail Needs Community, where he made a number of thoughtful points about how publishers need to think longer and harder about how reading engages and changes people and specifically how a publisher can build community around those changes and activities.
There were a few soft spots in the schedule. Jeff Jarvis' keynote, What Would Google do with Publishing?, was more about plugging his new book (What Would Google Do?) than anything else, but was also weirdly out of date, even though the book is hot off the presses, with 20th century points like "The link changes everything" and "If you're not searchable, you won't be found." (Publishers are often, somewhat unfairly, accused of being Luddite, but they are not that Luddite.) There were also a couple of technical speakers who didn't seem to make the necessary business connections to the technical points they were making, which would have been helpful to those members of the audience who were less technical and more publishing-product and -process oriented. But these small weaknesses were easily outshone by the many high points, the terrific overall energy, and the clear enthusiasm of the attendees.
One question I have for the O'Reilly folks is how they will keep the energy going. They have a nascent Tools of Change community site. Perhaps they could enlist some paid community managers to seed and moderate conversations, and also tie community activities to other O'Reilly products such as the books and other live and online events.
O'Reilly has very quickly established a very strong conference and an equally strong brand around the conference. With the publishing industry so engulfed in digital change now, I have to think this kind of conference and community can only continue to grow.
Yesterday the big stimulus bill cleared the conference committee that reconciles the Senate and House versions. If you remember your civics, that means it is likely to pass in both chambers and then be signed into law by the president.
Included in the bill are billions of dollars for digitizing important information such as medical records or government information. Wow! That is a lot of investment! The thinking is that inaccessible information locked in paper or proprietary formats costs us billions each year in productivity. Wow! That's a lot of waste! The thinking is also that access to the information could spawn billions of dollars of new products and services, and therefore income and tax revenue. Wow! That's a lot of growth!
Many agencies and offices have striven to expose useful official information and reports at the federal and state level. Even so, there is a lot of data still locked away, incomplete, or in difficult-to-use forms. A while ago a Senate official told me that they do not maintain a single, complete, accurate, official copy of the US Statutes internally. Even if this is no longer true, the public often relies on the "trusted" versions that are available only through paid online services. Many other data types, like many medical records, exist only on paper.
There are a lot of challenges, such as security and privacy issues, even intellectual property rights issues. But there are a lot of opportunities too. There are thousands of data sources that could be tapped into that are currently locked in paper or proprietary formats.
I don't think the benefits will come at the expense of commercial services already selling this publicly owned information, as some may fear. These online sites provide a service, often emphasizing timeliness or value adds like integrating useful data from different sources, in exchange for their fees. I think a combination of free government open data resources and delivery tools, plus innovative commercial products, will emerge. Some easily obtained data may become commoditized, but new ways of accessing and integrating information will emerge. The big information services probably have more to fear from startups than from free government applications and data.
As it happens, I saw a demo yesterday of a tool that took all the activity of a state legislature and unified it under one portal. This allows people to track a bill and all related activity in a single place. For free! The bill working its way through both chambers is connected to related hearing agendas and minutes, which are connected to schedules, with status and other information captured in a concise dashboard-like screen format (there are other services you can pay for, which fund the site). Each information component came from a different office and was originally in its own specialized format. What we were really looking at was a custom data integration application, built with AJAX technology, that brings heterogeneous data together in a unified view. Very powerful, and yet scalable. The key to its success was strong integration of data: the connections used to tie the information together. The vendor collected and filtered the data, converted it to a common format, and added the linkage and relationship information to provide an integrated view into the data. All source data is stored separately and maintained by different offices. Five years ago it would have been a lot more difficult to create the service. Technology has advanced, and the data are increasingly available in manageable forms.
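To make the "collect, convert to a common format, add linkage" idea concrete, here is a minimal sketch in Python. It is not the vendor's implementation, which I did not see under the hood. The office names, field names, and bill numbers are all invented for illustration, and the only assumption is that each office identifies a bill in its own slightly different way.

```python
from collections import defaultdict

# Hypothetical records as three separate offices might publish them,
# each with its own field names and bill-number style (all invented).
clerk_bills = [{"bill_no": "HB-101", "title": "Open Data Act", "status": "In committee"}]
committee_hearings = [{"Bill": "hb 101", "date": "2009-02-10", "room": "A"}]
journal_minutes = [{"ref": "HB101", "summary": "Testimony heard; vote deferred."}]

def normalize_bill_id(raw: str) -> str:
    """Reduce variants such as 'hb 101' or 'HB-101' to a common key, 'HB101'."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())

def integrate(bills, hearings, minutes):
    """Build one view per bill, linking related records by normalized ID."""
    view = defaultdict(
        lambda: {"title": None, "status": None, "hearings": [], "minutes": []}
    )
    for b in bills:
        key = normalize_bill_id(b["bill_no"])
        view[key]["title"] = b["title"]
        view[key]["status"] = b["status"]
    for h in hearings:
        key = normalize_bill_id(h["Bill"])
        view[key]["hearings"].append({"date": h["date"], "room": h["room"]})
    for m in minutes:
        view[normalize_bill_id(m["ref"])]["minutes"].append(m["summary"])
    return dict(view)

dashboard = integrate(clerk_bills, committee_hearings, journal_minutes)
```

The source records stay where they are; only the shared key and the links between records are added, which is the part that made the demo feel like a single system.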
The government produces a lot of information that affects us daily and that we, as taxpayers and citizens, actually own, but have limited or no access to. This includes statutes and regulations, court cases, census data, scientific data and research, agricultural reports, SEC filings, FDA drug information, taxpayer publications, forms, patent information, health guidelines, etc., etc., etc. The list is really long. I am not even scratching the surface! It also includes more interactive and real-time data, such as geological and water data, weather information, and the status of regulation and legislation changes (like reporting on the progress of the stimulus bill as it worked its way through both chambers). All of these can be made more current, expanded for more coverage, integrated with related materials, and validated for accuracy. There are also new opportunities to open up the process by using forums and social media tools for collecting feedback from constituents and experts (like the demo mentioned above). Social media tools may both give people an avenue to express their ideas to their elected officials and serve as a collection tool to gather raw data that can be analyzed for trends and statistics, which in turn becomes new government data that we can use.
IMHO, this investment in open government data is a powerful catalyst that could actually create or change many jobs or business models. If done well, it could provide significant positive returns, streamline government, open access to more information, and enable new and interesting products and applications.
If you are grappling with Web 2.0 applications as part of your corporate strategy, keep in mind that Web 3.0 may be just around the corner. Some folks say a key feature of Web 3.0 is the emergence of the Semantic Web where information on Web pages includes markup that tells you what the data is, not just how to format it using HTML (HyperText Markup Language). What is the Semantic Web? According to Wikipedia:
"Humans are capable of using the Web to carry out tasks such as finding the Finnish word for "monkey", reserving a library book, and searching for a low price on a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web." (http://en.wikipedia.org/wiki/Semantic_Web).
To make this work, the W3C (World Wide Web Consortium) has developed standards such as RDF (Resource Description Framework, a framework for describing properties of data objects) and SPARQL (SPARQL Protocol and RDF Query Language, http://www.w3.org/TR/rdf-sparql-query/) that extend the semantics that can be applied to Web-delivered content.
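For a rough feel of the idea, RDF expresses statements as subject, predicate, object triples, and a SPARQL query matches patterns against them. The sketch below is plain Python, not an RDF library and not SPARQL syntax; the example.org identifiers and the `match` helper are made up purely for illustration.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# The example.org identifiers below are invented for illustration.
triples = [
    ("http://example.org/book/1", "http://example.org/terms/title", "Smart Content"),
    ("http://example.org/book/1", "http://example.org/terms/price", 25),
    ("http://example.org/book/2", "http://example.org/terms/title", "Open Data"),
]

def match(pattern, data):
    """Return the triples that fit the pattern; None acts as a wildcard,
    loosely analogous to a variable in a SPARQL triple pattern."""
    return [
        t for t in data
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]

# "Find every title, whatever the subject": a wildcard in the subject and object slots.
titles = [
    obj for _, _, obj in match((None, "http://example.org/terms/title", None), triples)
]
```

Because the predicate is a full identifier rather than a bare tag name, two publishers who use the same identifier are at least claiming to mean the same property.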
We have been doing semantic data since the beginning of SGML, and later with XML, just not always exposing these semantics to the Web. So, if we know how to apply semantic markup to content, how come we don't see a lot of semantic markup on the Web today? I think what is needed is a method for expressing and understanding the intended semantics that goes beyond what current standards allow.
A W3C XML schema is a set of rules that describes the relationships between content elements. It can be written in a way that is very generic or format oriented (e.g., HTML) or very structure oriented (e.g., DocBook, DITA). Maybe we should explore how to go even further and make our markup languages very semantically oriented by defining elements, for instance, like <weight> and <postal_code>.
Consider, though, that the schema in use can tell us the names of semantically defined elements, but not necessarily their meaning. I can tell you something about a piece of data by using the <income> tag, but how, in a schema, can I tell you it is a net <income> calculated using the guidelines of the US Internal Revenue Service, and therefore suitable for eFiling my tax return? For that matter, one system might use the element type name <net_income> while another might use <inc>. Obviously an industry standard like XBRL (eXtensible Business Reporting Language) can help standardize vocabularies for element type names, but this cannot be the whole solution or XBRL use would be more widespread. (Note: no criticism of XBRL is intended; I am just using it as an example of how difficult the problem is.)
Also, consider the tools in use to consume Web content. Only in recent years have browsers added XML processing support, in the form of the ability to read DTDs and transform content using XSLT. Even so, this merely allows you to read, validate and format non-HTML tag markup, not truly understand the content's meaning. And if everyone uses their own schemas to define the data they publish on the Web, we could end up with a veritable "Tower of Babel" with many similar, but not fully interoperable, data models.
The Semantic Web may someday provide seamless integration and interpretation of heterogeneous data. Tools such as RDF/SPARQL, as well as microformats (embedding small, specialized, predefined element fragments in a standard format such as HTML), metadata, syndication tools and formats, industry vocabularies, powerful processing tools like XQuery, and other specifications can improve our ability to treat heterogeneous markup as if it were more homogeneous. But even these approaches are addressing only part of the bigger problem. How will we know that elements labeled with <net_income> and <inc> are the same and should be handled as such? How do we express these semantic definitions in a processable form? How do we know they are identical or at least close enough to be treated as essentially the same thing?
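The crudest answer today is a hand-maintained mapping table, sketched below in Python with the standard library's XML parser. The two documents, the `CONCEPTS` table, and the "NetIncome" label are all hypothetical; this only shows the shape of the workaround, not a real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-maintained mapping from local element names to a shared concept.
CONCEPTS = {"net_income": "NetIncome", "inc": "NetIncome"}

# Two invented documents from systems that name the same figure differently.
doc_a = "<return><net_income>52000</net_income></return>"
doc_b = "<filing><inc>48000</inc></filing>"

def extract(xml_text):
    """Return {concept: value} for every element whose name appears in CONCEPTS."""
    found = {}
    for el in ET.fromstring(xml_text).iter():
        concept = CONCEPTS.get(el.tag)
        if concept is not None:
            found[concept] = float(el.text)
    return found

values = [extract(doc_a), extract(doc_b)]
```

Note what the table does and does not do: it asserts that the two names are equivalent, but nothing in it can verify that both systems calculated the figure by the same rules. Somebody had to know that and write it down, which is exactly the gap I am describing.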
This, defining semantics effectively and broadly, is a conundrum faced by many industry standard schema developers and system integrators working with XML content. I think the Semantic Web will require more than schemas and XML-aware search tools to reach its full potential in intelligent data and the applications that process it. What is probably needed is a concerted effort to build semantic data and tools that can process it, including browsing, data storage, search, and classification tools. There is some interesting work being done in the Technical Architecture Group (TAG) at the W3C to address these issues as part of Tim Berners-Lee's vision of the Semantic Web (see http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2009-01-07.html/ for a recent paper on the subject).
Meanwhile, we have Web 2.0 social networking tools to keep us busy and amused while we wait.