Recently in XML Category

Over the past few weeks, since publishing Smart Content in the Enterprise, I’ve had several fascinating lunchtime conversations with colleagues concerned about content technologies. Our exchanges wind up with a familiar refrain that goes something like this. “Geoffrey, you have great insights about smart content but what am I supposed to do with all this information?” Ah, it’s the damning with faint praise gambit that often signals an analysis paralysis conundrum for decision-making.

Let me make one thing perfectly clear -- I do not have an out-of-the-box prescription for a solution. It’s not simply a matter of focusing on your customer experience, optimizing your content for search, investing in a component content management platform, or adopting DITA – although, depending on the situation, I may recommend some combination of these items as part of a smart content strategy.

For me, smart content remains a work in progress. I expect to develop the prescriptive road map in the months ahead. Here’s a quick take on where I am right now.

  • For publishers, it’s all about transforming the publishing paradigm through content enrichment – defining the appropriate level of granularity and then adding the semantic metadata for automated processing.
  • For application developers, it’s all about getting the information architecture right and ensuring that it’s extensible. There needs to be sensible storage, the right editing and management tools, multiple methods for organizing content, as well as a flexible rendering and production environment.
  • For business leaders and decision makers, there needs to be an upfront investment in the right set of content technologies that will increase profits, reduce operating costs, and mitigate risks. No, I am not talking about rocket science. But you do need a technology strategy and a business plan.

As highlighted by the case studies included in the report, I can point to multiple examples where organizations have done the right things to produce notable results. Dale and I will continue the smart content discussions at the Gilbane Boston conference right after Thanksgiving, both through our preconference workshop, and at a conference session “Smart Content in the Real World: Case Studies and Real Results.”

We are also launching a Smart Content Readiness Service, where we will engage with organizations on a consulting basis to identify:

  • The business drivers where smart content will ensure competitive advantage when distributing business information to customers and stakeholders
  • The technologies, tools, and skills required to componentize content and target distribution to various audiences using multiple devices
  • The operational roles and governance needed to support smart content development and deployment across an organization
  • The implementation planning strategies and challenges to upgrade content creation and delivery environments

Please contact me if you are interested in learning more.

In short, to answer my lunchtime colleagues, I cannot (yet) prescribe a fully baked solution. It’s too early for the recipes and the cookbook. But I do believe that the business opportunities and benefits are readily at hand. At this point, I would invite you to join the discussion by letting me know what you expect, what approaches you’ve tried, where you’ve wound up, what you think needs to come next – and how we might help you.

The Pull of Content Value

Traditionally, publishing is a pushy process. When I have something to say, I write it down. Perhaps I revise it, check with colleagues, and verify my facts with appropriate authorities. Then I publish it, and move on to the next thing – without directly interacting with my audience and stakeholders. Whether I distribute the content electronically or in a hard copy format, I leave it to my readers to determine the value of whatever I publish.

However, as we describe in our recently completed report Smart Content in the Enterprise, XML applications can transform this conventional publishing paradigm. By smart content, we mean content that is granular at the appropriate level, semantically rich, useful across applications, and meaningful for collaborative interaction.

From a business perspective, smart content adds value to published information in new and compelling ways. Let’s consider the experiences of NetApp and Warrior Gateway, two of the organizations featured in our report.

NetApp
As a provider of storage and data management solutions, NetApp has invested a lot of time and effort embracing DITA and restructuring its technical documentation. By systematically tagging and managing content components, and by focusing on the underlying content development processes, writers and editors can keep up with the pace of product releases.

But there is more to this publishing process orientation. Beyond simply producing product information faster and cheaper, NetApp is poised to make publishing better. The company can now easily support its reseller partners by providing them with the DITA tagged content that they can directly incorporate into their own OEM solutions. Resellers' customers get just the information they need, directly from the source. With its XML application, NetApp incorporates its partners and stakeholders into its information value chain.

Warrior Gateway
As a content aggregator, Warrior Gateway collects, organizes, enriches, and redistributes content about a wide range of health, welfare, and veteran-related services to soldiers, veterans, and their families. Rather than simply compiling an online catalog of service providers’ listings, Warrior Gateway restructures the content that government, military, and local organizations produce, and enriches it by adding veteran-related categories and other information. Furthermore, Warrior Gateway adds a social dimension by encouraging contributions from veterans and family members.

Once stored within the XML application powering Warrior Gateway, the content is easily reorganized and reclassified to provide the veterans’ perspective about areas of interest and importance. Volunteers working with Warrior Gateway can add new categories when necessary. Service providers can claim their profiles and improve their own data details. Even public users can contribute content to the gateway, a crowdsourcing strategy for efficiently collecting feedback from users. With contributions from multiple stakeholders, the published listings can be enriched over time without requiring a large internal staff to add the extra information.

Capturing New Business Value
There’s a lot more detail about how the XML applications work in our case studies – I recommend that you check them out.

What I find intriguing is the range of promising and potentially profitable business models engendered by smart content.  Enterprise publishers have new options and can go beyond simply pushing content through a publishing process. Now they can build on their investments, and capture the pull of content value.

Authoring in a structured text environment has traditionally been done with dedicated structured editors. These tools provide validation and user-assisted markup features that help the user create complete and valid content. But these structured editors are somewhat complicated and unusual, and users need training to become proficient. The learning curve is not very steep but it does exist.

Many organizations have come to see documentation departments as a process bottleneck and try to engage others throughout the enterprise in the content creation and review processes. Engineers and developers can contribute to documentation and have a unique technical perspective. Installation and support personnel are on the front lines and have unique insight into how the product and related documentation are used. Telephone operators not only need the information at their fingertips, but can also augment it with comments and ideas that occur while supporting users. Third-party partners and reviewers may also have a unique perspective and role to play in a distributed, collaborative content creation, management, review, and delivery ecosystem.

Our recently completed research on XML Smart Content in the Enterprise indicates that as we strive to move content creation and management out of the documentation department silo, we will also need to consider how the data is encoded and the usefulness of the data model in meeting our expanded business requirements. Smart content is multipurpose content designed with several uses in mind. Smart content is modular to support being assembled in a variety of forms. And smart content is structured content that has been enriched with semantic information to better identify its topic and role to aid processing and searching. For these reasons, smart content also improves distributed collaboration. Let me elaborate.

One of the challenges for distributed collaboration is that occasional contributors participate infrequently and are therefore unfamiliar with structured editing tools. It makes sense to simplify the editing process and tools for these users. They can't always take a refresher course in the editor and its features. They may be working remotely, even at a customer site installing equipment or software. They need structured editing tools designed for them: intuitive and easy to figure out, accessible from just about anywhere, and affordable, with licensing flexible enough to let a larger number of users participate in managing the content. This usually means one of two things: either the editor will be a plug-in to a popular word processing system (e.g., MS Word), or it will be accessed through a thin-client browser, like a wiki editor. In some environments, both may be needed in addition to traditional structured editing tools. Smart content modularity and enrichment allow a variety of editing tools and flexibility in process design, and therefore expand who can collaborate from throughout the enterprise.

Infrequent contributors may also be unable to master navigating and operating within a complex repository and workflow environment, for the same reasons of unfamiliarity. Serving up information to a remote collaborator can be made easier with keywords and other metadata designed to optimize searching and access to the content. Even a little metadata can greatly simplify things for an infrequent user. Product codes, version information, and a couple of dates would allow a user to home in on the likely content topics and select content to edit from a well-targeted list of search results. Relationships between content modules that are indicated in metadata can alert a user that when one object is updated, other related objects may need to be reviewed for potential updates as well.
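To make the point about "a little metadata" concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Module record, its field names, and the sample data are assumptions for illustration, not the data model of any particular repository. It shows a product code, a version, and a date narrowing a search, and relationship metadata flagging related modules for review.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Module:
    """A content module carrying a small amount of metadata (hypothetical model)."""
    module_id: str
    title: str
    product_code: str
    version: str
    last_updated: date
    related_ids: list[str] = field(default_factory=list)


def find_modules(modules, product_code, version, updated_since):
    """Return modules for one product and version, updated on or after a date."""
    return [
        m for m in modules
        if m.product_code == product_code
        and m.version == version
        and m.last_updated >= updated_since
    ]


def modules_to_review(modules, updated_id):
    """Return IDs of modules linked to the updated one, in either direction."""
    by_id = {m.module_id: m for m in modules}
    linked = set(by_id[updated_id].related_ids)
    linked.update(m.module_id for m in modules if updated_id in m.related_ids)
    linked.discard(updated_id)
    return sorted(linked)


# Invented sample data for a fictional product with code "CTRL".
modules = [
    Module("install-01", "Installing the controller", "CTRL", "2.1",
           date(2010, 9, 14), ["config-01"]),
    Module("config-01", "Configuring the controller", "CTRL", "2.1",
           date(2010, 6, 2)),
    Module("install-00", "Installing the controller", "CTRL", "2.0",
           date(2009, 11, 30)),
    Module("trouble-01", "Troubleshooting installation", "CTRL", "2.1",
           date(2010, 10, 1), ["install-01"]),
]

hits = find_modules(modules, "CTRL", "2.1", date(2010, 9, 1))
print([m.module_id for m in hits])            # ['install-01', 'trouble-01']
print(modules_to_review(modules, "install-01"))  # ['config-01', 'trouble-01']
```

Three metadata fields cut four modules down to two candidates, and one relationship field tells the contributor which other modules to look at after editing install-01. A real repository would hold this metadata in the XML itself or in its index, but the principle is the same.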

It is becoming increasingly clear that there is no one model for XML or smart content creation and editing. Just as a carpenter may have several saws, each designed for a particular type of cut, a robust structured environment for smart content may have more than one editor in use. It behooves us to design our systems and tools to meet the desired business processes and user functionality, rather than limit our processes to the features of one tool.

One of the conclusions of our report Smart Content in the Enterprise (forthcoming next week) is that a little bit of enrichment goes a long way. It’s important to build on your XML infrastructure, enrich your content a little bit (to the extent that your business environment is able to support), and expect to iterate over time.

Consider what happened at Citrix, reported in our case study Optimizing the Customer Experience at Citrix: Restructuring Documentation and Training for Web Delivery. The company had adopted DITA for structured publishing several years ago. Yet just repurposing the content in product manuals for print and electronic distribution, and publishing the same information as HTML and PDF documents, did not change the customer experience.

A few years ago, Citrix information specialists had a key insight: customers expected to find support information by googling the web. To be sure, there was a lot of content about various Citrix products out in cyberspace, but very little of it came directly from Citrix. Consequently the most popular solutions available via web-wide searching were not always reliable, and the detailed information from Citrix (buried in their own manuals) was rarely found.

What did Citrix do? Despite limited resources, the documentation group began to add search metadata to the product manuals. With DITA, there was already a predefined structure for topics, used to define sections, chapters, and manuals. Authors and editors could simply include additional tagged metadata that identified and classified the contents – and thus expose the information to Google and other web-wide search engines.
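As a rough illustration of what adding tagged metadata to a DITA topic can look like, here is a minimal Python sketch. It is an assumption, not a description of the Citrix implementation: the case study does not say which elements were used, so the sketch relies on the standard DITA prolog, metadata, keywords, and keyword elements. The sample topic, the keyword values, and the function names are invented, and how search engines weigh the resulting HTML meta tag is outside the scope of the sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny DITA-style topic without any metadata (invented sample).
TOPIC = """<topic id="configure-access">
  <title>Configuring remote access</title>
  <body><p>Steps for configuring remote access.</p></body>
</topic>"""


def add_keywords(topic_xml, keywords):
    """Insert a prolog with keyword metadata just before the topic body.

    Assumes the topic has a body and no existing prolog.
    """
    topic = ET.fromstring(topic_xml)
    prolog = ET.Element("prolog")
    metadata = ET.SubElement(prolog, "metadata")
    keyword_list = ET.SubElement(metadata, "keywords")
    for word in keywords:
        ET.SubElement(keyword_list, "keyword").text = word
    # DITA places prolog ahead of body, so insert at the body's position.
    body_index = list(topic).index(topic.find("body"))
    topic.insert(body_index, prolog)
    return topic


def meta_keywords_tag(topic):
    """Render the topic's keywords as an HTML meta tag for the published page."""
    words = [k.text for k in topic.iter("keyword")]
    content = escape(", ".join(words), quote=True)
    return '<meta name="keywords" content="{}">'.format(content)


topic = add_keywords(TOPIC, ["remote access", "gateway configuration"])
print([child.tag for child in topic])  # ['title', 'prolog', 'body']
print(meta_keywords_tag(topic))
```

The authoring change is small: a handful of keyword elements per topic. The publishing pipeline can then carry those keywords into whatever the web output needs.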

Nor was there a lot of time or many resources for up-front design and detailed analysis. To paraphrase a perceptive information architect we interviewed, “Getting started was a lot like throwing the stuff against a wall to see what sticks.” At first tags simply summarized existing chapter and section headings. Significantly, this was a good enough place to start.

Specifically, once Citrix was able to join the online conversation with its customers, it was also able to begin tracking popular search terms. Then over time and with successive product releases, the documentation group was able to add further tagged metadata and provide ever more focused (and granular) content components.
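The feedback loop described here, watching what customers search for and then tagging accordingly, can be sketched in a few lines of Python. This is an assumption about how such tracking might work, not a description of the tooling Citrix used. The query log, the existing keyword list, and the function name are all invented for illustration.

```python
from collections import Counter


def uncovered_terms(queries, tagged_keywords, top_n=3):
    """Return the most frequent search terms not yet used as metadata keywords."""
    counts = Counter(q.strip().lower() for q in queries)
    tagged = {k.lower() for k in tagged_keywords}
    gaps = [(term, n) for term, n in counts.most_common() if term not in tagged]
    return gaps[:top_n]


# Invented search log and current keyword list.
queries = [
    "printer mapping", "license server", "printer mapping",
    "session timeout", "License Server", "printer mapping",
    "session timeout", "profile migration",
]
tagged_keywords = ["license server"]

print(uncovered_terms(queries, tagged_keywords, top_n=2))
# [('printer mapping', 3), ('session timeout', 2)]
```

Terms that customers search for often but that no topic is tagged with are the natural candidates for the next round of enrichment.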

What does this mean for developing smart content and leveraging the benefits of XML tagging? Certainly the more precise your content enrichment, the more findable your information is going to be. When considering the business benefits of search engine optimization, the quality of your tagging can always improve over time. But as a simple value proposition, getting started is the critical first step.

If you're only reading this XML blog, be sure to check out my recent blog post Focusing on Smart Content, which I published in the main Gilbane blog.

We've published a new paper on addressing large-scale integration, storage, and access of complex information. As Dale mentions in his entry over on our main blog, the paper frames the discussion in terms of challenges to Open Government initiatives. We note, though, that the exploration of obstacles to effective, efficient processing of high volumes of data and content is relevant across many industries.

We're cross-posting here on the XML blog because the paper deals with XML content and the XML family of standards, including XQuery and XPath.

The Gilbane Beacon is available as a free download from Gilbane and from Mark Logic, sponsor of the paper.

As the world of search becomes more and more sophisticated (and that process has been underway for decades), we may be approaching the limits of software's ability to find what a searcher wants. If that is true, and I suspect that it is, we will finally be forced to follow the trail of crumbs up the content life cycle... to its source. Indeed, most of the challenges inherent in today's search strategies and products appear to grow from the fact that while we continually increase our demands for intelligence on the back end, we have done little if anything to address the chaos that exists on the front end. You name it: different word processing formats, spreadsheets, HTML tagged text, database delimited files, and so on are all dumped into what we think of as a coherent, easily searchable body of intellectual property. It isn't, and it isn't likely to become so any time soon unless we address the source. Having spent some time in the library automation world, I can remember the sometimes bitter controversies over having just two major foundations for cataloging source material (Dewey and LC; add a third if you include the NICEM A/V scheme). Had we known back then that the process of finding intellectual property would devolve into the chaos we now confront, with every search engine and database product essentially rolling its own approach to rational search, we would have considered ourselves blessed. In the end, it seems, we must begin to see the source material, its physical formats, its logical organization, and its inclusion of rational cataloging and taxonomy elements as the conceptual raw material for its own location.
As long as the word processing world teaches that anyone creating anything can make it look like it should in a dozen different ways, ignoring any semblance of finding-aid inclusion, we probably won't have a truly workable ability to find what we want without reworking the content or wading through a haystack of misses to find our desired hits. Unfortunately, the solutions of yesteryear, including after-creation cataloging by a professional cataloger, probably won't work now either, for cost if no other reason. We will be forced to approach the creators of valuable content, asking them for a minimum of preparation for searching their product, and providing the necessary software tools to make that possible. We can't act too soon because, despite the growth of software elegance and raw computer power, this situation will likely get worse as the sheer volume of valuable content grows.

Regards, Barry

Read more: Enterprise Search Practice Blog: http://gilbane.com/search_blog/

As part of next week's Gilbane Boston Conference, the XML practice will be delivering a pre-conference workshop, "Managing Smart Content: How to Deploy XML Technologies across Your Organization." The instructors will be Geoff Bock, Dale Waldt, Bill Trippe, Barry Schaeffer and Neal Hannon--a group of experts that represents decades of technical and management experience on XML initiatives.

A tip of the virtual hat to Senior Analyst Geoff Bock for organizing this.

Over at TeleRead, David Rothman has a really fine writeup discussing our digital publishing report. He summarizes some of our key points about asset management and flexibility, but also raises some interesting related issues about DRM and the risks of "publishers as mixmasters."

My thanks to David for his thoughtful response.

I have a new post over at EMC's Community site, "Preserving Electronic Public Records: Lessons from the Washington State Digital Archives." This is part of our ongoing series for EMC on the use of ECM and XML in the public sector.
