Recently in Information Architecture Category

Over the past few weeks, since publishing Smart Content in the Enterprise, I’ve had several fascinating lunchtime conversations with colleagues concerned about content technologies. Our exchanges wind up with a familiar refrain that goes something like this. “Geoffrey, you have great insights about smart content but what am I supposed to do with all this information?” Ah, it’s the damning with faint praise gambit that often signals an analysis paralysis conundrum for decision-making.

Let me make one thing perfectly clear -- I do not have an out-of-the-box prescription for a solution. It’s not simply a matter of focusing on your customer experience, optimizing your content for search, investing in a component content management platform, or adopting DITA – although, depending on the situation, I may recommend some combination of these items as part of a smart content strategy.

For me, smart content remains a work in progress. I expect to develop the prescriptive road map in the months ahead. Here’s a quick take on where I am right now.

  • For publishers, it’s all about transforming the publishing paradigm through content enrichment – defining the appropriate level of granularity and then adding the semantic metadata for automated processing.
  • For application developers, it’s all about getting the information architecture right and ensuring that it’s extensible. There needs to be sensible storage, the right editing and management tools, multiple methods for organizing content, as well as a flexible rendering and production environment.
  • For business leaders and decision makers, there needs to be an upfront investment in the right set of content technologies that will increase profits, reduce operating costs, and mitigate risks. No, I am not talking about rocket science. But you do need a technology strategy and a business plan.

As highlighted by the case studies included in the report, I can point to multiple examples where organizations have done the right things to produce notable results. Dale and I will continue the smart content discussions at the Gilbane Boston conference right after Thanksgiving, both through our preconference workshop, and at a conference session “Smart Content in the Real World: Case Studies and Real Results.”

We are also launching a Smart Content Readiness Service, where we will engage with organizations on a consulting basis to identify:

  • The business drivers where smart content will ensure competitive advantage when distributing business information to customers and stakeholders
  • The technologies, tools, and skills required to componentize content and target distribution to various audiences using multiple devices
  • The operational roles and governance needed to support smart content development and deployment across an organization
  • The implementation planning strategies and challenges to upgrade content creation and delivery environments

Please contact me if you are interested in learning more.

In short, to answer my lunchtime colleagues, I cannot (yet) prescribe a fully baked solution. It’s too early for the recipes and the cookbook. But I do believe that the business opportunities and benefits are readily at hand. At this point, I would invite you to join the discussion by letting me know what you expect, what approaches you’ve tried, where you’ve wound up, what you think needs to come next – and how we might help you.

The growth in web-centric communication has created a major focus on content management, web content management, component content management, and so on. This interest is driven primarily by increasing demand for rich, interactive, accessible information products delivered via the Web. The focus is not misplaced but may be missing part of the point. To be specific, in our focus on the "management" part of CM, we may be missing the first word in the phrase: "Content."

It's true that the application of increasing amounts of computer and brain power to the processes associated with preparing and delivering the kind of information demanded by today's users can improve those products. But it does so within limits set by and at costs generated by the content "raw material" it gets from the content providers. In many cases, the content available to web product development processes is so structurally crude that it requires major clean-up and enhancement in order to adequately participate in the classification and delivery process. As the focus on elegant Web delivery increases, barring real changes in the condition of this raw content, the cost of enhancement is likely to grow proportionally, straining the involved organizations' ability to support it.

The answer may be in an increased focus on the processes and tools used to create the original content. We know that the original creator of most content knows the most about how it should be logically structured and most about the best way to classify it for search and retrieval. Trouble is, in most cases, we provide no means of capturing what the creator knows about his or her intellectual product. Moreover, because many creators have never been able to fully populate the metadata needed to classify and deliver their content, in past eras, professional catalogers were employed to complete this final step. In today's world, however, we have virtually eliminated the cataloger, assuming instead that the prodigious computer power available to us could develop the needed classification and structure from the content itself. That approach can and does work, but it will require better raw material if it is to achieve the level of effectiveness needed to keep the Web from becoming a virtual haystack in which finding the needle is more good luck than good measure. Native XML editors instead of today's visually oriented word processors, spreadsheets, graphics and other media forms with content-specific XML under them, increased use of native XML databases and a host of rich content-centric resources are part of this content evolution.

Most important, however, may be promulgation of the realization across society that creating content includes more than just making it look good on the screen, and that the creator shares in that responsibility. This won't be an easy or quick process, requiring more likely generations than years, but if we don't begin soon, we may end up with a Web 3 or 4 or 5.0 trying to deliver content that isn't even yet 1.0.

What is Smart Content?

At Gilbane we talk of "Smart Content," "Structured Content," and "Unstructured Content." We will be discussing these ideas in a seminar entitled "Managing Smart Content" at the Gilbane Conference next week in Boston. Below I share some ideas about these types of content and what they enable and require in terms of processes and systems.

When you add meaning to content you make it "smart" enough for computers to do some interesting things. Organizing, searching, processing, and discovery are greatly improved, which also increases the value of the data. Structured content allows some, but fewer, processes to be automated or simplified, and unstructured content enables very little to be streamlined and requires the most ongoing human intervention.

Most content is not very smart. In fact, most content is unstructured and usually more difficult to process automatically. Think flat text files, HTML without all the end tags, etc. Unstructured content is more difficult for computers to interpret and understand than structured content due to incompleteness and ambiguity inherent in the content. Unstructured content usually requires humans to decipher the structure and the meaning, or even to apply formatting for display rendering.

The next level up toward smart content is structured content. This includes well-formed XML documents, content compliant to a schema, or even RDBMS databases. Some of the intelligence is included in the content, such as the boundaries of each element (or field) being clearly demarcated, and element names that mean something to users and systems that consume the information. Automatic processing of structured content includes reorganizing, breaking into components, rendering for print or display, and other processes streamlined by the structured content data models in use.

Finally, smart content is structured content that also includes the semantic meaning of the information. The semantics can be in a variety of forms such as RDFa attributes applied to structured elements, or even semantically named elements. However it is done, the meaning is available to both humans and computers to process.
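To make the three levels concrete, here is a minimal sketch in Python. The sample sentence, the element names, the product name "Examplol," and the vocabulary URL are all hypothetical, and the `property` attributes only imitate the RDFa style rather than follow any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Unstructured: a person has to work out which words are the product or the dose.
unstructured = "Take Examplol 20 mg once daily."

# Structured: element boundaries and names are explicit (hypothetical element names).
structured = """
<instruction>
  <product>Examplol</product>
  <dose>20 mg</dose>
  <frequency>once daily</frequency>
</instruction>
"""

# Smart: the same structure plus RDFa-style annotations that say what each
# element means. The vocabulary URL and property names are made up.
smart = """
<instruction vocab="http://example.org/vocab#" typeof="DosageInstruction">
  <product property="drugName">Examplol</product>
  <dose property="doseQuantity">20 mg</dose>
  <frequency property="doseFrequency">once daily</frequency>
</instruction>
"""


def fields_by_element(xml_text):
    """Structured content: a program can split the text into named fields."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}


def fields_by_property(xml_text):
    """Smart content: fields are keyed by their declared meaning, not their tag."""
    root = ET.fromstring(xml_text.strip())
    return {
        el.get("property"): el.text
        for el in root.iter()
        if el.get("property") is not None
    }


if __name__ == "__main__":
    print(fields_by_element(structured))
    print(fields_by_property(smart))
```

The unstructured string offers a program nothing to hold on to. The structured version can be split into fields, and the smart version also tells a consuming system what each field means, independent of the tag names a particular schema happens to use.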

[Figure: SmartContentValue.jpg]

Smart content enables highly reusable content components and powerful automated dynamic document assembly. Searching can be enhanced with the inclusion of metadata and buried semantics in the content providing more clues as to what the data is about, where it came from, and how it is related to other content. Smart content enables very robust, valuable content ecosystems.
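As a rough illustration of dynamic document assembly, the sketch below selects reusable components by their metadata and stitches them into a document for a given audience. The `Component` fields, the sample library, and the `assemble` function are hypothetical and not taken from any particular content management product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A reusable content component plus the metadata used to select it."""
    id: str
    audience: frozenset
    topic: str
    body: str


# Hypothetical component library; in practice this would live in a repository.
LIBRARY = [
    Component("intro", frozenset({"customer", "partner"}), "overview",
              "The service unifies records from several offices."),
    Component("setup-customer", frozenset({"customer"}), "setup",
              "Sign in and choose the records you want to follow."),
    Component("setup-partner", frozenset({"partner"}), "setup",
              "Request an API key and register your data feed."),
]


def assemble(library, audience, topics):
    """Build one document by picking, in topic order, the components tagged for the audience."""
    parts = []
    for topic in topics:
        parts.extend(
            c.body for c in library
            if c.topic == topic and audience in c.audience
        )
    return "\n".join(parts)


if __name__ == "__main__":
    print(assemble(LIBRARY, "customer", ["overview", "setup"]))
```

The shared "intro" component is written once and reused for both audiences; only the metadata decides where it appears.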

Deciding which level of rigor is needed for a specific set of content requires understanding the business drivers intended to be met. The more structure and intelligence you add to content, the more complicated and expensive the system development and content creation and management processes may become. More intelligence requires more investment, but may be justified through benefits achieved.

I think it is useful if the XML and CMS communities use consistent terms when talking about the rigor of their data models and the benefits they hope to achieve with them. Hopefully, these three terms (smart content, structured content, and unstructured content) ring true and can be used productively to differentiate content and application types.

Yesterday the big stimulus bill cleared the conference committee that resolves the Senate and House versions. If you remember your civics, that means it is likely to pass in both chambers and then be signed into law by the president.

Included in the bill are billions of dollars for digitizing important information such as medical records or government information. Wow! That is a lot of investment! The thinking is that inaccessible information locked in paper or proprietary formats costs us billions each year in productivity. Wow! That's a lot of waste! Also, that access to the information could spawn billions of dollars of new products and services, and therefore income and tax revenue. Wow! That's a lot of growth!

Many agencies and offices have striven to expose useful official information and reports at the federal and state level. Even so, there is a lot of data still locked away, incomplete, or in difficult-to-use forms. A while ago a Senate official told me that they do not maintain a single, complete, accurate, official copy of the US Statutes internally. Even if this is no longer true, the public often relies on the "trusted" versions that are available only through paid online services. Many other data types, like many medical records, only exist in paper.

There are a lot of challenges, such as security and privacy issues, even intellectual property rights issues. But there are a lot of opportunities too. There are thousands of data sources that could be tapped into that are currently locked in paper or proprietary formats.

I don't think the benefits will come at the expense of commercial services already selling this publicly owned information as some may fear. These online sites provide a service, often emphasizing timeliness or value adds like integrating useful data from different sources, in exchange for their fees. I think a combination of free government open data resources and delivery tools, plus innovative commercial products will emerge. Maybe some easily obtained data may become commoditized, but new ways of accessing and integrating information will emerge. The big information services probably have more to fear from startups than from free government applications and data.

As it happens, I saw a demo yesterday of a tool that took all the activity of a state legislature and unified it under one portal. This allows people to track a bill and all related activity in a single place. For free! The bill working its way through both chambers is connected to related hearing agendas and minutes, which are connected to schedules, with status and other information captured in a concise dashboard-like screen format (there are other services you can pay for which fund the site). Each information component came from a different office and was originally in its own specialized format. What we were really looking at was a custom data integration application done with AJAX technology integrating heterogeneous data in a unified view. Very powerful, and yet scalable. The key to its success was strong integration of data, the connections that were used to tie the information together. The vendor collected and filtered the data, converted it to a common format, and added the linkage and relationship information to provide an integrated view into the data. All source data is stored separately and maintained by different offices. Five years ago it would have been a lot more difficult to create the service. Technology has advanced, and the data are increasingly available in manageable forms.
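I did not see how the vendor built it, but the collect, convert, and link pattern can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the three feeds, their formats, the field names, and the bill number are invented.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical feeds from three offices, each in its own format.
BILLS_CSV = "bill_id,title,status\nHB-101,Open Data Act,In committee\n"
HEARINGS_JSON = '[{"bill": "HB-101", "date": "2009-02-20", "room": "A"}]'
MINUTES_XML = '<minutes><entry bill="HB-101">Testimony heard; vote deferred.</entry></minutes>'


def normalize_bills(text):
    """Convert the clerk's CSV rows to the common record shape."""
    return [
        {"bill_id": row["bill_id"], "kind": "bill",
         "detail": f'{row["title"]} ({row["status"]})'}
        for row in csv.DictReader(io.StringIO(text))
    ]


def normalize_hearings(text):
    """Convert the scheduling office's JSON to the common record shape."""
    return [
        {"bill_id": h["bill"], "kind": "hearing",
         "detail": f'{h["date"]}, room {h["room"]}'}
        for h in json.loads(text)
    ]


def normalize_minutes(text):
    """Convert the committee's XML minutes to the common record shape."""
    root = ET.fromstring(text)
    return [
        {"bill_id": e.get("bill"), "kind": "minutes", "detail": e.text}
        for e in root.findall("entry")
    ]


def build_dashboard(records):
    """Link records from all sources by bill identifier for a unified view."""
    view = defaultdict(list)
    for rec in records:
        view[rec["bill_id"]].append(rec)
    return dict(view)


if __name__ == "__main__":
    records = (normalize_bills(BILLS_CSV)
               + normalize_hearings(HEARINGS_JSON)
               + normalize_minutes(MINUTES_XML))
    for bill_id, items in build_dashboard(records).items():
        print(bill_id)
        for item in items:
            print("  ", item["kind"], "-", item["detail"])
```

The source data never has to move or change owners. Each office keeps its own format, and the shared bill identifier is the linkage that ties the pieces together.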

The government produces a lot of information that affects us daily that we, as taxpayers and citizens, actually own, but have limited or no access to. These include statutes and regulations, court cases, census data, scientific data and research, agricultural reports, SEC filings, FDA drug information, taxpayer publications, forms, patent information, health guidelines, etc., etc., etc. The list is really long. I am not even scratching the surface! It also includes more interactive and real-time data, such as geological and water data, weather information, and the status of regulation and legislation changes (like reporting on the progress of the stimulus bill as it worked its way through both chambers). All of these can be made more current, expanded for more coverage, integrated with related materials, and validated for accuracy. There are also new opportunities to open up the process by using forums and social media tools for collecting feedback from constituents and experts (like the demo mentioned above). Social media tools may both give people an avenue to express their ideas to their elected officials, as well as be a collection tool to gather raw data that can be analyzed for trends and statistics, which in turn becomes new government data that we can use.

IMHO, this investment in open government data is a powerful catalyst that could actually create or change many jobs or business models. If done well, it could provide significant positive returns, streamline government, open access to more information, and enable new and interesting products and applications.

Will XML Help this President?

I’m watching the inauguration activity today all day (not getting much work done) and getting caught up in the optimism and history of it all. And what does this have to do with XML you ask? It’s a stretch, but I am giddy from the festivities, so bear with me please. I think there is a big role for XML and structured technologies in this paradigm shift, albeit XML will be quietly doing its thing in the background as always.

In 1986, when SGML, XML's precursor, was being developed, I worked for the IRS in Washington. I was green, right out of college. My boss, Bill Davis, said I should look into this SGML stuff. I did. I was hooked. It made sense. We could streamline the text applications we were developing. I helped write the first DTD in the executive branch (the first real government one was the ATOS DTD from the US Air Force, but that was developed slightly before the SGML standard was confirmed, so we always felt we were pretty close to creating the actual first official DTD in the federal government). Back then we were sending tax publications and instructions to services like CompuServe and BRS, each with their own data formats. We decided to try to adopt structured text technology and single source publishing to make data available in SGML to multiple distribution channels. And this was before the Web. That specific system has surely been replaced, but it saved time and enabled us to improve our service to taxpayers. We thought the approach was right for many government applications and should be repeated by other agencies.

So, back to my original point. XML has replaced SGML and is now being used for many government systems including electronic submission of SEC filings, FDA applications, and for the management of many government records. XML has been mentioned as a key technology in the overhaul that is needed in the way the government operates. Obama also plans to create a cabinet level position of CTO, part of the mission of which will be to promote inter-agency cooperation through interchange of content and data between applications formatted in a common taxonomy. He also intends to preserve the open nature of the internet and its content, facilitate publishing important government information and activities on the Web in open formats, and to enhance the national information system infrastructure. Important records are being considered for standardization, such as health and medical records, as well as many other ways we interact with the government. More info on this administration’s technology plan can be found at http://origin.barackobama.com/issues/technology/. Sounds like a job, at least in part, for XML!

I think it is great and essential that our leaders understand the importance of smartly structured data. There is already a lot of XML expertise throughout the various government offices, as well as a strong spirit of cooperation on which we can build. Anyone who has participated in industry schema application development, or other common vocabulary design efforts, knows how hard it is to create a “one-size-fits-all” data model. I was fortunate enough to participate briefly in the development and implementation of SPL, the Structured Product Labeling (see http://www.fda.gov/oc/datacouncil/spl.html) schema for FDA drug labels, which are submitted to the FDA for approval before the drug product can be sold. This is a very well-defined document type that has been in use for years. It still took many months and masterful consensus building to finalize this one schema. And it is just one small piece in the much larger information architecture. It was a lot of effort from many people within and outside the government. But now it is in place, working and being used.

So, I am bullish on XML in the government these days. It is a mature, well understood, powerful technology with wide adoption, and there are many established civilian and defense examples across the government. I think there is a very big role for XML and related technology in the aggressive, sweeping change promised by this administration. Even so, these things take time.
