What is Smart Content?

At Gilbane we talk of “Smart Content,” “Structured Content,” and “Unstructured Content.” We will be discussing these ideas in a seminar entitled “Managing Smart Content” at the Gilbane Conference next week in Boston. Below I share some ideas about these types of content and what they enable and require in terms of processes and systems.

When you add meaning to content, you make it “smart” enough for computers to do some interesting things. Organizing, searching, processing, and discovery are greatly improved, which also increases the value of the data. Structured content allows some, though fewer, processes to be automated or simplified, while unstructured content enables very little to be streamlined and requires the most ongoing human intervention.

Most content is not very smart. In fact, most content is unstructured and usually more difficult to process automatically. Think flat text files, HTML without all the end tags, etc. Unstructured content is more difficult for computers to interpret and understand than structured content because of the incompleteness and ambiguity inherent in the content. It usually requires humans to decipher the structure and the meaning, or even to apply formatting for display rendering.

The next level up toward smart content is structured content. This includes well-formed XML documents, content compliant with a schema, or even relational databases. Some of the intelligence is included in the content itself, such as element (or field) boundaries that are clearly demarcated, and element names that mean something to the users and systems that consume the information. Automatic processing of structured content includes reorganizing, breaking into components, rendering for print or display, and other processes streamlined by the structured content data models in use.
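
To make that concrete, here is a minimal sketch of structured content as well-formed XML; the element names are hypothetical, not drawn from any particular schema:

    <product>
      <name>Widget Pro</name>
      <price currency="USD">49.95</price>
      <description>A compact widget for everyday use.</description>
    </product>

A consuming system can reliably locate the price because the element boundaries and names are explicit, even though the markup itself says nothing about what a price means.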

[Smart Content diagram]

Finally, smart content is structured content that also includes the semantic meaning of the information. The semantics can take a variety of forms, such as RDFa attributes applied to structured elements, or even semantically named elements. However it is done, the meaning is available for both humans and computers to process.
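
As a hedged illustration, the same kind of markup becomes “smart” when semantics are layered on with RDFa attributes. This sketch uses the Dublin Core vocabulary, but the specific vocabulary, values, and URL here are illustrative assumptions, not a prescription:

    <div xmlns:dc="http://purl.org/dc/elements/1.1/"
         about="/articles/smart-content">
      <!-- property attributes tell a machine what each element means -->
      <h1 property="dc:title">What is Smart Content?</h1>
      <span property="dc:creator">Jane Author</span>
      <span property="dc:date" content="2009-11-24">November 24, 2009</span>
    </div>

An RDFa-aware processor can extract the title, creator, and date from this markup without a human ever reading the page, which is exactly the meaning that plain structure leaves implicit.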

Smart content enables highly reusable content components and powerful automated dynamic document assembly. Searching can be enhanced by metadata and semantics embedded in the content, which provide more clues as to what the data is about, where it came from, and how it is related to other content. Smart content enables very robust, valuable content ecosystems.

Deciding which level of rigor is needed for a specific set of content requires understanding the business drivers the content is intended to serve. The more structure and intelligence you add to content, the more complicated and expensive the system development and the content creation and management processes may become. More intelligence requires more investment, but that investment may be justified by the benefits achieved.

I think it is useful if the XML and CMS communities use consistent terms when talking about the rigor of their data models and the benefits they hope to achieve with them. Hopefully, these three terms, smart content, structured content, and unstructured content, ring true and can be used productively to differentiate content and application types.

New Gilbane Beacon Targets Digital Marketers

We’re pleased to announce the publication of a new Gilbane Beacon entitled Lessons for Digital Marketers: What Marketing Professionals Can Learn from the World’s Leading Publishers.

From the introduction:

". . .  Internet marketing will increase at the expense of traditional advertising, which is predicted to decline. This means that digital marketers will clearly be challenged to bring in the lion’s share of new customers and revenues. . . . 

"Gilbane believes that digital marketing managers can learn a great deal about leveraging content assets by drawing on the experiences of other content-rich organizations. One of the best candidate industries for lessons learned is the publishing industry. Challenges faced by CMOs and publishers are very similar: content closely tied to revenue streams, large volumes of diverse content types, rapidly evolving expectations regarding personalized content and interactivity, and requirement for frictionless publishing in order to meet the need for content immediacy. "

The paper is available for download now, along with a recording of the companion webinar. The paper will also be distributed at next week’s Gilbane Boston conference.

Nuts and Bolts Tutorials at The Gilbane Conference

In a world that seems increasingly about technology itself, it has become tempting to assume that the questions and challenges of new and better information products are about the technology. While it is true that technology is the key enabler of the new information world we are building, it is also true that the decision making and judgment involved in how that technology is organized and deployed is of equal, and not decreasing, importance. Indeed, as the products move toward increasing sophistication and flexibility (smart content, you might say), the human and organizational parts of the information life cycle become even more important.

It is a truism that you cannot deliver information products you can’t create and manage, and with the circle of participants in that creation and management ever widening, we must be sensitive to the limits of the creators. Moreover, while just "getting it up on the web" used to be at least sufficient to justify deployment of information products, today’s information consumer has a much more extensive and demanding list of features required before accepting web-based information. The publisher who forgets or ignores that list is in for trouble.

In a half-day session preceding the Gilbane conference next week, the Gilbane consulting team will tackle some of the real-world challenges inherent in this rapidly changing information world, providing both signposts for issues likely to come up and "in the trenches" suggestions for how to deal with them. The goal of the session, scheduled for the afternoon of December 1, is that attendees leave with a better handle on how to proceed in the quest for better information products and the role "smart content" should play.

The presenters, in addition to their expertise in the technology and tools of information, bring a unique resource to their efforts: years of design, implementation and evaluation of real organizations facing real challenges.

Upcoming Workshop: Managing Smart Content: How to Deploy XML Technologies across Your Organization

As part of next week’s Gilbane Boston Conference, the XML practice will be delivering a pre-conference workshop, "Managing Smart Content: How to Deploy XML Technologies across Your Organization." The instructors will be Geoff Bock, Dale Waldt, Bill Trippe, Barry Schaeffer and Neal Hannon–a group of experts that represents decades of technical and management experience on XML initiatives.

A tip of the virtual hat to Senior Analyst Geoff Bock for organizing this.


What’s Happening at Gilbane Boston

We’ve been providing regular updates on Gilbane Boston over on our dedicated announcements and press release blog, as well as on Twitter, but since not everybody subscribes to either of those, here is a quick summary for both conference attendees and technology exhibit visitors, with links.

Follow the conference Twitter stream. The main hashtag is #gilbaneboston, but others will emerge from the attendees, as #futurewcm already has. You can join (DM @gilbaneboston) or follow the list of twitterers at Gilbane Boston.

There is also a list of Google “Wavers” at the conference to follow.

Hope to see you there.

Are Publishers to Become Printers Again?

Look into almost any publisher’s history, and if it has a good number of decades behind it, chances are very good that you’ll see that the publisher was its own printer. Houghton Mifflin Harcourt is but one example: its origins lie in the merger of publisher Ticknor & Fields and Riverside Press, an old Cambridge-based printer founded by Henry Oscar Houghton.

Today, of course, a lot of publishers use big printers such as Quebecor, RR Donnelley, or others. With digital content streams coming under control among publishers of many stripes, together with the growing capability of production printing hardware and software, print on demand (POD) is already a mainstream option. Witness Lightning Source.

A recently received press release announced the planned acquisition of Océ, which provides high-volume production printing platforms, by Canon, known for consumer items like cameras and ink jet printers, but also for office equipment such as copiers and printers.

It turns out that Océ’s production printers are behind a good portion of the big POD services, and these machines are able to provide cost-effective alternatives to regular printing in many cases. As publishers seek to extract value from backlists and custom books by digitizing the content and managing workflow, POD can enable them to produce runs too small for regular printing. But right-sized and right-cost POD can offer attractive margins when the digital content has been managed right.

It makes me wonder whether publishers will take POD in-house, given the relatively modest expense of POD platforms, so that the publisher can capture a greater part of the margin on small press runs. Who knows? Maybe the separation of publishing and printing will turn out to have been a temporary anomaly.

With Kindle et al., it can be easy to get stuck on eBooks as the output, but with the right technologies applied to the digital stream, POD shouldn’t be overlooked. PDQ, QED.

Once Upon a Time…

… there was SVG. People were excited about it. Adobe and others supported it. Pundits saw a whole new graphical web that would leverage SVG heavily. Heck, I even wrote a book about it. 

Then things got quiet for a long time…

However, there are signs that SVG might be experiencing a bit of a renaissance, if the quality of presentations at a recent conference is any indication. It’s notable that Google hosted the conference, and even more notable that Google is trying to bigfoot Microsoft into supporting SVG in IE, a move that would substantially boost SVG as an option for Web developers.

So, a question for those of you interested in SVG: where are the big projects? Are there organizations creating large bases of illustrations and other graphical content with SVG? I would love to talk to you and learn about your projects. You can email me or comment below.

UPDATE: Brad Neuberg of Google, who is quoted in the InfoWorld article linked above, sent along a link to a project at Google, SVG Web, a JavaScript library that supports SVG on many browsers, including Internet Explorer, Firefox, and Safari. According to the tool’s website, using the library plus native SVG support, you can instantly target ~95% of the existing installed web base.
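
For flavor, here is a rough sketch of how a page might use SVG Web, based on the pattern the project documents; the file paths are assumptions, so check the project site for current details:

    <!-- Load the SVG Web library first; data-path tells it where its
         helper files (such as svg.htc for Internet Explorer) live -->
    <script src="svg.js" data-path="js/svgweb/"></script>

    <!-- Inline SVG goes inside a script block so the library can render
         it natively where possible, or through Flash in older IE -->
    <script type="image/svg+xml">
      <svg width="200" height="200">
        <circle cx="100" cy="100" r="80" fill="steelblue"/>
      </svg>
    </script>

The appeal of the approach is that the same SVG markup works across browsers without the author writing browser-specific code.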

UPDATE: Ruud Steltenpool, the organizer of SVG Open 2009, sent a link to an incredibly useful compendium of links to SVG projects, tools, and other resources, though he warns it is a little outdated.

Using Technology to Improve the Quality of Source and Multilingual Content: An Interview with acrolinx

Sixth in a series of interviews with sponsors of Gilbane’s 2009 study on Multilingual Product Content: Transforming Traditional Practices into Global Content Value Chains.

We spoke with Kent Taylor, VP – Americas for acrolinx, a leader in quality assurance tools for professional information developers. The acrolinx information quality tools are used by thousands of writers in over 25 countries around the world. We talked with Kent about the growing importance of Natural Language Processing (NLP) technologies across the global content value chain (GCVC), as well as acrolinx’s interest in co-sponsoring the research and what he considers the most relevant findings.

Gilbane: How does your company support the value chain for global product support?

Taylor: Our information quality management software provides real-time feedback to authors and editors regarding the quality of their work, enabling quality assurance in terms of spelling, grammar, and conformance to their own style guide and terminology guidelines. It also provides objective metrics and reports on over 90 aspects of content quality, thereby delivering quality control. The value of formal information quality management across the information supply chain is reflected in reductions in translation cost and time of 10% to 30%, and reductions in editing time of 65% to 75%.

Gilbane: Why did you choose to sponsor the Gilbane research?

Taylor: To help build awareness of the contributions that Natural Language Processing technologies can bring to the global product content value chain.  Natural Language Processing is no longer just a laboratory curiosity; it is in daily use by many of the world’s most successful global enterprises.

Gilbane: What is the most interesting/compelling/relevant result reported in the study?

Taylor: The fact that "quality at the source" is now being recognized as a critical success factor in the global information supply chain.

For more insights into the link between authoring, quality assurance, and multilingual communications, see the section “Achieving Quality at the Source,” which begins on page 28 of the report. You can also learn how acrolinx helped the Cisco Learning Network with their quality assurance service, which now projects cost savings of 28% for Cisco certification, beginning on page 59 of the study. Download the study for free.