Curated for content, computing, and digital experience professionals

Tag: structured content

Gilbane Advisor 1-30-19 — Structured content & robots, website first, AR & Blockchain almost

Conversations with robots: voice, smart agents & the case for structured content

The benefits of structured content have been clear for decades, but the cost and effort to create and manage it limited adoption to complex ‘mission-critical’ applications. Today, there are better tools, more expertise, and a much broader range of business and consumer applications that require structured content to be effective, competitive, cost efficient, and future-ready. Designer Andy Fitzgerald explains why structured content is more important than ever.


Illustration by Dougal MacPherson

A useful read for project teams, with illustrated examples helpful for shared understanding.

Why founders should start with a website, not a mobile app

And not just founders, though it is more critical for them. Most startups have few resources, and need to rapidly build a product, customer base, and supporting infrastructure before their money, or investor patience, runs out. Kapwing founder & CEO Julia Enthoven’s description of her startup’s decision cuts to the chase.

Despite limitations, publishers plot more augmented reality for 2019

That bet is motivated by lots of work publishers did last year. The New York Times produced 13 different augmented reality projects in 2018, ranging from an investigation into a bombing in Syria to a visit to the Large Hadron Collider at CERN; Time Magazine launched its first-ever augmented reality issue; The Washington Post, which started producing augmented reality content in 2017, continued producing projects in 2018… But augmented reality still faces significant limitations.

Blockchain’s Occam problem

Blockchain has yet to become the game-changer some expected. A key to finding the value is to apply the technology only when it is the simplest solution available.


Gilbane digital experience conference

April 29 – May 1, 2019, Washington DC
Digital experience strategies, technologies, and practices, for marketing and the workplace.



Informatica Delivers Data Parser for Hadoop

Informatica Corporation, the provider of data integration software, announced the immediate availability of Informatica HParser, a data parsing transformation solution for Hadoop environments. Informatica HParser runs on distributions of Apache Hadoop, exploiting the parallelism of the MapReduce framework to efficiently turn unstructured complex data, such as web logs, social media data, call detail records and other data formats, into a structured or semi-structured format in Hadoop. Once transformed into a more structured format, the data can be used and validated to drive business insights and improve operations. Available in a free community edition and commercial editions, Informatica HParser provides organizations with the solution they require to extract the value of complex, unstructured data.
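To make the idea of turning unstructured data into a structured format concrete, here is a minimal sketch in plain Python. It is not HParser and does not use any Informatica API; the log layout (an Apache-style access log), the sample line, and the function name `parse_log_line` are all assumptions chosen for illustration.

```python
import re

# Assumed layout: an Apache "common log format" style line.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Turn one raw log line into a dict of named, typed fields.

    Returns None when the line does not match the assumed layout.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# Invented sample line, for illustration only.
sample = '192.0.2.1 - - [30/Jan/2019:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043'
record = parse_log_line(sample)
```

Once each line is a record with named fields, filtering by status code or summing sizes no longer requires a person to read the raw text.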

What is Smart Content?

At Gilbane we talk of “Smart Content,” “Structured Content,” and “Unstructured Content.” We will be discussing these ideas in a seminar entitled “Managing Smart Content” at the Gilbane Conference next week in Boston. Below I share some ideas about these types of content and what they enable and require in terms of processes and systems.

When you add meaning to content you make it “smart” enough for computers to do some interesting things. Organizing, searching, processing, and discovery are greatly improved, which also increases the value of the data. Structured content allows some, but fewer, processes to be automated or simplified, and unstructured content enables very little to be streamlined and requires the most ongoing human intervention.

Most content is not very smart. In fact, most content is unstructured and usually more difficult to process automatically. Think flat text files, HTML without all the end tags, etc. Unstructured content is more difficult for computers to interpret and understand than structured content due to incompleteness and ambiguity inherent in the content. Unstructured content usually requires humans to decipher the structure and the meaning, or even to apply formatting for display rendering.

The next level up toward smart content is structured content. This includes well-formed XML documents, content that conforms to a schema, or even relational (RDBMS) databases. Some of the intelligence is included in the content, such as the boundaries of each element (or field) being clearly demarcated, and element names that mean something to the users and systems that consume the information. Automatic processing of structured content includes reorganizing, breaking into components, rendering for print or display, and other processes streamlined by the structured content data models in use.
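A small sketch may help show the difference between the first two levels. It uses Python's standard `xml.etree.ElementTree` parser; the sample strings and the element names `chapter`, `title`, and `para` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# HTML-style markup without end tags: a human can guess the structure,
# but an XML parser cannot.
unstructured = "<p>First paragraph<p>Second paragraph"

# Well-formed XML with meaningful element names (names are illustrative).
structured = "<chapter><title>Intro</title><para>First paragraph</para></chapter>"

print(is_well_formed(unstructured))  # False
print(is_well_formed(structured))    # True

# Because the boundaries are explicit, a component can be pulled out by name.
title = ET.fromstring(structured).findtext("title")
```

The second string can be broken into components automatically because each boundary is marked; the first needs someone, or some heuristic, to decide where each paragraph ends.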


Finally, smart content is structured content that also includes the semantic meaning of the information. The semantics can take a variety of forms, such as RDFa attributes applied to structured elements, or even semantically named elements. However it is done, the meaning is available to both humans and computers to process.
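As a rough illustration of semantics carried in attributes, the sketch below reads RDFa-style `property` attributes from a small XHTML fragment. It is a simplification, not a conforming RDFa processor: it ignores `vocab`, `typeof`, nesting, and prefixes. The fragment, the author name, and the function name `extract_properties` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two elements carry a "property" attribute, one does not.
fragment = """
<div vocab="http://schema.org/" typeof="Article">
  <h1 property="headline">What is Smart Content?</h1>
  <span property="author">Hypothetical Author</span>
  <p>Body text with no semantic annotation.</p>
</div>
""".strip()

def extract_properties(xml_text):
    """Collect {property name: element text} for every annotated element."""
    root = ET.fromstring(xml_text)
    return {
        element.get("property"): (element.text or "").strip()
        for element in root.iter()
        if element.get("property") is not None
    }

properties = extract_properties(fragment)
```

A program can now ask for the headline or the author by name, while the unannotated paragraph still needs a human to say what it is about.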

Smart content enables highly reusable content components and powerful automated dynamic document assembly. Searching can be enhanced with the inclusion of metadata and buried semantics in the content providing more clues as to what the data is about, where it came from, and how it is related to other content. Smart content enables very robust, valuable content ecosystems.

Deciding which level of rigor is needed for a specific set of content requires understanding the business drivers intended to be met. The more structure and intelligence you add to content, the more complicated and expensive the system development and content creation and management processes may become. More intelligence requires more investment, but may be justified through benefits achieved.

I think it is useful if the XML and content management (CMS) communities use consistent terms when talking about the rigor of their data models and the benefits they hope to achieve with them. Hopefully, these three terms (smart content, structured content, and unstructured content) ring true and can be used productively to differentiate content and application types.

Can Word Processors be used to Create Structured Content?

Today I will address a question I have grappled with for years: can non-structured authoring tools, e.g., word processors, be used effectively to create structured content? I have been involved for some time in projects for various state legislatures and publishers trying to use familiar word processing tools to create XML content. So far, based on my experiences, I think the answer is a definite “maybe”. Let me explain and offer some rules for your consideration.

First, understand that there is a range of validation and control possible in structured editing, from supporting a very loose data model to very strict data models. A loose data model might enforce a vocabulary of element type names but very little in the way of the sequence and occurrence rules or data typing that a strict data model would require. Also remember that the rules expressed in your data model should be based on your business drivers such as regulatory compliance and internal policy. Therefore:

Rule number 1: The stricter your data model and business requirements are, the more you need a real structured editor. IMHO only very loose data models can effectively be supported in unstructured authoring tools.
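The gap between a loose and a strict data model can be sketched in a few lines of Python. The vocabulary and the rule "a chapter holds one title followed by one or more paragraphs" are assumptions invented for this example; a real project would express such rules in a DTD or schema and not in hand-written checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary of element type names.
ALLOWED_NAMES = {"chapter", "title", "para"}

def valid_loose(root):
    """Loose model: every element name must come from the vocabulary."""
    return all(element.tag in ALLOWED_NAMES for element in root.iter())

def valid_strict(root):
    """Strict model: vocabulary plus sequence and occurrence rules.

    The root must be a chapter whose first child is a title,
    followed by one or more para elements and nothing else.
    """
    if not valid_loose(root) or root.tag != "chapter":
        return False
    children = [child.tag for child in root]
    return (
        len(children) >= 2
        and children[0] == "title"
        and all(tag == "para" for tag in children[1:])
    )

# Right element names, wrong order.
out_of_order = ET.fromstring(
    "<chapter><para>Text first</para><title>Late title</title></chapter>"
)
in_order = ET.fromstring(
    "<chapter><title>On time</title><para>Text</para></chapter>"
)

print(valid_loose(out_of_order), valid_strict(out_of_order))  # True False
print(valid_loose(in_order), valid_strict(in_order))          # True True
```

A word processor with a fixed set of styles can approximate the loose check. The strict check is the kind of rule that usually needs a real structured editor to enforce while the author types.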

Also, unstructured tools use a combination of formatting-oriented structured elements and styles to emulate a structured editing experience. Styles tend to be very flat and have limited processing controls that can be applied to them. For instance, a heading style in an unstructured environment usually is applied only to the bold headline, which is followed by a new style for the paragraphs that follow. In a structured environment, the heading and paragraphs would have a container element, perhaps chapter, that clearly indicates the boundaries of the chapter. Therefore structured data is less ambiguous than unstructured data. Ambiguity is easier for humans to deal with than for computers, which like everything explicitly marked up. It is important to know who is going to consume, process, manage, or manipulate the data. If these processes are mostly manual ones, then unstructured tools may be suitable. If you hope to automate a lot of the processing, such as page formatting, transforms to HTML and other formats, or reorganizing the data, then you will quickly find the limitations of unstructured tools. Therefore:

Rule Number 2: Highly automated and streamlined processes usually require content to be created in a true structured editor. And very flexible content that is consumed or processed mostly by humans may support the use of unstructured tools.
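The heading-and-paragraph example above can be sketched as a conversion from flat styles to container elements. The style names `Heading1` and `Body`, the element names, and the sample text are assumptions for illustration; real word processor output is far messier than this list of pairs.

```python
import xml.etree.ElementTree as ET

# Flat, word-processor-like content: (style name, text) pairs in document order.
flat = [
    ("Heading1", "Chapter One"),
    ("Body", "First paragraph."),
    ("Body", "Second paragraph."),
    ("Heading1", "Chapter Two"),
    ("Body", "Only paragraph."),
]

def nest_chapters(paragraphs):
    """Infer chapter containers from a flat run of styled paragraphs."""
    book = ET.Element("book")
    chapter = None
    for style, text in paragraphs:
        if style == "Heading1":
            # Each heading is assumed to open a new chapter.
            chapter = ET.SubElement(book, "chapter")
            ET.SubElement(chapter, "title").text = text
        elif chapter is not None:
            ET.SubElement(chapter, "para").text = text
        else:
            # The flat styles give no container for text before the first heading.
            raise ValueError("body text before first heading")
    return book

book = nest_chapters(flat)
```

The chapter boundaries here are guessed from the order of the styles. That guess is the ambiguity that a container element in a structured editor removes.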

Finally, the audience for the tools may influence how structured the content creation tools can be. If your user audience includes professional experts, such as legislative attorneys, you may not be able to convince them to use a tool that behaves differently than the word processor they are used to. They need to focus on the intellectual act of writing and how that law might affect other laws. They don’t want to have to think about the editing tool and the markup it uses the way some production editors might. It is also good to remember that working under tight deadlines impacts how much structure can be “managed” by the authors. Therefore:

Rule Number 3: Structured tools may be unsuitable for some users due to the type of writing they perform or the pressures of the environment in which they work.

By the way, a structured editing tool may be an XML structured editor, but it could also be a Web form, application dialog, Wiki, or some other interface that can enforce the rules expressed in the data model. But this is a topic for another day.

© 2020 The Gilbane Advisor
