Author: dwaldt (page 1 of 3)

Understanding the Smart Content Technology Landscape

If you have been following recent XML Technologies blog entries, you will notice we have been talking a lot lately about XML Smart Content, what it is and the benefits it can bring to an organization. These include flexible, dynamic assembly for delivery to different audiences, search optimization to improve customer experience, and improvements for distributed collaboration. Great targets to aim for, but you may ask: are we ready to pursue these opportunities? It might help to better understand the technology landscape involved in creating and delivering smart content.

The figure below illustrates the technology landscape for smart content. At the center are fundamental XML technologies for creating modular content, managing it as discrete chunks (with or without a formal content management system), and publishing it in an organized fashion. These are the basic technologies for “one source, one output” applications, sometimes referred to as Single Source Publishing (SSP) systems.

Smart Content Landscape

The innermost ring contains capabilities that are needed even when using a dedicated word processor or layout tool, including editing, rendering, and some limited content storage capabilities. In the middle ring are the technologies that enable single-sourcing content components for reuse in multiple outputs. They include a more robust content management environment, often with workflow management tools, as well as multi-channel formatting and delivery capabilities and structured editing tools. The outermost ring includes the technologies for smart content applications, which are described below in more detail.

It is good to note that smart content solutions rely on structured editing, component management, and multi-channel delivery as foundational capabilities, augmented with content enrichment, topic component assembly, and social publishing capabilities across a distributed network. Descriptions of the additional capabilities needed for smart content applications follow.

Content Enrichment / Metadata Management: Once a descriptive metadata taxonomy is created or adopted, its use for content enrichment will depend on tools for analyzing and/or applying the metadata. These can be manual dialogs, automated scripts and crawlers, or a combination of approaches. Automated scripts can be created to interrogate the content to determine what it is about and to extract key information for use as metadata. Automated tools are efficient and scalable, but generally do not apply metadata with the same accuracy as manual processes. Manual processes, while ensuring better enrichment, are labor intensive and not scalable for large volumes of content. A combination of manual and automated processes and tools is the most likely approach in a smart content environment. Taxonomies may be extensible over time and can require administrative tools for editorial control and term management.
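The combined manual-plus-automated approach described above can be sketched in a few lines. This is a minimal illustration, not a production enrichment tool: the stopword list, field names, and sample topic are all invented for the example, and real automated enrichment would use far more sophisticated text analytics.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a proper linguistic resource.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}

def auto_keywords(text, limit=5):
    """Crude automated enrichment: pick the most frequent meaningful words."""
    words = [w.lower() for w in re.findall(r"[a-zA-Z]+", text)]
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return [w for w, _ in counts.most_common(limit)]

def enrich(topic, manual_terms=()):
    """Combine automated extraction with manually applied, editorially controlled taxonomy terms."""
    metadata = {"keywords": auto_keywords(topic["body"])}
    metadata["taxonomy"] = sorted(set(manual_terms))
    return {**topic, "metadata": metadata}

topic = {"id": "t1",
         "body": "Replace the filter cartridge. The filter cartridge must be replaced every six months."}
print(enrich(topic, manual_terms=["maintenance"]))
```

The automated pass is cheap and scalable but approximate; the manual terms carry the editorial precision, which mirrors the hybrid approach most smart content environments settle on.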

Component Discovery / Assembly: Once data has been enriched, tools for searching and selecting content based on the enrichment criteria will enable more precise discovery and access. Search mechanisms can use metadata to improve search results compared to full text searching. Information architects and organizers of content can use smart searching to discover what content exists, and what still needs to be developed to proactively manage and curate the content. These same discovery and searching capabilities can be used to automatically create delivery maps and dynamically assemble content organized using them.
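To make the discovery-then-assembly idea concrete, here is a minimal sketch. The component records, metadata fields, and map structure are hypothetical stand-ins for what a real component repository and delivery map (such as a DITA map) would provide.

```python
components = [
    {"id": "c1", "title": "Install pump", "metadata": {"product": "P100", "audience": "technician"}},
    {"id": "c2", "title": "Pump overview", "metadata": {"product": "P100", "audience": "customer"}},
    {"id": "c3", "title": "Install valve", "metadata": {"product": "V200", "audience": "technician"}},
]

def discover(components, **facets):
    """Select components whose metadata matches every requested facet,
    giving more precise results than full-text search alone."""
    return [c for c in components
            if all(c["metadata"].get(k) == v for k, v in facets.items())]

def assemble_map(components, title):
    """Build a simple delivery map (an ordered list of component ids) from a discovery result."""
    return {"title": title, "topics": [c["id"] for c in components]}

hits = discover(components, product="P100", audience="technician")
print(assemble_map(hits, "P100 Technician Guide"))
```

The same `discover` call an author uses to find existing content can feed `assemble_map` to build a delivery-specific view dynamically, which is exactly the dual use described above.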

Distributed Collaboration / Social Publishing: Componentized information lends itself to a more granular update and maintenance process, enabling several users to simultaneously access topics that may appear in a single deliverable form to reduce schedules. Subject matter experts, both remote and local, may be included in review and content creation processes at key steps. Users of the information may want to “self-organize” the content of greatest interest to them, and even augment or comment upon specific topics. A distributed social publishing capability will enable a broader range of contributors to participate in the creation, review and updating of content in new ways.

Federated Content Management / Access: Smart content solutions can integrate content without duplicating it in multiple places, rather accessing it across the network in the original storage repository. This federated content approach requires the repositories to have integration capabilities to access content stored in other systems, platforms, and environments. A federated system architecture will rely on interoperability standards (such as CMIS), system agnostic expressions of data models (such as XML Schemas), and a robust network infrastructure (such as the Internet).
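The federated idea — query everywhere, copy nothing — can be sketched abstractly. The `Repository` class below is a hypothetical stand-in for a CMIS-style repository binding; it is not the CMIS API itself, just an illustration of querying across repositories while content stays in its home system.

```python
class Repository:
    """Minimal stand-in for a repository integration point (hypothetical interface)."""
    def __init__(self, name, objects):
        self.name = name
        self._objects = objects  # id -> content; in reality this lives in the remote system

    def query(self, keyword):
        """Return ids of matching objects; only references leave the repository."""
        return [oid for oid, text in self._objects.items() if keyword in text]

    def fetch(self, oid):
        """Content is retrieved on demand, not duplicated into a central store."""
        return self._objects[oid]

def federated_query(repositories, keyword):
    """Fan a query out across repositories; return (repo, id) pairs, not copies."""
    return [(repo.name, oid) for repo in repositories for oid in repo.query(keyword)]

docs = Repository("docs", {"d1": "pump installation guide"})
support = Repository("support", {"s1": "pump troubleshooting notes"})
print(federated_query([docs, support], "pump"))
```

The design point is that the federation layer deals only in references until content is actually needed, which is what lets the original repositories remain the system of record.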

These capabilities address a broader range of business activity and, therefore, fulfill more business requirements than single-source content solutions. Assessing your ability to implement these capabilities is essential in evaluating your organization's readiness for a smart content solution.

What’s Hot in XML? Workshop on Smart Content Describes Leading-Edge Content Applications

What is hot in XML these days? I have been to a few conferences and meetings, talked with many clients, participated in various research projects, and developed case studies on emerging approaches to XML adoption. DITA (Darwin Information Typing Architecture) is hot. Semantically enriched XML is hot. Both enable some interesting functionality for content delivered via print, on the web, and through mobile delivery channels. These include dynamic assembly of content organized into a variety of forms for custom uses, improved search and discovery of content, content interoperability across platforms, and distributed collaboration in creating and managing content.

On November 30, prior to the Gilbane Conference in Boston, Geoff Bock and I will be holding our 3rd workshop on Smart Content, which is how we refer to semantically enriched, modular content (it’s easier to say). In the seminar we will discuss what makes content smart, how it is being developed and deployed in several organizations, and dive into some technical details on DITA and semantic enrichment. This highly interactive seminar has been well received in prior sessions, and will be updated with our recently completed research findings. More information on the seminar is available at http://gilbaneboston.com/10/workshops.html.

By the way, the research report, entitled Smart Content in the Enterprise, is now available in the research section at Gilbane.com (and now also available from Outsell Inc). It includes several interesting case studies from a variety of organizations, and a lot of good information for those considering taking their content to the next level. We encourage you to download it (it is free). I also hope to see you in Boston at the workshop.

How Smart Content Aids Distributed Collaboration

Authoring in a structured text environment has traditionally been done with dedicated structured editors. These tools enable validation and user-assisted markup features that help the user create complete and valid content. But these structured editors are somewhat complicated and unfamiliar, and users require training to become proficient. The learning curve is not very steep, but it does exist.

Many organizations have come to see documentation departments as a process bottleneck and try to engage others throughout the enterprise in the content creation and review processes. Engineers and developers can contribute to documentation and have a unique technical perspective. Installation and support personnel are on the front lines and have unique insight into how the product and related documentation are used. Telephone operators not only need the information at their fingertips, but can also augment it with comments and ideas that occur while supporting users. Third-party partners and reviewers may also have a unique perspective and role to play in a distributed, collaborative content creation, management, review, and delivery ecosystem.

Our recently completed research on XML Smart Content in the Enterprise indicates that as we strive to move content creation and management out of the documentation department silo, we will also need to consider how the data is encoded and the usefulness of the data model in meeting our expanded business requirements. Smart content is multipurpose content designed with several uses in mind. Smart content is modular to support being assembled in a variety of forms. And smart content is structured content that has been enriched with semantic information to better identify its topic and role, to aid processing and searching. For these reasons, smart content also improves distributed collaboration. Let me elaborate.

One of the challenges for distributed collaboration is the infrequency of user participation and, therefore, unfamiliarity with structured editing tools. It makes sense to simplify the editing process and tools for infrequent users. They can’t always take a refresher course in the editor and its features. They may be working remotely, even on a customer site installing equipment or software. These infrequent users need structured editing tools that are designed for them. These collaboration tools need to be intuitive and easy to figure out, easily accessible from just about anywhere, and should be affordable, with flexible licensing that allows a larger number of users to participate in the management of the content. This usually means one of two things: either the editor will be a plug-in to another popular word processing system (e.g., MS Word), or it will be accessed through a thin-client browser, like a Wiki editor. In some environments, both may be needed in addition to traditional structured editing tools. Smart content modularity and enrichment allow this flexibility in editing tools and process design, thereby expanding who throughout the enterprise can collaborate.

Also, infrequent contributors may not be able to master navigating and operating within a complex repository and workflow environment, for the same familiarity reasons. Serving up information to a remote collaborator can be enhanced with keywords and other metadata designed to optimize searching and access to the content. Even a little metadata can provide a lot of simplicity to an infrequent user. Product codes, version information, and a couple of dates would allow a user to home in on the likely content topics and select content to edit from a well-targeted list of search results. Relationships between content modules that are indicated in metadata can alert a user that when one object is updated, other related objects may need to be reviewed for potential updates as well.
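Both ideas above — narrowing a search with a few metadata fields, and using relationship metadata to flag related objects for review — can be sketched briefly. The topic records, field names, and relationship table are invented for illustration.

```python
topics = [
    {"id": "t1", "title": "Replacing the filter", "product": "P100", "version": "2.1", "updated": "2010-06-01"},
    {"id": "t2", "title": "Filter specifications", "product": "P100", "version": "2.1", "updated": "2010-05-12"},
    {"id": "t3", "title": "Valve maintenance", "product": "V200", "version": "1.0", "updated": "2010-04-30"},
]
# Relationship metadata: topics that should be reviewed when another topic changes.
related = {"t1": ["t2"]}

def narrow(topics, product, version):
    """A couple of metadata fields give an infrequent user a short, well-targeted list."""
    return [t["id"] for t in topics if t["product"] == product and t["version"] == version]

def review_alerts(updated_id):
    """Flag related topics for review when one object is updated."""
    return related.get(updated_id, [])

print(narrow(topics, "P100", "2.1"))  # short candidate list instead of a full-text haystack
print(review_alerts("t1"))            # related objects to re-check after an update
```

For the occasional contributor, this kind of metadata-driven narrowing replaces repository expertise: they pick from a short list rather than learning the folder structure and workflow model.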

It is becoming increasingly clear that there is no one model for XML or smart content creation and editing. Just as a carpenter may have several saws, each designed for a particular type of cut, a robust smart content structured content environment may have more than one editor in use. It behooves us to design our systems and tools to meet the desired business processes and user functionality, rather than limit our processes to the features of one tool.

Repurposing Content vs. Creating Multipurpose Content

In our recently completed research on Smart Content in the Enterprise we explored how organizations are taking advantage of benefits from XML throughout the enterprise and not just in the documentation department. Our findings include several key issues that leading edge XML implementers are addressing including new delivery requirements, new ways of creating and managing content, and the use of standards to create rich, interoperable content. In our case studies we examined how some are breaking out of the documentation department silo and enabling others inside or even outside the organization to contribute and collaborate on content. Some are even using crowd sourcing and social publishing to allow consumers of the information to annotate it and participate in its development. We found that expectations for content creation and management have changed significantly and we need to think about how we organize and manage our data to support these new requirements. One key finding of the research is that organizations are taking a different approach to repurposing their content, a more proactive approach that might better be called “multipurposing”.

In the XML world we have been talking about repurposing content for decades. Repurposing content usually means content that is created for one type of use is reorganized, converted, transformed, etc. for another use. Many organizations have successfully deployed XML systems that optimize delivery in multiple formats using what is often referred to as a Single Source Publishing (SSP) process where a single source of content is created and transformed into all desired deliverable formats (e.g., HTML, PDF, etc.).

Traditional delivery of content in the form of documents, whether in HTML or PDF, can be very limiting to users who want to search across multiple documents, reorganize document content into a form that is useful to the particular task at hand, or share portions with collaborators. As the functionality on Web sites and mobile devices becomes more sophisticated, new ways of delivering content are needed to take advantage of these capabilities. Dynamic assembly of content into custom views can be optimized with delivery of content components instead of whole documents. Powerful search features can be enhanced with metadata and other forms of content enrichment.

SSP and repurposing content have traditionally focused on the content creation, authoring, management, and workflow steps up to delivery. In order for organizations to keep up with the potential of delivery systems and the emerging expectations of users, it behooves us to take a broader view of requirements for content systems and the underlying data model. Developers need to expand the scope of activities they evaluate and plan for when designing the system and the underlying data model. They should consider what metadata might improve faceted searching or dynamic assembly. In doing so they can identify the multiple purposes the content is destined for throughout the ecosystem in which it is created, managed and consumed.

Multipurpose content is designed with additional functionality in mind including faceted search, distributed collaboration and annotation, localization and translation, indexing, and even provisioning and other supply chain transactions. In short, multipurposing content focuses on the bigger picture to meet a broader set of business drivers throughout the enterprise, and even beyond to the needs of the information consumers.

It is easy to get carried away with data modeling, and an overly complex data model usually requires more development, maintenance, and training than would otherwise be required to meet a set of business needs. You definitely want to avoid using processing-specific terminology when naming elements (e.g., specific formatting, or element names that describe processing actions instead of defining the role of the content). You can still create data models that address the broader range of activities without using specific commands or actions. Knowing a chunk of text is a “definition” instead of an “error message” is useful, and far easier to reinterpret for other uses than an “h2” element name or a display=’yes’ attribute. Breaking chapters into individual topics eases custom, dynamic assembly. Adding keywords and other enrichment can improve search results and the active management of the content. In short, multipurpose data models can and should be comprehensive and remain device agnostic to meet enterprise requirements for the content.
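The payoff of role-based element names can be shown in miniature. The element names and the rendering channels below are invented for the example; the point is only that a semantic name defers formatting decisions, while a presentational name bakes one in.

```python
import xml.etree.ElementTree as ET

# Presentational markup bakes in a single delivery decision:
presentational = '<h2 display="yes">A widget is a reusable component.</h2>'

# Role-based markup records what the content *is*, deferring formatting:
semantic = '<definition term="widget">A widget is a reusable component.</definition>'

def render(el, channel):
    """One semantic element, several outputs: formatting is a late-bound decision."""
    if el.tag == "definition":
        if channel == "html":
            return f'<dt>{el.get("term")}</dt><dd>{el.text}</dd>'
        if channel == "glossary-index":
            return el.get("term")
    # A presentational tag tells us nothing we can reinterpret; pass the text through.
    return el.text

el = ET.fromstring(semantic)
print(render(el, "html"))
print(render(el, "glossary-index"))
```

Given a `definition`, the renderer can build a glossary entry, an index term, or anything else; given an `h2`, it can only echo what is there. That asymmetry is the whole argument for multipurpose, device-agnostic data models.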

The difference between repurposing content and multipurpose content is a matter of degree and scope, and requires generic, agnostic components and element names. But most of all, multipurposing requires understanding the requirements of all processes in the desired enterprise environment up front when designing a system to make sure the model is sufficient to deliver designed outcomes and capabilities. Otherwise repurposing content will continue to be done as an afterthought process and possibly limit the usefulness of the content for some applications.

Paper on Open Government Data Initiatives Available

Updated March 3, 2010

Government agencies produce a lot of information. Making it accessible to the public, which essentially paid for it, can be quite challenging. The volume is high. The formats are varied. Much of it remains locked in information silos.

Support is growing to take steps to make as much government information available to the public as possible. President Obama issued a directive describing the official policy for Transparency and Open Government that mandates an unprecedented level of accessibility to government information. At the same time, technical advances have improved the feasibility of increasing access to the data.

I recently completed a Gilbane paper on this topic and how some agencies are improving access to public data. It is now available for free on our Web site at https://gilbane.com/beacons.html. The paper’s sponsor, Mark Logic, has provided interesting case studies that illustrate the challenges and approaches to overcoming them. I also explore some of the major hurdles that need to be crossed to achieve this goal, including:

  1. Extremely high volumes of content and data
  2. Highly diverse, heterogeneous data formats and data models
  3. Complex content integration and delivery requirements
  4. Time-sensitivity of content
  5. Changing information environments

The approaches described have enabled users of this technology to implement high-volume, disparate-data applications that not only overcome old technical barriers but also deliver new value to their organizations. This is, after all, the essence of open data – be it for open government, open publishing, or open enterprise.

I encourage you to read this paper to get a better understanding of what works to make government data more open.

Update: the Beacon is also available from Mark Logic.

What is Smart Content?

At Gilbane we talk of “Smart Content,” “Structured Content,” and “Unstructured Content.” We will be discussing these ideas in a seminar entitled “Managing Smart Content” at the Gilbane Conference next week in Boston. Below I share some ideas about these types of content and what they enable and require in terms of processes and systems.

When you add meaning to content you make it “smart” enough for computers to do some interesting things. Organizing, searching, processing, and discovery are greatly improved, which also increases the value of the data. Structured content allows some, but fewer, processes to be automated or simplified, and unstructured content enables very little to be streamlined and requires the most ongoing human intervention.

Most content is not very smart. In fact, most content is unstructured and usually more difficult to process automatically. Think flat text files, HTML without all the end tags, etc. Unstructured content is more difficult for computers to interpret and understand than structured content due to incompleteness and ambiguity inherent in the content. Unstructured content usually requires humans to decipher the structure and the meaning, or even to apply formatting for display rendering.

The next level up toward smart content is structured content. This includes well-formed XML documents, content compliant with a schema, or even RDBMS databases. Some of the intelligence is included in the content, such as element (or field) boundaries being clearly demarcated, and element names that mean something to the users and systems that consume the information. Automatic processing of structured content includes reorganizing, breaking into components, rendering for print or display, and other processes streamlined by the structured content data models in use.

Smart Content diagram

Finally, smart content is structured content that also includes the semantic meaning of the information. The semantics can take a variety of forms, such as RDFa attributes applied to structured elements, or even semantically named elements. However it is done, the meaning is available to both humans and computers to process.
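To make this concrete, here is a small sketch of a structured element enriched with RDFa-style attributes. The vocabulary (`urn:product:P100`, `InstallationProcedure`, `dc:title`) is invented for the example — a real application would use an agreed ontology — but it shows how the same attributes are readable to both software and people.

```python
import xml.etree.ElementTree as ET

# A structured element made "smart" with RDFa-style attributes (illustrative vocabulary):
smart = ('<section about="urn:product:P100" typeof="InstallationProcedure">'
         '<title property="dc:title">Installing the P100 pump</title>'
         '</section>')

root = ET.fromstring(smart)
# Software can now ask what the content is about, not just what it contains:
print(root.get("about"), root.get("typeof"))
print(root.find("title").get("property"), "=", root.find("title").text)
```

Strip the `about`, `typeof`, and `property` attributes and the document is still valid structured content; with them, a processor can relate this section to a product, a procedure type, and other content about the same subject.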

Smart content enables highly reusable content components and powerful automated dynamic document assembly. Searching can be enhanced with the inclusion of metadata and embedded semantics in the content, providing more clues as to what the data is about, where it came from, and how it is related to other content. Smart content enables very robust, valuable content ecosystems.

Deciding which level of rigor is needed for a specific set of content requires understanding the business drivers intended to be met. The more structure and intelligence you add to content, the more complicated and expensive the system development and content creation and management processes may become. More intelligence requires more investment, but may be justified through benefits achieved.

I think it is useful if the XML and CMS communities use consistent terms when talking about the rigor of their data models and the benefits they hope to achieve with them. Hopefully, these three terms, smart content, structured content, and unstructured content ring true and can be used productively to differentiate content and application types.

When is a Book Not a Book?

I recently wrote a short Gilbane Spotlight article for the EMC XML community site about the state of Iowa going paperless (article can be found here) in regards to its Administrative Code publication. It got me to thinking, “When is a book no longer a book?”

Originally the admin code was produced as a 10,000 page loose-leaf publication service containing all the regulations of the state. For the last 10 years it has also appeared on the Web as PDFs of pages and, more recently, as independent data chunks in HTML. Now the state has discontinued the commercial printing of the loose-leaf version and relies only on the electronic versions to inform the public. They still produce PDF pages that resemble the printed volumes, intended for local printing of select sections by public users of the information. But the electronic HTML version is being enhanced to improve reusability of the content, present it in alternative forms, integrate it with related materials, etc. Think mashups and improved search capabilities. The content is managed in an XML-based Single Source Publishing system that produces all output forms.

I have migrated many, many printed publications to XML SSP platforms. Most follow the same evolutionary path regarding how the information is delivered to consumers. First they are printed. Then a second electronic copy is produced simultaneously with the print using separate production processes. Then the data is organized in a single database and reformatted to allow editing that can produce both print and electronic. Eventually the data gets enhanced and possibly broken into chunks to better enable reusing the content, but the print is still a viable output format. Later, the print is discontinued as the subscription list falls and the print product is no longer feasible. Or the electronic version is so much better, that people stop buying the print version.

So back to the original question, is it no longer a book? Is it when you stop printing pages? Or when you stop producing the content in page-oriented PDFs? Or does it have to do with how you manage and store the information?

Other changes take place in how the information is edited, formatted, and stored that might influence the answer to the question. For instance, if the content is still managed as a series of flat files, like chapters, and assembled for print, it seems to me that it is still a book, especially if it still contains content that is very book oriented, like tables of contents and other front matter, indexes, and even page numbers. Eventually, the content may be reorganized as logical chunks stored in a database, extracted for one or more output formats and organized appropriately for each delivery version, as in SSP systems. Print artifacts like TOCs may be completely generated and not stored as persistent objects, or they can be created and managed as build lists or maps (like with DITA). As long as one version is still book-like, IMHO it is still a book.

I would posit that once the printed versions are discontinued, and all electronic versions no longer contain print-specific artifacts, then maybe this is no longer a book, but simply content.

Of Twits

I came across an interesting scene the other day on Larry King. Ashton Kutcher was basking in his success at being the first person to have 1,000,000 followers on Twitter, beating CNN by just minutes. My first thought was “Why Ashton Kutcher?” My second was “Why not?” As an aside, should we now call Ashton King Twit?

Anyway, it got me thinking about Twitter and how I communicate electronically. I have been a rabid user of text messaging for several years. It has become the primary mode of communication with my college-age sons (except when we are in the room together), who have all but abandoned email, even IM. Phone-based text messaging even allows my wife and me to constantly keep in touch while I travel without requiring both of us to be talking synchronously (another way of saying being tied up at the same time). Asynchronous communication in the form of emails, text messages, tweets, IM, etc. has freed people up from maintaining a real-time state with their conversation partners. Maybe asynchronous messaging has helped me stay married for so long. Also, messaging has become invaluable for work, allowing me to multitask and keep things moving with coworkers asynchronously.

Now I am using Twitter, ramping up, getting to know it better. One thing I really like about Twitter is that it is device and software independent, unlike cell phone messaging, which I must do from my phone. I can twitter from my computer, phone, or iPod Touch. If you haven’t added your phone to your Twitter account, do it now (more info at http://help.twitter.com/forums/10711/entries/14014).

By the way, I looked up Twitter and Twit in a couple of online dictionaries. The noun Twit means “an insignificant person” or “an excited state”. The verb means “to taunt”. The verb Twitter means “to talk lightly and rapidly”, just as a small bird twitters. I don’t think Mr. Kutcher is an insignificant person, or his accomplishment unworthy of attention, but he does tend to talk excitedly and to taunt (“You’ve been Punk’d!”). Why not Ashton Kutcher indeed!
