Winds of Change at Tools of Change

O’Reilly’s Tools of Change conference in New York City this week was highly successful, both inside and outside the walls of the Marriott Marquis. The sessions were energetic, well attended, and, on the whole, full of excellent insight and ideas about the digital trends taking a firm hold of nearly all sectors of the publishing business. Outside the walls, especially on Twitter, online communities were humming with news and commentary on the conference. (You could almost have followed the entire conference just by tracking the #toc hashtag on Twitter and reading the online copies of the presentations.)

But if you had done that, you would have missed the fun of being there. There were some superb keynotes and some excellent general sessions. Notable among the keynotes were Tim O’Reilly himself, Neelan Choksi from Lexcycle (Stanza), and Cory Doctorow. The general sessions covered a fairly broad spectrum of topics but were heavy on eBooks and community. Because of my own and my clients’ interests, I spent most of my time in the eBook sessions. The session eBooks I: Business Models and Strategy was content-rich. To begin with, you heard straight from senior people at major publishers with significant eBook efforts (Kenneth Brooks from Cengage Learning, Leslie Hulse from HarperCollins Publishers, and Cynthia Cleto from Springer Science+Business Media). Along with sharing their insight, the speakers, together with moderator Michael Smith from IDPF, assembled an incredibly valuable wiki of eBook business and technical material to back up their talk. I also really enjoyed a talk from Gavin Bell of Nature, The Long Tail Needs Community, in which he made a number of thoughtful points about how publishers need to think longer and harder about how reading engages and changes people, and specifically about how a publisher can build community around those changes and activities.

There were a few soft spots in the schedule. Jeff Jarvis’ keynote, What Would Google Do with Publishing?, was more about plugging his new book (What Would Google Do?) than anything else. It was also weirdly out of date for a book hot off the presses, with 20th-century points like “The link changes everything” and “If you’re not searchable, you won’t be found.” (Publishers are often, somewhat unfairly, accused of being Luddites, but they are not that Luddite.) There were also a couple of technical speakers who didn’t connect their technical points to the business implications, a connection that would have helped audience members who were less technical and more oriented toward publishing products and processes. But these small weaknesses were easily outshone by the many high points, the terrific overall energy, and the clear enthusiasm of the attendees.

One question I have for the O’Reilly folks is how they will keep the energy going. They have a nascent Tools of Change community site. Perhaps they could enlist some paid community managers to seed and moderate conversations, and also tie community activities to other O’Reilly products such as their books and other live and online events.

O’Reilly has very quickly established a very strong conference and an equally strong brand around the conference. With the publishing industry so engulfed in digital change now, I have to think this kind of conference and community can only continue to grow.

On Stimulating Open Data Initiatives

Yesterday the big stimulus bill cleared the conference committee that reconciles the Senate and House versions. If you remember your civics, that means it is likely to pass both chambers and then be signed into law by the president.

Included in the bill are billions of dollars for digitizing important information such as medical records and government information. Wow! That is a lot of investment! The thinking is that inaccessible information locked in paper or proprietary formats costs us billions each year in lost productivity. Wow! That’s a lot of waste! The thinking is also that access to the information could spawn billions of dollars of new products and services, and therefore income and tax revenue. Wow! That’s a lot of growth!

Many agencies and offices at the federal and state level have worked hard to expose useful official information and reports. Even so, a lot of data is still locked away, incomplete, or in forms that are difficult to use. A while ago a Senate official told me that they do not maintain a single, complete, accurate, official copy of the US Statutes internally. Even if this is no longer true, the public often relies on the “trusted” versions that are available only through paid online services. Many other kinds of data, such as many medical records, exist only on paper.

There are a lot of challenges, such as security and privacy issues, even intellectual property rights issues. But there are a lot of opportunities too. There are thousands of data sources that could be tapped into that are currently locked in paper or proprietary formats.

I don’t think the benefits will come at the expense of commercial services already selling this publicly owned information, as some may fear. These online sites provide a service, often emphasizing timeliness or value adds like integrating useful data from different sources, in exchange for their fees. I think a combination of free government open data resources and delivery tools, plus innovative commercial products, will emerge. Some easily obtained data may become commoditized, but new ways of accessing and integrating information will appear. The big information services probably have more to fear from startups than from free government applications and data.

As it happens, yesterday I saw a demo of a tool that took all the activity of a state legislature and unified it under one portal. It lets people track a bill and all related activity in a single place. For free! The bill’s progress through both chambers is connected to related hearing agendas and minutes, which are connected to schedules, with status and other information captured in a concise, dashboard-like screen format (the site is funded by other services you can pay for). Each information component came from a different office and was originally in its own specialized format. What we were really looking at was a custom data integration application, built with AJAX technology, that presents heterogeneous data in a unified view. Very powerful, and yet scalable. The key to its success was strong data integration: the connections used to tie the information together. The vendor collected and filtered the data, converted it to a common format, and added the linkage and relationship information needed to provide an integrated view of the data. All source data is stored separately and maintained by different offices. Five years ago it would have been a lot more difficult to create the service. Technology has advanced, and the data are increasingly available in manageable forms.
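The integration pattern described above (collect each office’s data, convert it to a common format, then add the linkage that ties related records to one bill) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the vendor’s implementation: the three source formats, the field names such as `bill_no` and `BillNumber`, and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw feeds: each office publishes in its own format and field names.
clerk_status = [
    {"bill_no": "HB 101", "status": "Passed House"},
]
committee_agendas = [
    {"BillNumber": "hb101", "HearingDate": "2009-02-10", "Committee": "Finance"},
]
minutes_docs = [
    "HB-101|2009-02-10|Reported favorably",  # pipe-delimited text export
]

def normalize_bill_id(raw: str) -> str:
    """Map variants like 'HB 101', 'hb101', 'HB-101' to one key: 'HB101'."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())

def to_common_format():
    """Convert every source into (bill_id, record_type, data) tuples."""
    for rec in clerk_status:
        yield normalize_bill_id(rec["bill_no"]), "status", {"status": rec["status"]}
    for rec in committee_agendas:
        yield normalize_bill_id(rec["BillNumber"]), "hearing", {
            "date": rec["HearingDate"], "committee": rec["Committee"]}
    for line in minutes_docs:
        bill, date, action = line.split("|")
        yield normalize_bill_id(bill), "minutes", {"date": date, "action": action}

def build_unified_view():
    """Link all records about the same bill into one dashboard-style entry."""
    view = defaultdict(lambda: defaultdict(list))
    for bill_id, record_type, data in to_common_format():
        view[bill_id][record_type].append(data)
    return view

view = build_unified_view()
print(dict(view["HB101"]))
```

The source feeds are never modified; the unified view is derived from them and can simply be rebuilt whenever an office updates its data, which fits the point that each office maintains its own source data separately.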

The government produces a lot of information that affects us daily and that we, as taxpayers and citizens, actually own but have limited or no access to. This includes statutes and regulations, court cases, census data, scientific data and research, agricultural reports, SEC filings, FDA drug information, taxpayer publications, forms, patent information, health guidelines, etc., etc., etc. The list is really long. I am not even scratching the surface! It also includes more interactive and real-time data, such as geological and water data, weather information, and the status of regulatory and legislative changes (like reporting on the progress of the stimulus bill as it worked its way through both chambers). All of these can be made more current, expanded for more coverage, integrated with related materials, and validated for accuracy. There are also new opportunities to open up the process by using forums and social media tools to collect feedback from constituents and experts (like the demo mentioned above). Social media tools may give people an avenue to express their ideas to their elected officials, and also serve as a collection tool to gather raw data that can be analyzed for trends and statistics, which in turn becomes new government data that we can use.

IMHO, this investment in open government data is a powerful catalyst that could actually create or change many jobs and business models. If done well, it could provide significant positive returns, streamline government, open access to more information, and enable new and interesting products and applications.

DPCI Announces Partnership with Mark Logic to Deliver XML-Based Content Publishing Solutions

DPCI, a provider of integrated technology solutions for organizations that need to publish content to Web, print, and mobile channels, announced that it has partnered with Mark Logic Corporation to deliver XML-based content publishing solutions. Mark Logic’s product, MarkLogic Server, allows customers to store, manage, search, and dynamically deliver content. Addressing the growing need for XML-based content management systems, DPCI and Mark Logic have been collaborating on several projects, including one that required integration with Amazon’s Kindle reading device. Built specifically for content, MarkLogic Server provides a single solution for search and content delivery that allows customers to build digital content products: from task-sensitive online content delivery applications that place content in users’ workflows to digital asset distribution systems that automate content delivery; from custom publishing applications that maximize content re-use and repurposing to content assembly solutions that integrate content.

WoodWing Releases Enterprise 6 Content Publishing Platform

WoodWing Software has released Enterprise 6, the latest version of the company’s content publishing platform. Equipped with a new editing application called “Content Station”, Enterprise 6 offers article planning tools, direct access to any type of content repository, and integrated Web delivery functionality. Content Station allows users to create articles for delivery to the Web, print, and mobile devices, and offers out-of-the-box integration with the open-source Web content management system Drupal. Content Station works with Enterprise’s new server plug-ins to allow users to search, select, and retrieve content stored in other third-party repositories such as digital asset management systems, archives, and wire systems. Video, audio, and text files can then be collected into “dossiers”, edited, and set for delivery to a variety of outputs, all from a single user-interface. A built-in XML editor lets authors create documents intended solely for digital output. The content planning application lets managers assign content to users both inside and outside of the office. Enterprise’s Web publishing capabilities feature a direct integration with Drupal. Content authors click on a single button to preview or deliver content directly to Drupal and get information such as page views, ratings, and comments back from the Web CMS. And if something needs to be pulled from the site, editors can simply click “Unpublish”. They don’t have to contact a separate Web editor or navigate through another system’s interface. The server plug-in architecture also allows for any other Web content management system to be connected.

Open Government Initiatives will Boost Standards

Following on Dale’s inauguration day post, Will XML Help this President?, we have today’s invigorating news that President Obama is committed to more Internet-based openness. The CNET article highlights some of the most compelling items from the two memos, but I am especially heartened by this statement from the memo on the Freedom of Information Act (FOIA):

I also direct the Director of the Office of Management and Budget to update guidance to the agencies to increase and improve information dissemination to the public, including through the use of new technologies, and to publish such guidance in the Federal Register.

The key phrases are "increase and improve information dissemination" and "the use of new technologies." This is in keeping with the spirit of the FOIA: the presumption is that information (and content) created by or on behalf of the government is public property and should be accessible to the public. This means that the average person should be able to easily find government content and readily consume it, two challenges that the content technology industry grapples with every day.

The issue of public access is in fact closely related to the issue of long-term archiving of content and information. One of the reasons I have always been comfortable recommending XML and other standards-based technology for content storage is that the content and data would outlast any particular software system or application. As the government works to become more open, it should, and likely will, look at standards-based approaches to information and content access.

Such efforts will include core infrastructure, including servers and storage, but also a wide array of supporting hardware and software falling into three general categories:

  • Hardware and software to support the collection of digital material. This includes hardware and software for digitizing and converting analog materials, software for cataloging digital materials with metadata, hardware and software to support data repositories, and software for indexing the digital text and metadata.
  • Hardware and software to support the access to digital material. This includes access tools such as search engines, portals, catalogs, and finding aids, as well as delivery tools allowing users to download and view textual, image-based, multimedia, and cartographic data.
  • Core software for functions such as authentication and authorization, name administration, and name resolution.
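To make the collection and access categories a little more concrete, here is a minimal sketch of cataloging documents with metadata and building an inverted index over both the text and the metadata, so that a simple search can find them. The catalog fields (`title`, `agency`, `subjects`), the sample documents, and the tokenization rule are assumptions for illustration; a production repository would use a real search engine and an established metadata standard rather than this toy index.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class CatalogRecord:
    """A digital object plus descriptive metadata (hypothetical fields)."""
    doc_id: str
    title: str
    agency: str
    text: str
    subjects: list = field(default_factory=list)

def tokenize(s: str) -> set:
    """Lowercase alphanumeric word tokens; a deliberately simple rule."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def build_index(records):
    """Inverted index: token -> set of doc_ids, covering text and metadata."""
    index = defaultdict(set)
    for rec in records:
        searchable = " ".join([rec.title, rec.agency, rec.text, *rec.subjects])
        for tok in tokenize(searchable):
            index[tok].add(rec.doc_id)
    return index

def search(index, query: str) -> set:
    """Return doc_ids that contain every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # .get() avoids inserting empty entries into the defaultdict
    results = [index.get(t, set()) for t in tokens]
    return set.intersection(*results)

catalog = [
    CatalogRecord("doc1", "Water Quality Report 2008", "EPA",
                  "Annual summary of river sampling.", ["water", "environment"]),
    CatalogRecord("doc2", "Crop Production Outlook", "USDA",
                  "Forecast for corn and wheat.", ["agriculture"]),
]
index = build_index(catalog)
print(search(index, "water EPA"))  # {'doc1'}
```

Indexing the metadata alongside the text is what lets a query on an agency name or subject term find a document whose body never mentions it, which is one reason a pile of archived files alone does not make a usable portal.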

Standards such as PDF/A have emerged to give governments a ready format for long-term archiving of routine government documents. But a collection of PDF/A documents does not in and of itself equal a useful government portal. It leaves many issues of navigation, search, metadata, and context unaddressed. This is true even before you consider the wide range of content the government produces: not only the obvious pictorial, audio, video, and cartographic data, but also the primary source material that comes out of areas such as medical research, energy development, public transportation, and natural resource planning.

President Obama’s directives should lead to interesting and exciting work for content technology professionals in the government. We look forward to hearing more.

Adobe Launches Technical Communication Suite 2

Adobe Systems Incorporated (Nasdaq:ADBE) announced the Adobe Technical Communication Suite 2 software, an upgrade of its solution for authoring, reviewing, managing, and publishing rich technical information and training content across multiple channels. Using the suite, technical communicators can create documentation, training materials, and Web-enabled user assistance containing both traditional text and 3D designs along with rich media, including Adobe Flash Player-compatible video, AVI, MP3, and SWF file support. The enhanced suite includes Adobe FrameMaker 9, the latest version of Adobe’s technical authoring and DITA publishing solution; Adobe RoboHelp 8, a major upgrade to Adobe’s help system and knowledge base authoring tool; Adobe Captivate 4, an upgrade to Adobe’s eLearning authoring tool; and Photoshop CS4, a new addition to the suite. The suite also includes Adobe Acrobat 9 Pro Extended and Adobe Presenter 7. Adobe Technical Communication Suite 2 is a complete solution that offers improved productivity along with support for standards-based authoring, including support for the Darwin Information Typing Architecture (DITA), an XML-based standard for authoring, producing, and delivering technical information. It enables the creation of rich content and publishing through multiple channels, including XML/HTML, print, PDF, SWF, WebHelp, Adobe FlashHelp, Microsoft HTML Help, OracleHelp, JavaHelp, and Adobe AIR. FrameMaker 9 offers a new user interface. It supports hierarchical books and DITA 1.1, and makes it easier to author topic-based content. In addition, FrameMaker 9 can aggregate unstructured, structured, and DITA content in a seamless workflow. Using a PDF-based review workflow, authors can import and incorporate feedback. Adobe RoboHelp 8 allows technical communicators to author XHTML-compliant professional help content. The software also supports lists and tables, a new CSS editor, pages and templates, and new search functionality.
The Adobe Technical Communication Suite 2 is immediately available in North America. Estimated street price for the suite is US$1899. FrameMaker 9, RoboHelp 8, and Captivate 4 are available as standalone products as well. Estimated street price is US$999 each for FrameMaker 9 and RoboHelp 8, and US$799 for Captivate 4.

Can Word Processors be used to Create Structured Content?

Today I will address a question I have grappled with for years: can non-structured authoring tools, e.g., word processors, be used effectively to create structured content? I have been involved for some time in projects for various state legislatures and publishers trying to use familiar word processing tools to create XML content. So far, based on my experiences, I think the answer is a definite “maybe.” Let me explain and offer some rules for your consideration.

First understand that there is a range of validation and control possible in structured editing, from supporting a very loose data model to very strict data models. A loose data model might enforce a vocabulary of element type names but very little in the way of sequence and occurrence rules or data typing that would be required in a strict data model. Also remember that the rules expressed in your data model should be based on your business drivers such as regulatory compliance and internal policy. Therefore:

Rule number 1: The stricter your data model and business requirements are, the more you need a real structured editor. IMHO only very loose data models can effectively be supported in unstructured authoring tools.
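The difference between a loose and a strict data model can be illustrated with a small check written against Python’s standard `xml.etree.ElementTree` module. The loose check enforces only the element vocabulary; the strict check also enforces sequence and occurrence rules. The element names (`chapter`, `title`, `para`) and the specific rules are hypothetical; a real project would express them in a schema language such as a DTD, XML Schema, or RELAX NG and validate with a proper validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for a tiny document model.
ALLOWED = {"chapter", "title", "para"}

def loose_valid(root: ET.Element) -> bool:
    """Loose model: every element name must come from the vocabulary."""
    return all(el.tag in ALLOWED for el in root.iter())

def strict_valid(root: ET.Element) -> bool:
    """Strict model: a chapter holds exactly one title, then one or more paras."""
    if not loose_valid(root) or root.tag != "chapter":
        return False
    children = [child.tag for child in root]
    return (len(children) >= 2
            and children[0] == "title"
            and all(tag == "para" for tag in children[1:]))

good = ET.fromstring("<chapter><title>Intro</title><para>Text</para></chapter>")
# Same vocabulary, but the title appears after the paragraph.
out_of_order = ET.fromstring("<chapter><para>Text</para><title>Intro</title></chapter>")

print(loose_valid(good), strict_valid(good))                  # True True
print(loose_valid(out_of_order), strict_valid(out_of_order))  # True False
```

A word processor can roughly approximate the loose check by limiting authors to a list of approved style names, but ordering and occurrence rules like those in the strict check are where a real structured editor earns its keep.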

Also, unstructured tools use a combination of formatting-oriented structural elements and styles to emulate a structured editing experience. Styles tend to be very flat, and only limited processing controls can be applied to them. For instance, in an unstructured environment a heading style is usually applied only to the bold headline, and a different style is applied to the paragraphs that follow. In a structured environment, the heading and paragraphs would have a container element, perhaps chapter, that clearly marks the boundaries of the chapter. Therefore structured data is less ambiguous than unstructured data. Humans deal with ambiguity more easily than computers do; computers need everything explicitly marked up. It is important to know who is going to consume, process, manage, or manipulate the data. If these processes are mostly manual, then unstructured tools may be suitable. If you hope to automate much of the processing, such as page formatting, transforms to HTML and other formats, or reorganizing the data, then you will quickly find the limitations of unstructured tools. Therefore:

Rule Number 2: Highly automated and streamlined processes usually require content to be created in a true structured editor. Conversely, very flexible content that is consumed or processed mostly by humans may allow the use of unstructured tools.
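The heading-and-paragraph example above shows why turning word-processor output into structured content is an inference step. A minimal sketch, assuming the document has already been exported as a flat list of (style name, text) pairs using the hypothetical style names `Heading1` and `Body`, can group each heading and the paragraphs after it into an explicit `chapter` container. The chapter boundaries are guessed from style order, and anything before the first heading (or in an unrecognized style) has no obvious home, which is exactly the ambiguity that explicit containers remove.

```python
import xml.etree.ElementTree as ET

# Flat, word-processor-style content: (style name, text) pairs (hypothetical styles).
flat = [
    ("Body", "Stray text before any heading"),
    ("Heading1", "Chapter One"),
    ("Body", "First paragraph."),
    ("Body", "Second paragraph."),
    ("Heading1", "Chapter Two"),
    ("Body", "Another paragraph."),
]

def infer_chapters(paragraphs):
    """Wrap each Heading1 and the Body paragraphs after it in a <chapter>.

    Returns the <book> element and a list of paragraphs that could not be
    placed (no preceding heading, or an unrecognized style).
    """
    book = ET.Element("book")
    current = None
    orphans = []
    for style, text in paragraphs:
        if style == "Heading1":
            current = ET.SubElement(book, "chapter")
            ET.SubElement(current, "title").text = text
        elif style == "Body":
            if current is None:
                orphans.append(text)  # ambiguous: no container to put it in
            else:
                ET.SubElement(current, "para").text = text
        else:
            orphans.append(text)  # unknown style: needs human review
    return book, orphans

book, orphans = infer_chapters(flat)
print(ET.tostring(book, encoding="unicode"))
print("Unplaced:", orphans)
```

In an automated pipeline, every unplaced paragraph becomes a manual cleanup task, which is why heavily automated processes tend to push authoring toward a true structured editor.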

Finally, the audience for the tools may influence how structured the content creation tools can be. If your user audience includes professional experts, such as legislative attorneys, you may not be able to convince them to use a tool that behaves differently from the word processor they are used to. They need to focus on the intellectual act of writing and on how that law might affect other laws. They don’t want to think about the editing tool and the markup it uses the way some production editors might. It is also good to remember that working under tight deadlines affects how much structure the authors can “manage.” Therefore:

Rule Number 3: Structured tools may be unsuitable for some users due to the type of writing they perform or the pressures of the environment in which they work.

By the way, a structured editing tool may be an XML structured editor, but it could also be a Web form, application dialog, wiki, or some other interface that can enforce the rules expressed in the data model. But this is a topic for another day.

Publishing with a Capital “P”

Here at Gilbane Boston, we just heard from Michael Edson, Director, Web and New Media Strategy, Office of the CIO, Smithsonian Institution. His talk described the Smithsonian Institution’s current Web and New Media strategy process and the cultural, technical, and organizational implications of the vision of a Smithsonian Commons: a critical mass of content, services, and tools designed to fuel innovation and stimulate engagement with the world’s scientific and cultural knowledge.
Many of the efforts are nascent, but this project on Flickr gives you a nice idea of the potential for this kind of effort.

