Recently by Dale Waldt

If you have been following recent XML Technologies blog entries, you will notice we have been talking a lot lately about XML Smart Content, what it is and the benefits it can bring to an organization. These include flexible, dynamic assembly for delivery to different audiences, search optimization to improve customer experience, and improvements for distributed collaboration. Great targets to aim for, but you may ask are we ready to pursue these opportunities? It might help to better understand the technology landscape involved in creating and delivering smart content.

The figure below illustrates the technology landscape for smart content. At the center are fundamental XML technologies for creating modular content, managing it as discrete chunks (with or without a formal content management system), and publishing it in an organized fashion. These are the basic technologies for "one source, one output" applications, sometimes referred to as Singe Source Publishing (SSP) systems.

SCLandscape.jpg

The innermost ring contains capabilities that are needed even when using a dedicated word processor or layout tool, including editing, rendering, and some limited content storage capabilities. In the middle ring are the technologies that enable single-sourcing content components for reuse in multiple outputs. They include a more robust content management environment, often with workflow management tools, as well as multi-channel formatting and delivery capabilities and structured editing tools. The outermost ring includes the technologies for smart content applications, which are described below in more detail.

It is good to note that smart content solutions rely on structured editing, component management, and multi-channel delivery as foundational capabilities, augmented with content enrichment, topic component assembly, and social publishing capabilities across a distributed network. Descriptions of the additional capabilities needed for smart content applications follow.

Content Enrichment / Metadata Management: Once a descriptive metadata taxonomy is created or adopted, its use for content enrichment will depend on tools for analyzing and/or applying the metadata. These can be manual dialogs, automated scripts and crawlers, or a combination of approaches. Automated scripts can be created to interrogate the content to determine what it is about and to extract key information for use as metadata. Automated tools are efficient and scalable, but generally do not apply metadata with the same accuracy as manual processes. Manual processes, while ensuring better enrichment, are labor intensive and not scalable for large volumes of content. A combination of manual and automated processes and tools is the most likely approach in a smart content environment. Taxonomies may be extensible over time and can require administrative tools for editorial control and term management.

Component Discovery / Assembly: Once data has been enriched, tools for searching and selecting content based on the enrichment criteria will enable more precise discovery and access. Search mechanisms can use metadata to improve search results compared to full text searching. Information architects and organizers of content can use smart searching to discover what content exists, and what still needs to be developed to proactively manage and curate the content. These same discovery and searching capabilities can be used to automatically create delivery maps and dynamically assemble content organized using them.

Distributed Collaboration / Social Publishing: Componentized information lends itself to a more granular update and maintenance process, enabling several users to simultaneously access topics that may appear in a single deliverable form to reduce schedules. Subject matter experts, both remote and local, may be included in review and content creation processes at key steps. Users of the information may want to "self-organize" the content of greatest interest to them, and even augment or comment upon specific topics. A distributed social publishing capability will enable a broader range of contributors to participate in the creation, review and updating of content in new ways.

Federated Content Management / Access: Smart content solutions can integrate content without duplicating it in multiple places, rather accessing it across the network in the original storage repository. This federated content approach requires the repositories to have integration capabilities to access content stored in other systems, platforms, and environments. A federated system architecture will rely on interoperability standards (such as CMIS), system agnostic expressions of data models (such as XML Schemas), and a robust network infrastructure (such as the Internet).

These capabilities address a broader range of business activity and, therefore, fulfill more business requirements than single-source content solutions. Assessing your ability to implement these capabilities is essential in evaluating your organizations readiness for a smart content solution.

What is hot in XML these days? I have been to a few conferences and meetings, talked with many clients, participated in various research projects, and developed case studies on emerging approaches to XML adoption. DITA (Darwin Information Typing Architecture) is hot. Semantically enriched XML is hot. Both enable some interesting functionality for content delivered via print, on the web, and through mobile delivery channels. These include dynamic assembly of content organized into a variety of forms for custom uses, improved search and discovery of content, content interoperability across platforms, and distributed collaboration in creating and managing content.

On November 30, prior to the Gilbane Conference in Boston, Geoff Bock and I will be holding our 3rd workshop on Smart Content which is how we refer to semantically enriched, modular content (it's easier to say). In the seminar we will discuss what makes content smart, how it is being developed and deployed in several organizations, and dive into some technical details on DITA and semantic enrichment.  This highly interactive seminar has been well received in prior sessions, and will be updated with our recently completed research findings.  More information on the seminar is available at  http://gilbaneboston.com/workshops.html.

By the way, the research report, entitled Smart Content in the Enterprise, is now available at the research section at Gilbane.com. It includes several interesting case studies from a variety of organizations, and a lot of good information for those considering taking their content to the next level. We encourage you to download it (it is free). I also hope to see you in Boston at the workshop.

Authoring in a structured text environment has traditionally been done with dedicated structured editors. These tools enable validation and user assisted markup features that help the user create complete and valid content. But these structured editors are somewhat complicated and unusual and require training in their use for the user to become proficient. The learning curve is not very steep but it does exist.

Many organizations have come to see documentation departments as a process bottleneck and try to engage others throughout the enterprise in the content creation and review processes. Engineers and developers can contribute to documentation and have a unique technical perspective. Installation and support personnel are on the front lines and have unique insight into how the product and related documentation is used. Telephone operators not only need the information at their fingertips, but can also augment it with comments and ides that occur while supporting users. Third-party partners and reviewers may also have a unique perspective and role to play in a distributed, collaborative content creation, management, review, and delivery ecosystem.

Our recently completed research on XML Smart Content in the Enterprise indicates that as we strive to move content creation and management out of the documentation department silo, we will also need to consider how the data is encoded and the usefulness of the data model in meeting our expanded business requirements. Smart content is multipurpose content designed with several uses in mind. Smart content is modular to support being assembled in a variety of forms. And smart content is structured content that has been enriched with semantic information to better identify it's topic and role to aide processing and searching. For these reasons, smart content also improves distributed collaboration. Let me elaborate.

One of the challenges for distributed collaboration is the infrequency of user participation and therefore, unfamiliarity with structured editing tools. It makes sense to simplify the editing process and tools for infrequent users. They can't always take a refresher course in the editor and it's features. They may be working remotely, even on a customer site installing equipment or software. These infrequent users need structured editing tools that are designed for them. These collaboration tools need to be intuitive and easy to figure out, easily accessible from just about anywhere, and should be affordable and have flexible licensing to allow a larger number of users to participate in the management of the content. This usually means one of two things: either the editor will be a plug in to another popular word processing system (e.g., MS Word), or it will be accessed though a thin-client browser, like a Wiki editor. In some environments, it is possible that both may be need in addition to traditional structured editing tools. Smart content modularity and enrichment allows flexibility in editing tools and process design. This allows the  use of a variety of editing tools and flexibility in process design, and therefore expanding who can collaborate from throughout the enterprise.

Also, infrequent contributors may not be able to master navigating and operating within a  complex repository and workflow environment either for the same familiarity reasons. Serving up information to a remote collaborator might be enhanced with keywords and other metadata that is designed to optimize searching and access to the content. Even a little metadata can provide a lot of simplicity to an infrequent user. Product codes, version information, and a couple of dates would allow a user to hone in on the likely content topics and select content to edit from a well targeted list of search results. Relationships between content modules that are indicated in metadata can alert a user that when one object is updated, other related objects may need to be reviewed for potential update as well.

It is becoming increasingly clear that there is no one model for XML or smart content creation and editing. Just as a carpenter may have several saws, each designed for a particular type of cut, a robust smart content structured content environment may have more than one editor in use. It behooves us to design our systems and tools to meet the desired business processes and user functionality, rather than limit our processes to the features of one tool.

In our recently completed research on Smart Content in the Enterprise we explored how organizations are taking advantage of benefits from XML throughout the enterprise and not just in the documentation department. Our findings include several key issues that leading edge XML implementers are addressing including new delivery requirements, new ways of creating and managing content, and the use of standards to create rich, interoperable content. In our case studies we examined how some are breaking out of the documentation department silo and enabling others inside or even outside the organization to contribute and collaborate on content. Some are even using crowd sourcing and social publishing to allow consumers of the information to annotate it and participate in its development. We found that expectations for content creation and management have changed significantly and we need to think about how we organize and manage our data to support these new requirements. One key finding of the research is that organizations are taking a different approach to repurposing their content, a more proactive approach that might better be called "multipurposing".

In the XML world we have been talking about repurposing content for decades. Repurposing content usually means content that is created for one type of use is reorganized, converted, transformed, etc. for another use. Many organizations have successfully deployed XML systems that optimize delivery in multiple formats using what is often referred to as a Single Source Publishing (SSP) process where a single source of content is created and transformed into all desired deliverable formats (e.g., HTML, PDF, etc.).

Traditional delivery of content in the form of documents, whether in HTML or PDF, can be very limiting to users who want to search across multiple documents, reorganize document content into a form that is useful to the particular task at hand, or share portions with collaborators. As the functionality on Web sites and mobile devices becomes more sophisticated, new ways of delivering content are needed to take advantage of these capabilities. Dynamic assembly of content into custom views can be optimized with delivery of content components instead of whole documents. Powerful search features can be enhanced with metadata and other forms of content enrichment.

SSP and repurposing content traditionally focuses on the content creation, authoring, management and workflow steps up to delivery. In order for organizations to keep up with the potential of delivery systems and the emerging expectations of users, it behooves us to take a broader view of requirements for content systems and the underlying data model. Developers need to expand the scope of activities they evaluate and plan for when designing the system and the underlying data model. They should consider what metadata might improve faceted searching or dynamic assembly. In doing so they can identify the multiple purposes the content is destined for throughout the ecosystem in which it is created, managed and consumed.

Multipurpose content is designed with additional functionality in mind including faceted search, distributed collaboration and annotation, localization and translation, indexing, and even provisioning and other supply chain transactions. In short, multipurposing content focuses on the bigger picture to meet a broader set of business drivers throughout the enterprise, and even beyond to the needs of the information consumers.

It is easy to get carried away with data modeling and an overly complex data model usually requires more development, maintenance, and training than would otherwise be required to meet a set of business needs. You definitely want to avoid using specific processing terminology when naming elements (e.g., specific formatting, element names that describe processing actions instead of defining the role of the content). You can still create data models that address the broader range of activities without using specific commands or actions. Knowing a chunk of text is a "definition" instead of an "error message" is useful and far more easy to reinterpret for other uses than an "h2" element name or an attribute for display='yes'. Breaking chapters into individual topics eases custom, dynamic assembly. Adding keywords and other enrichment can improve search results and the active management of the content. In short, multipurpose data models can and should be comprehensive and remain device agnostic to meet enterprise requirements for the content.

The difference between repurposing content and multipurpose content is a matter of degree and scope, and requires generic, agnostic components and element names. But most of all, multipurposing requires understanding the requirements of all processes in the desired enterprise environment up front when designing a system to make sure the model is sufficient to deliver designed outcomes and capabilities. Otherwise repurposing content will continue to be done as an afterthought process and possibly limit the usefulness of the content for some applications.
 

What is Smart Content?

user-pic
Vote 2 Votes  

At Gilbane we talk of "Smart Content," "Structured Content," and "Unstructured Content." We will be discussing these ideas in a seminar entitled "Managing Smart Content" at the Gilbane Conference next week in Boston. Below I share some ideas about these types of content and what they enable and require in terms of processes and systems.

When you add meaning to content you make it "smart" enough for computers to do some interesting things. Organizing, searching, processing, and discovery are greatly improved, which also increases the value of the data. Structured content allows some, but fewer, processes to be automated or simplified, and unstructured content enables very little to be streamlined and requires the most ongoing human intervention.

Most content is not very smart. In fact, most content is unstructured and usually more difficult to process automatically. Think flat text files, HTML without all the end tags, etc. Unstructured content is more difficult for computers to interpret and understand than structured content due to incompleteness and ambiguity inherent in the content. Unstructured content usually requires humans to decipher the structure and the meaning, or even to apply formatting for display rendering.

The next level up toward smart content is structured content. This includes wellformed XML documents, content compliant to a schema, or even RDMS databases. Some of the intelligence is included in the content, such as boundaries of element (or field) being clearly demarcated, and element names that mean something to users and systems that consume the information. Automatic processing of structured content includes reorganizing, breaking into components, rendering for print or display, and other processes streamlined by the structured content data models in use.

Finally, smart content is structured content that also includes the semantic meaning of the information. The semantics can be in a variety of forms such as RDFa attributes applied to structured elements, or even semantically names elements. However it is done, the meaning is available to both humans and computers to process.

SmartContentValue.jpgSmart content enables highly reusable content components and powerful automated dynamic document assembly. Searching can be enhanced with the inclusion of metadata and buried semantics in the content providing more clues as to what the data is about, where it came from, and how it is related to other content.Smart content enables very robust, valuable content ecosystems.

Deciding which level of rigor is needed for a specific set of content requires understanding the business drivers intended to be met. The more structure and intelligence you add to content, the more complicated and expensive the system development and content creation and management processes may become. More intelligence requires more investment, but may be justified through benefits achieved.

I think it is useful if the XML and CMS communities use consistent terms when talking about the rigor of their data models and the benefits they hope to achieve with them. Hopefully, these three terms, smart content, structured content, and unstructured content ring true and can be used productively to differentiate content and application types.

I recently wrote a short Gilbane Spotlight article for the EMC XML community site about the state of Iowa going paperless (article can be found here) in regards to its Administrative Code publication. It got me to thinking, "When is a book no longer a book?"

Originally the admin code was produced as a 10,000 page loose-leaf publication service containing all the regulations of the state. For the last 10 years it has also appeared on the Web as PDFs of pages, and more recently, independent data chunks in HTML. And now they have discontinued the commercial printing of the loose-leaf version and only rely on the electronic versions to inform the public. They still produce PDF pages that resemble the printed volumes that are intended for local printing of select sections by public users of the information. But the electronic HTML version is being enhanced to improve reusability of the content, present it in alternative forms and integrated with related materials, etc. Think mashups and improved search capabilities. The content is managed in an XML-based Single Source Publishing system that produces all output forms.

I have migrated many, many printed publications to XML SSP platforms. Most follow the same evolutionary path regarding how the information is delivered to consumers. First they are printed. Then a second electronic copy is produced simultaneously with the print using separate production processes. Then the data is organized in a single database and reformatted to allow editing that can produce both print and electronic. Eventually the data gets enhanced and possibly broken into chunks to better enable reusing the content, but the print is still a viable output format. Later, the print is discontinued as the subscription list falls and the print product is no longer feasible. Or the electronic version is so much better, that people stop buying the print version.
So back to the original question, is it no longer a book? Is it when you stop printing pages? Or when you stop producing the content in page-oriented PDFs? Or does it have to do with how you manage and store the information?

Other changes take place in how the information is edited, formatted, and stored that might influence the answer to the question. For instance, if the content is still managed as a series of flat files, like chapters, and assembled for print, it seems to me that it is still a book, especially if it still contains content that is very book oriented, like tables of contents and other front matter, indexes, and even page numbers. Eventually, the content may be reorganized as logical chunks stored in a database, extracted for one or more output formats and organized appropriately for each delivery version, as in SSP systems. Print artifacts like TOCs may be completely generated and not stored as persistent objects, or they can be created and managed as build lists or maps (like with DITA). As long as one version is still book-like, IMHO it is still a book.

I would posit that once the printed versions are discontinued, and all electronic versions no longer contain print-specific artifacts, then maybe this is no longer a book, but simply content.

Anyone who works with XML has probably had to "sell" the idea of using the standard instead of alternative approaches, whether as an internal evangelist of XML or in a formal sales role. We have developed some pretty convincing arguments, such as automating redundant processes, quality checking and validation of content, reuse of content using a single source publishing approach, and so on. These types of benefits are easily understood by the technical documentation department or developers and administrators in the IT group. And they are easy arguments to make.

Even so, that leaves a lot of people who can benefit from the technology but may never need know that XML is part of the solution. The rest of the enterprise may not be in tune with the challenges faced by the documentation department, and instead focus on other aspects of running a business, like customer support, manufacturing, fulfillment, or finance, etc.. If you tell them the software solution you want to buy has "XML Inside" they may stare off into space and let their eyes glaze over, even fall asleep. But if you tell them you have a way to reduce expensive customer support phone calls by making improvements to their public-facing Web content and capabilities, you might get more of their attention.

I have been around the XML community for a very long time, and we tend to look into our belly buttons for the meaning of XML. This is often doen at the expense of looking around us and seeing what problems are out there before we start talking about solutions to apply to them. Everything looks like a nail because we have this really nifty hammer called XML. But when CD-ROMs were introduced, people didn't run around talking about the benefits of ISO 9660 (the standard that dictates how data is written to a CD). Okay they did at first to other technologists and executives in big companies adopting the standard, but rarely did the end consumer hear about the standard. Instead, we talked about the massive increase in data storage, and the flexibility of a consistent data storage format across operating systems. So we need to remember that XML is not what we want to accomplish, but rather how we may get things done to meet our goals. Therefore, we need to understand and describe our requirements in terms of these business drivers, not the tools we use to address them.

Part of the problem is that there are several potential audiences for the XML evangelism message, each with their own set of concerns and domain-specific challenges. End users want the ability to get the work out the door in a timely manner, at the right quality level, and that the tools are easy to use. Line Managers may add sensitivity to pricing, performance, maintenance and deployment costs, etc. These types of concerns I would classify as tactical departmental concerns focusing on operational efficiency (bottom line). 

Meanwhile Product Managers, Sales, Customer Service, Fulfillment, Finance, etc. are more geared toward enterprise goals and strategies such as reducing product support costs, and increasing revenue, in addition to operational efficiency. Even stated goals like synchronizing releases of software and documentation, making data more flexible and robust to enable new Web and mobile delivery options, are really only supporting the efforts to achieve the first two objectives of better customer service and increased sales, which I would classify as strategic enterprise concerns.

The deft XML evangelist, to succeed in the enterprise discussion, needs to know about a lot more than the technology and processes in the documentation department, or he or she will be limited to tactical, incremental improvements. The boss may want, instead, to focus on how the data can be improved to make robust Web content that can be dynamically assembled according to the viewer's profile. Or how critical updates can be delivered electronically and as fast as possible, while the complete collection of information is prepared for more time consuming, but equally valuable printed delivery in a multi-volume set of books. Or how content can be queried, rearranged, reformatted and delivered in a completely new way to increase revenue. Or how a business system can automatically generate financial reporting information in a form accurate and suitable enough for submission to the government, but without the army of documentation labor used previously.

At Gilbane we often talk about the maturity of XML approaches, not unlike the maturity model for software. We haven't finalized a spectrum of maturity levels yet, but I think of XML applications as ad hoc, departmental, and enterprise in nature. Ad hoc is where someone decides to use an XML format for a simple process, maybe configuration files driving printers or other applications. Often XML is adopted with no formal training and little knowledge outside of the domain in which it is being applied.

Departmental applications tend to focus on operational efficiency, especially as it relates to creating and distributing textual content. Departmental applications are governed by a single department head but may interact with other groups and delivery feeds, but can standalone in their own environment.  An enterprise application of XML would need governance from several departments or information partners, and would focus on customer or compliance facing issues and possibly growth of the business. They tend to have to work within a broader framework of applications and standards.

Each of these three application types requires different planning and justification. For ad hoc use of XML it is usually up to the individual developer to decide if XML is the right format, if a schema will be needed, and what the markup and data model are, etc. Very little "selling" is needed here except as friendly debate between developers, architects and line managers. Usually these applications can be tweaked and changed easily with little impact beyond local considerations.

Departmental application of XML usually requires a team representing all stakeholders involved in the process, from users to consumers of the info. There may be some departmental architectural standards, but exceptions to these are easier to accommodate than with enterprise applications. A careful leader of a departmental application will look upstream and down stream in the information flow to include some of their needs. Also, they need to realize that the editing process in their department may become more complex and require additional skills and resources, but that these drawbacks are more than offset but savings in other areas, such as page layout, or conversion to Web formats which can be highly automated. Don't forget to explain these benefits to the users whose work just got a little more complicated!

An Enterprise solution is by definition tied to the business drivers of the enterprise, even if that means some decisions may seem like they come at the expense of one department over another. This is where an evangelist could be useful, but not if they only focus on XML instead of the benefits it provides. Executives need to know how much revenue can be increased, how many problem reports can be avoided in customer service, and whether they can meet regulatory compliance guidelines, etc. This is a much more complicated set of issues with dependencies on and agreement with other departments needed to be successful. If you can't provide these types of answers, you may be stuck in departmental thinking.

XML may be the center of my universe (my belly button so to speak), but it is usually not the center of my project's sponsor's universe. I have to have the right message to covince them to make signifiaccnt investment in the way their enterprise operates.  </>

Structured Editing & Wikis

user-pic
Vote 3 Votes  

If you know me you will realize that I tend to revisit XML authoring tools and processes frequently. It is one of my favorite topics. The intersection of structured tools and messy human thinking and behavior is an area fraught with usability issues, development challenges, and careful business case thinking. And therefore, a topic ripe for discussion.

I had an interesting conversation with a friend about word processors and XML editors the other day. His argument was that the word processing product model may not be the best, and certainly isn't the only, way to prepare and manage structured content.

A word processor is software that has evolved to support the creation of documents. The word processing software model was developed when people needed to create documents, and then later added formatting and other features. This model is more than 25 years old (I remember using a word processor for the first time in college in 1980).

Of course it was logical to emulate how typewriters worked since the vast majority of information at the time was destined for paper documents. Now word processors include features for writing, editing, reviewing, formatting, and limited structural elements like links, indexes, etc. Again, all very document oriented. The content produced may be reused for other purposes if transformed in a post process (e.g., it could output HTML & PDF for Web, breaking into chunks for a repository or secondary use, etc.), but there are limits and other constraints, especially if your information is primarily designed to be consumed in print or document form.

It is easy to think of XML-structured editors, and the word processor software model they are based upon, as the most likely way to create structured content. But in my opinion, structured editors pay too much homage to word processing features and processes. I also think too many  project teams assume that the only way to edit XML content is in an XML document editor. Don't get me wrong, many people have successfully deployed XML editors and achieved targeted business goals, myself included, but I can point out many instances where an alternative approach to editing content might be more efficient.

Database tools that organize the information logically and efficiently are not likely to store that data as documents. For instance, you may have an financial system with a lot of info in relational fields that is extracted to produce printable documents like monthly statements, invoices, etc.

Or software manuals that are customized for specific configurations using reusable data objects and related document maps instead maintaining the information as static, hierarchically-organized documents.

Or aircraft information that needs to match the configuration of a specific plane or tail number, selected from a complete library of data objects stored centrally.

Or statutes that start formatted as bills, then later appear as enacted laws, then later yet again as published, codified statutes, each with their own formatting and structural peccadilloes.

Or consider a travel guide publisher that collects information on thousands of hotels, restaurants, attractions, and services in dozens of countries and cities. Sure, the content is prepared with the intent of publishing it in a book, but it is easy to see how it can be useful for other uses, including providing hotel data to travel-related Web sites, or building specialized, custom booklets for special needs (e.g., a local guide for a conference, guides to historical neighborhoods, etc.). 

In these examples of what some might call database publishing, system designers need to ask them selves what would be the best tool for creating and maintaining the information. They are great candidates for a database, some application dialogs and wizards, and some extraction and transformation applications to feed Web and other platforms for consumption by users. They may not even involve an editor per se, but might rely entirely a Wiki or other dialog for content creation and editing.

Word processors require a mix of skills, including domain expertise on the subject being written about, grammar and editing, and some formatting & design, use of the software itself, etc. While I personally believe everyone, not just teachers and writers, should be skilled in writing well and making documents look legible and appealing, I realize many folks are best suited for other roles. That is why we divide labor into roles. Domain experts (e.g., lawyers, aircraft engineers, scientists and doctors, etc.) are usually responsible for accuracy and quality of the ideas and information, while editorial and product support people clean up the writing and formatting and make it presentable. So, for domain experts, it may be more efficient to provide a tool that only manages the content creation, structuring, linking, organization, etc. with limited word processing capabilities, and leave the formatting and organization to the system or another department or automated style sheets.

In my mind, a Wiki is a combination of text functionality and database management features that allow content to be created and managed in a broader Web content platform (which also may include static pages, search interfaces, pictures, PDFs, etc.). In this model, the Web is the primary use and printing is secondary. Domain experts are not bothered with concepts like page layout, running heads, tables of content generation, justification & hyphenation, etc., much to the delight of the domain experts!

I am bullish on Wikis as content creation and management tools, even when the content is destined for print. I have seen some that hide much of the structure and technical "connective tissue" from the author, but produce well formatted, integrated information. The blogging tool I am using to create this article is one example of a Wiki-like interface that has a few bells and whistles for adding structure (e.g., keywords) dedicated to a specific content creation purpose. It only emulates word processing slightly with limited formatting tools, but is loaded with other features designed to improve my blog entries. For instance, I can pick a keyword from a controlled taxonomy from a pull-down list. And all within a Web browser, not a fat client editor package. This tool is optimized for making blog content, but not for, let's say, scientific papers or repair manuals. It is targeted for a specific class of users, bloggers. Similarly, XML-editors as we have come to know them, are more adept at creating documents and document chunks than other interfaces.

Honestly, on more than one occasion I have pounded a nail with a wrench, or tightened a bolt with the wrong kind of pliers. Usually I get the same results, but sometimes it takes longer or has a less desirable result than if I had used a more appropriate tool. The same is true for editing tools.

On a final note, forgive me if I make a gratuitous plug, but authoring approaches and tools will be the subject of a panel I am chairing at the Gilbane San Francisco conference in early June if you want to hear more. </>

XML Communities on LinkedIn

user-pic
Vote 1 Vote  

Just spent an excessive number of hours perusing the XML and related community groups on various Web community sites.  There are several social community tools and sites, but even just LinkedIn (http://www.linkedin.com/). seems to have dozens of community groups that at least mention XML in their descriptions, and several have it prominent in their names and logos. I joined several. Let's see what happens. Stay tuned... </>

Yesterday the big stimulus bill cleared the conference committee that resolves the Senate and House versions. If you remember your civics that means it will be likely to pass in the chambers and then be signed into law by the president.

Included in the bill are billions of dollars for digitizing important information such as medical records or government information. Wow! That is a lot of investment! The thinking is that inaccessible information locked in paper or proprietary formats cost us billions each year in productivity. Wow! That's a lot of waste! Also, that access to the information could spawn a billions of dollars of new products and services, and therefore income and tax revenue. Wow! That's a lot of growth!

Many agencies and offices have striven to expose useful official information and reports at the federal and state level. Even so, there is a lot of data still locked away, or incomplete or in difficult to use forms. A while ago a Senate official once told me that they do not maintain a single, complete, accurate, official copy of the US Statutes internally. Even if this is no longer true, the public often relies on the "trusted" versions that are available only through paid online services. Many other data types, like many medical records, only exist in paper.

There are a lot of challenges, such as security and privacy issues, even intellectual property rights issues. But there are a lot of opportunities too. There are thousands of data sources that could be tapped into that are currently locked in paper or proprietary formats.

I don't think the benefits will come at the expense of commercial services already selling this publicly owned information as some may fear. These online sites provide a service, often emphasizing timeliness or value adds like integrating useful data from different sources, in exchange for their fees. I think a combination of free government open data resources and delivery tools, plus innovative commercial products will emerge. Maybe some easily obtained data may become commoditized, but new ways of accessing and integrating information will emerge. The big information services probably have more to fear from startups than from free government applications and data.

As it happens, I saw a demo yesterday of a tool that took all the activity of a state legislature and unified it under one portal. This allows people to track a bill and all related activity in a single place. For free! The bill working its way through both chambers is connected to related hearing agendas and minutes, which are connected to schedules, with status and other information captured in a concise dashboard-like screen format (there are other services you can pay for which fund the site). Each information component came from a different office and was originally in it's own specialized format. What we were really looking at was a custom data integration application done with AJAX technology integrating heterogeneous data in a unified view. Very powerful, and yet scalable. The key to its success was strong integration of data, the connections that were used to tie the information together. The vendor collected and filtered the data, converted to a common format, added the linkage and relationship information to provide an integrated view into data. All source data is stored separately and maintained by different offices. Five years ago it would have been a lot more difficult to create the service. Technology has advanced, and the data are increasingly available in manageable forms.

The government produces a lot of information that affect us daily that we, as taxpayers and citizens, actually own, but have limited or no access to. These include statutes and regulations, court cases, census data, scientific data and research, agricultural reports, SEC filings, FDA drug information, taxpayer publications, forms, patent information, health guidelines, etc., etc., etc. The list is really long. I am not even scratching the surface! It also includes more interactive and real-time data, such as geological and water data, whether information, and the status of regulation and legislation changes (like reporting on the progress of the stimulus bill as it worked it way through both chambers). All of these can be made more current, expanded for more coverage, integrated with related materials, validated for accuracy. There are also new opportunities to open up the process of using forums and social media tools for collecting feedback from constituents and experts (like the demo mentioned above). Social media tools may both give people an avenue to express their ideas to their elected officials, as well as be a collection tool to gather raw data that can be analyzed for trends and statistics, which in turn becomes new government data that we can use.

IMHO, this investment in open government data is a powerful catalyst that could actually create or change many jobs or business models. If done well, it could provide significant positive returns, streamline government, open access to more information, and enable new and interesting products and applications. </>

Bill's latest Tweet

NewsShark

Sign-up for our weekly NewsShark newsletter.
Content technology industry news without the hype:

* Email

* First Name

* Last Name

* = Required Field