Today I will address a question I have grappled with for years, can non-structured authoring tools, e.g., word processors, can be used effectively to create structured content? I have been involved for some time in projects for various state legislatures and publishers trying to use familiar word processing tools to create XML content. So far, based on my experiences, I think the answer is a definite “maybe”. Let me explain and offer some rules for your consideration.

First understand that there is a range of validation and control possible in structured editing, from supporting a very loose data model to very strict data models. A loose data model might enforce a vocabulary of element type names but very little in the way of sequence and occurrence rules or data typing that would be required in a strict data model. Also remember that the rules expressed in your data model should be based on your business drivers such as regulatory compliance and internal policy. Therefore:

Rule number 1: The stricter your data model and business requirements are, the more you need a real structured editor. IMHO only very loose data models can effectively be supported in unstructured authoring tools.

Also, unstructured tools use a combination of formatting oriented structured elements and styles to emulate a structured editing experience. Styles tend to be very flat and have limited processing controls that can be applied to them. For instance, a heading style in an unstructured environment usually is applied only to the bold headline which is followed by a new style for the paragraphs that follow. In a structured environment, the heading and paragraphs would have a container element, perhaps chapter, that clearly indicates the boundaries of the chapter. Therefore structured data is less ambiguous than unstructured data. Ambiguity is easier for humans to deal with than computers which like everything explicitly marked up. It is important to know who is going to consume, process, manage, or manipulate the data. If these processes are mostly manual ones, then unstructured tools may be suitable. If you hope to automate a lot of the processing, such as page formatting, transforms to HTML and other formats, or reorganizing the data, then you will quickly find the limitations of unstructured tools. Therefore:

Rule Number 2: Highly automated and streamline processes usually required content to be created in a true structured editor. And very flexible content that is consumed or processed mostly by humans may support the use of unstructured tools.

Finally, the audience for the tools may influence how structured the content creation tools can be. If your user audience includes professional experts, such as legislative attorneys, you may not be able to convince them to use a tool that behaves differently than the word processor they are used to. They need to focus on the intellectual act or writing and how that law might affect other laws. They don’t want to have to think about the editing tool and markup it uses the way some production editors might. It is also good to remember that working under tight deadlines also impacts how much structure can be “managed” by the authors. Therefore:

Rule Number 3: Structured tools may be unsuitable for some users due to the type of writing they perform or the pressures of the environment in which they work.

By the way, a structured editing tool may be an XML structured editor, but it could also be a Web form, application dialog, Wiki, or some other interface that can enforce the rules expressed in the data model. But this is a topic for another day. </>