Curated for content, computing, and digital experience professionals

Year: 2009 (Page 38 of 39)

Can Word Processors be used to Create Structured Content?

Today I will address a question I have grappled with for years: can non-structured authoring tools, e.g., word processors, be used effectively to create structured content? I have been involved for some time in projects for various state legislatures and publishers trying to use familiar word processing tools to create XML content. So far, based on my experiences, I think the answer is a definite “maybe”. Let me explain and offer some rules for your consideration.

First, understand that there is a range of validation and control possible in structured editing, from supporting a very loose data model to supporting a very strict one. A loose data model might enforce a vocabulary of element type names but very little in the way of the sequence and occurrence rules or data typing that a strict data model would require. Also remember that the rules expressed in your data model should be based on your business drivers, such as regulatory compliance and internal policy. Therefore:

Rule number 1: The stricter your data model and business requirements are, the more you need a real structured editor. IMHO only very loose data models can effectively be supported in unstructured authoring tools.
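To make the loose-versus-strict distinction concrete, here is a minimal sketch in Python. The element names (chapter, title, para) and the content rule are hypothetical, invented for illustration; a real project would express them in a DTD or schema rather than in hand-written checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, invented for illustration.
VOCABULARY = {"chapter", "title", "para"}

def loose_check(root):
    """Loose model: only checks that every element name is in the vocabulary."""
    return [el.tag for el in root.iter() if el.tag not in VOCABULARY]

def strict_check(root):
    """Strict model: also enforces sequence and occurrence inside each chapter.

    Rule assumed here: a chapter holds exactly one title, which comes first,
    followed by one or more para elements.
    """
    errors = [f"unknown element: {tag}" for tag in loose_check(root)]
    for chapter in root.iter("chapter"):
        children = [child.tag for child in chapter]
        if not children or children[0] != "title":
            errors.append("chapter must start with a title")
        if children.count("title") > 1:
            errors.append("chapter must contain only one title")
        if "para" not in children:
            errors.append("chapter needs at least one para")
    return errors

doc = ET.fromstring(
    "<chapter><para>Text first.</para><title>Late title</title></chapter>"
)
print(loose_check(doc))   # the loose model finds no unknown names
print(strict_check(doc))  # the strict model reports the misplaced title
```

The same document passes the loose check and fails the strict one, which is the kind of gap an unstructured tool tends to leave open.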

Also, unstructured tools use a combination of formatting-oriented structured elements and styles to emulate a structured editing experience. Styles tend to be very flat, and only limited processing controls can be applied to them. For instance, a heading style in an unstructured environment usually is applied only to the bold headline, which is followed by a new style for the paragraphs that follow. In a structured environment, the heading and paragraphs would have a container element, perhaps chapter, that clearly indicates the boundaries of the chapter. Therefore structured data is less ambiguous than unstructured data. Ambiguity is easier for humans to deal with than for computers, which like everything explicitly marked up. It is important to know who is going to consume, process, manage, or manipulate the data. If these processes are mostly manual ones, then unstructured tools may be suitable. If you hope to automate a lot of the processing, such as page formatting, transforms to HTML and other formats, or reorganizing the data, then you will quickly find the limitations of unstructured tools. Therefore:

Rule Number 2: Highly automated and streamlined processes usually require content to be created in a true structured editor. Very flexible content that is consumed or processed mostly by humans may support the use of unstructured tools.
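The heading-and-paragraph example can be sketched in a few lines of Python. The style names ("Heading", "Body"), the element names (book, chapter, title, para), and the sample text are all hypothetical; the point is only that the chapter boundary has to be inferred from a flat sequence of styled paragraphs, whereas a structured editor would have recorded it explicitly.

```python
import xml.etree.ElementTree as ET

# Flat, word-processor-like content: (style name, text) pairs in document order.
# The style names "Heading" and "Body" are hypothetical.
flat = [
    ("Heading", "Definitions"),
    ("Body", "In this act, the following terms apply."),
    ("Body", "A vehicle includes any trailer."),
    ("Heading", "Penalties"),
    ("Body", "A violation is a misdemeanor."),
]

def nest(paragraphs):
    """Infer chapter boundaries: each Heading opens a new chapter container."""
    book = ET.Element("book")
    chapter = None
    for style, text in paragraphs:
        if style == "Heading":
            chapter = ET.SubElement(book, "chapter")
            ET.SubElement(chapter, "title").text = text
        elif chapter is None:
            # Body text before any heading: the flat model cannot say
            # which chapter it belongs to, so a human has to decide.
            raise ValueError(f"paragraph outside any chapter: {text!r}")
        else:
            ET.SubElement(chapter, "para").text = text
    return book

book = nest(flat)
print(ET.tostring(book, encoding="unicode"))
```

The guess works only as long as authors apply styles consistently. A heading typed as bold body text, for instance, would silently merge two chapters, which is the sort of ambiguity a human reader resolves without noticing and an automated process cannot.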

Finally, the audience for the tools may influence how structured the content creation tools can be. If your user audience includes professional experts, such as legislative attorneys, you may not be able to convince them to use a tool that behaves differently from the word processor they are used to. They need to focus on the intellectual act of writing and how that law might affect other laws. They don’t want to have to think about the editing tool and the markup it uses the way some production editors might. It is also good to remember that working under tight deadlines impacts how much structure can be “managed” by the authors. Therefore:

Rule Number 3: Structured tools may be unsuitable for some users due to the type of writing they perform or the pressures of the environment in which they work.

By the way, a structured editing tool may be an XML structured editor, but it could also be a Web form, application dialog, Wiki, or some other interface that can enforce the rules expressed in the data model. But this is a topic for another day.

Adobe Announces LiveCycle Developer Express via Amazon Web Services

Adobe Systems Incorporated (Nasdaq:ADBE) announced the immediate availability of Adobe LiveCycle ES Developer Express software, a full version of Adobe LiveCycle ES hosted in the Amazon Web Services cloud computing environment. Using the Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3) technologies, Adobe’s offering provides a virtual, self-contained development environment where enterprise developers can prototype, develop, and test Adobe LiveCycle ES applications without needing to install and configure Adobe LiveCycle ES themselves. With Adobe LiveCycle ES Developer Express, Adobe LiveCycle ES applications are pre-configured as ready-to-run server instances on Amazon EC2. This can help reduce the time required to boot new server instances to minutes, allowing enterprise developers to quickly begin testing and modifying applications. Developers can effectively bullet-proof their applications without having to invest in a development environment or test lab. Old projects may be deleted or saved for future access, and new projects can begin without any cleanup required from the last install. Adobe LiveCycle ES Developer Express is immediately available to all members of the Adobe Enterprise Developer Program. http://aws.amazon.com/ec2/, http://www.adobe.com/products/livecycle

Systems Alliance Releases Enhancement Pack 4 for SiteExecutive 4.1 Web Content Management Software

Systems Alliance, Inc. announced the release of Enhancement Pack 4, a software update for the company’s SiteExecutive 4.1 Web content management application. With Enhancement Pack 4, Systems Alliance introduces the SiteExecutive Application Framework, a new mechanism for developing dynamic Web content, along with a blog application, automatic spam reduction capabilities and other improvements. The new Application Framework provides an alternative custom development option for modeling, managing and delivering dynamic or structured content. Content created with the Application Framework can take the form of a Web page, XML feed or virtually any other MIME type. And, the SiteExecutive Application Framework produces search-engine friendly content with human-readable URLs, as well as content-specific browser titles and meta tags.

The Blog Application is the first SiteExecutive component developed using the new Application Framework. It enables the provisioning and management of one or many individual blogs, and offers an easy-to-use interface for managing blog posts and comments.

Other SiteExecutive blog features include: Scheduling posts for advanced publishing, Generating or consuming RSS feeds, Notification and comment approval workflow, Captcha to eliminate robot-generated comments, Keyword and calendar views, and Viewlets for displaying blog content on other SiteExecutive pages or templates.

Enhancement Pack 4 includes a built-in Spam reduction feature which automatically recognizes and rejects robot-generated submissions, saving site owners from the time-wasting task of sorting through junk form submissions. http://www.systemsalliance.com

 

Why Adding Semantics to Web Data is Difficult

If you are grappling with Web 2.0 applications as part of your corporate strategy, keep in mind that Web 3.0 may be just around the corner. Some folks say a key feature of Web 3.0 is the emergence of the Semantic Web where information on Web pages includes markup that tells you what the data is, not just how to format it using HTML (HyperText Markup Language). What is the Semantic Web? According to Wikipedia:

“Humans are capable of using the Web to carry out tasks such as finding the Finnish word for “monkey”, reserving a library book, and searching for a low price on a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web.” (http://en.wikipedia.org/wiki/Semantic_Web).

To make this work, the W3C (World Wide Web Consortium) has developed standards such as RDF (Resource Description Framework, a schema for describing properties of data objects) and SPARQL (SPARQL Protocol and RDF Query Language, http://www.w3.org/TR/rdf-sparql-query/) to extend the semantics that can be applied to Web-delivered content.

We have been doing semantic data since the beginning of SGML, and later with XML, just not always exposing these semantics to the Web. So, if we know how to apply semantic markup to content, how come we don’t see a lot of semantic markup on the Web today? I think what is needed is a method for expressing and understanding the intended semantics beyond what the capabilities of current standards allow.

A W3C XML schema is a set of rules that describe the relationships between content elements. It can be written in a way that is very generic or format oriented (e.g., HTML) or very structure oriented (e.g., DocBook, DITA). Maybe we should explore how to go even further and make our markup languages very semantically oriented by defining elements, for instance, like <weight> and <postal_code>.

Consider, though, that the schema in use can tell us the names of semantically defined elements, but not necessarily their meaning. I can tell you something about a piece of data by using the <income> tag, but how, in a schema, can I tell you it is a net <income> calculated using the guidelines of the US Internal Revenue Service, and therefore suitable for eFiling my tax return? For that matter, one system might use the element type name <net_income> while another might use <inc>. Obviously an industry standard like XBRL (eXtensible Business Reporting Language) can help standardize vocabularies for element type names, but this cannot be the whole solution or XBRL use would be more widespread. (Note: no criticism of XBRL is intended; I am just using it as an example of how difficult the problem is.)

Also, consider the tools in use to consume Web content. Only in recent years have browsers added XML processing support, in the form of the ability to read DTDs and transform content using XSLT. Even so, this merely allows you to read, validate and format non-HTML tag markup, not truly understand the content’s meaning. And if everyone uses their own schemas to define the data they publish on the Web, we could end up with a veritable “Tower of Babel” with many similar, but not fully interoperable, data models.

The Semantic Web may someday provide seamless integration and interpretation of heterogeneous data. Tools such as RDF/SPARQL, as well as microformats (embedding small, specialized, predefined element fragments in a standard format such as HTML), metadata, syndication tools and formats, industry vocabularies, powerful processing tools like XQuery, and other specifications can improve our ability to treat heterogeneous markup as if it were more homogeneous. But even these approaches are addressing only part of the bigger problem. How will we know that elements labeled with <net_income> and <inc> are the same and should be handled as such? How do we express these semantic definitions in a processable form? How do we know they are identical or at least close enough to be treated as essentially the same thing?
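One way to picture the <net_income> versus <inc> problem is as a hand-maintained mapping from local element names to a shared concept. The following Python sketch is illustrative only: the concept identifier "finance:NetIncome", the sample documents, and the mapping table itself are all assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from local element names to a shared concept identifier.
# Someone has to author and maintain this table; the schemas cannot supply it.
CONCEPTS = {
    "net_income": "finance:NetIncome",
    "inc": "finance:NetIncome",
    "income": None,  # ambiguous: gross or net? left unmapped on purpose
}

def extract(xml_text):
    """Return (concept, value) pairs for elements whose meaning is mapped."""
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.iter():
        concept = CONCEPTS.get(el.tag)
        if concept is not None:
            pairs.append((concept, el.text))
    return pairs

system_a = "<report><net_income>52000</net_income></report>"
system_b = "<filing><inc>52000</inc></filing>"
system_c = "<filing><income>52000</income></filing>"

print(extract(system_a))  # mapped to the shared concept
print(extract(system_b))  # mapped to the same concept under a different name
print(extract(system_c))  # nothing extracted: the meaning is not known
```

The code is trivial; the hard part is the table. Deciding that two names are "close enough" is a human judgment, and the mapping says nothing about whether both systems calculated the figure the same way.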

This, defining semantics effectively and broadly, is a conundrum faced by many industry standard schema developers and system integrators working with XML content. I think the Semantic Web will require more than schemas and XML-aware search tools to reach its full potential in intelligent data and the applications that process them. What is probably needed is a concerted effort to build semantic data and tools that can process it, including browsing, data storage, search, and classification tools. There is some interesting work being done in the Technical Architecture Group (TAG) at the W3C to address these issues as part of Tim Berners-Lee’s vision of the Semantic Web (see for a recent paper on the subject).
Meanwhile, we have Web 2.0 social networking tools to keep us busy and amused while we wait.

Meet Gilbane: SDL GIM Chicago

We travel to the Windy City on January 21 for the next event in SDL’s series on global information management. Speakers from Fair Isaac and Garmin will share their experiences with creating, translating, managing and publishing multilingual content. Gilbane’s kick-off presentation looks at trends and best practices emerging from our research on how companies are aligning multilingual content practices with business goals and objectives.
Registration is open to anyone with an interest in managing content for global audiences.


© 2025 The Gilbane Advisor
