Curated for content, computing, and digital experience professionals

Category: Web technologies & information standards

Here we include topics related to information exchange standards, markup languages, supporting technologies, and industry applications.

Adobe Launches Technical Communication Suite 2

Adobe Systems Incorporated (Nasdaq:ADBE) announced Adobe Technical Communication Suite 2 software, an upgrade of its solution for authoring, reviewing, managing, and publishing rich technical information and training content across multiple channels. Using the suite, technical communicators can create documentation, training materials and Web-enabled user assistance containing both traditional text and 3D designs along with rich media, including Adobe Flash Player compatible video, AVI, MP3 and SWF file support.

The enhanced suite includes Adobe FrameMaker 9, the latest version of Adobe’s technical authoring and DITA publishing solution; Adobe RoboHelp 8, a major upgrade to Adobe’s help system and knowledge base authoring tool; Adobe Captivate 4, an upgrade to Adobe’s eLearning authoring tool; and Photoshop CS4, a new addition to the suite. The suite also includes Adobe Acrobat 9 Pro Extended and Adobe Presenter 7.

Adobe Technical Communication Suite 2 is a complete solution that offers improved productivity along with support for standards-based authoring, including the Darwin Information Typing Architecture (DITA), an XML-based standard for authoring, producing and delivering technical information. It enables the creation of rich content and publishing through multiple channels, including XML/HTML, print, PDF, SWF, WebHelp, Adobe FlashHelp, Microsoft HTML Help, OracleHelp, JavaHelp and Adobe AIR.

FrameMaker 9 offers a new user interface. It supports hierarchical books and DITA 1.1, and makes it easier to author topic-based content. In addition, FrameMaker 9 provides the capability to aggregate unstructured, structured and DITA content in a seamless workflow. Using a PDF-based review workflow, authors can import and incorporate feedback. Adobe RoboHelp 8 allows technical communicators to author XHTML-compliant professional help content. The software also supports Lists and Tables, a new CSS editor, Pages and Templates, and new search functionality.

The Adobe Technical Communication Suite 2 is immediately available in North America. Estimated street price for the suite is US$1899. FrameMaker 9, RoboHelp 8 and Captivate 4 are also available as standalone products; estimated street price is US$999 each for FrameMaker 9 and RoboHelp 8, and US$799 for Captivate 4. http://www.adobe.com

Will XML Help this President?

I’m watching the inauguration activity all day today (not getting much work done) and getting caught up in the optimism and history of it all. And what does this have to do with XML, you ask? It’s a stretch, but I am giddy from the festivities, so bear with me please. I think there is a big role for XML and structured technologies in this paradigm shift, though XML will be quietly doing its thing in the background as always.

In 1986, when SGML, XML’s precursor, was being developed, I worked for the IRS in Washington. I was green, right out of college. My boss, Bill Davis, said I should look into this SGML stuff. I did. I was hooked. It made sense. We could streamline the text applications we were developing. I helped write the first DTD in the executive branch (the first real government one was the ATOS DTD from the US Air Force, but that was developed slightly before the SGML standard was confirmed, so we always felt we were pretty close to creating the actual first official DTD in the federal government). Back then we were sending tax publications and instructions to services like CompuServe and BRS, each with their own data formats. We decided to try to adopt structured text technology and single-source publishing to make data available in SGML to multiple distribution channels. And this was before the Web. That specific system has surely been replaced, but it saved time and enabled us to improve our service to taxpayers. We thought the approach was right for many government applications and should be repeated by other agencies.

So, back to my original point. XML has replaced SGML and is now being used for many government systems, including electronic submission of SEC filings, FDA applications, and the management of many government records. XML has been mentioned as a key technology in the overhaul that is needed in the way the government operates. Obama also plans to create a cabinet-level CTO position, part of whose mission will be to promote inter-agency cooperation through the interchange of content and data between applications using a common taxonomy. He also intends to preserve the open nature of the internet and its content, to facilitate publishing important government information and activities on the Web in open formats, and to enhance the national information system infrastructure. Important records are being considered for standardization, such as health and medical records, as well as many other ways we interact with the government. More information on this administration’s technology plan can be found at . Sounds like a job, at least in part, for XML!

I think it is great and essential that our leaders understand the importance of smartly structured data. There is already a lot of XML expertise throughout the various government offices, as well as a strong spirit of cooperation on which we can build. Anyone who has participated in industry schema application development, or other common vocabulary design efforts, knows how hard it is to create a “one-size-fits-all” data model. I was fortunate enough to participate briefly in the development and implementation of SPL, the Structured Product Labeling schema (see http://www.fda.gov/oc/datacouncil/spl.html) for FDA drug labels, which are submitted to the FDA for approval before the drug product can be sold. This is a very well defined document type that has been in use for years. It still took many months and masterful consensus building to finalize this one schema. And it is just one small piece in the much larger information architecture. It was a lot of effort from many people within and outside the government. But now it is in place, working and being used.

So, I am bullish on XML in the government these days. It is a mature, well-understood, powerful technology with wide adoption, and there are many established civilian and defense examples across the government. I think there is a very big role for XML and related technology in the aggressive, sweeping change promised by this administration. Even so, these things take time.

Can Word Processors be used to Create Structured Content?

Today I will address a question I have grappled with for years: can non-structured authoring tools, e.g., word processors, be used effectively to create structured content? I have been involved for some time in projects for various state legislatures and publishers trying to use familiar word processing tools to create XML content. So far, based on my experiences, I think the answer is a definite “maybe”. Let me explain and offer some rules for your consideration.

First, understand that there is a range of validation and control possible in structured editing, from supporting a very loose data model to a very strict one. A loose data model might enforce a vocabulary of element type names but very little in the way of the sequence and occurrence rules or data typing that would be required in a strict data model. Also remember that the rules expressed in your data model should be based on your business drivers, such as regulatory compliance and internal policy. Therefore:

Rule Number 1: The stricter your data model and business requirements are, the more you need a real structured editor. IMHO only very loose data models can effectively be supported in unstructured authoring tools.
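As a rough illustration, here is a minimal sketch (hypothetical element names; Python with the lxml library, just one of many tools that could do this) showing the same fragment passing a loose content model and failing a strict one:

```python
# A loose model accepts almost any arrangement of the allowed elements;
# a strict model enforces sequence and occurrence rules.
from io import StringIO
from lxml import etree

# Loose model: a section may contain any mix of titles and paragraphs, in any order.
loose_dtd = etree.DTD(StringIO("""
<!ELEMENT section (title | para)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""))

# Strict model: exactly one title, followed by at least one paragraph.
strict_dtd = etree.DTD(StringIO("""
<!ELEMENT section (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""))

doc = etree.fromstring("<section><para>Orphan paragraph, no title.</para></section>")

print(loose_dtd.validate(doc))   # True  -- the loose model accepts it
print(strict_dtd.validate(doc))  # False -- the strict model requires a leading title
```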

Also, unstructured tools use a combination of formatting-oriented elements and styles to emulate a structured editing experience. Styles tend to be very flat and have limited processing controls that can be applied to them. For instance, a heading style in an unstructured environment is usually applied only to the bold headline, which is followed by a new style for the paragraphs below. In a structured environment, the heading and paragraphs would have a container element, perhaps chapter, that clearly indicates the boundaries of the chapter. Therefore structured data is less ambiguous than unstructured data. Ambiguity is easier for humans to deal with than for computers, which like everything explicitly marked up. It is important to know who is going to consume, process, manage, or manipulate the data. If these processes are mostly manual ones, then unstructured tools may be suitable. If you hope to automate a lot of the processing, such as page formatting, transforms to HTML and other formats, or reorganizing the data, then you will quickly find the limitations of unstructured tools. Therefore:

Rule Number 2: Highly automated and streamlined processes usually require content to be created in a true structured editor, while very flexible content that is consumed or processed mostly by humans may support the use of unstructured tools.
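To show the kind of automation explicit structure makes easy, here is a small sketch (hypothetical markup; Python with lxml) that turns chapter containers into HTML with a mechanical tree walk, something that is much harder to do reliably when chapter boundaries exist only as styles:

```python
# Because chapter boundaries are explicit container elements, transforming to HTML
# is a simple tree walk rather than a guess about where a styled heading "ends".
from lxml import etree

book = etree.fromstring("""
<book>
  <chapter><heading>Getting Started</heading><para>First steps.</para></chapter>
  <chapter><heading>Reference</heading><para>Details.</para></chapter>
</book>
""")

html = etree.Element("body")
for chapter in book.findall("chapter"):
    div = etree.SubElement(html, "div", attrib={"class": "chapter"})
    etree.SubElement(div, "h1").text = chapter.findtext("heading")
    for para in chapter.findall("para"):
        etree.SubElement(div, "p").text = para.text

print(etree.tostring(html, pretty_print=True).decode())
```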

Finally, the audience for the tools may influence how structured the content creation tools can be. If your user audience includes professional experts, such as legislative attorneys, you may not be able to convince them to use a tool that behaves differently than the word processor they are used to. They need to focus on the intellectual act of writing and how that law might affect other laws. They don’t want to have to think about the editing tool and the markup it uses the way some production editors might. It is also good to remember that working under tight deadlines impacts how much structure can be “managed” by the authors. Therefore:

Rule Number 3: Structured tools may be unsuitable for some users due to the type of writing they perform or the pressures of the environment in which they work.

By the way, a structured editing tool may be an XML structured editor, but it could also be a Web form, application dialog, Wiki, or some other interface that can enforce the rules expressed in the data model. But this is a topic for another day.

Why Adding Semantics to Web Data is Difficult

If you are grappling with Web 2.0 applications as part of your corporate strategy, keep in mind that Web 3.0 may be just around the corner. Some folks say a key feature of Web 3.0 is the emergence of the Semantic Web where information on Web pages includes markup that tells you what the data is, not just how to format it using HTML (HyperText Markup Language). What is the Semantic Web? According to Wikipedia:

“Humans are capable of using the Web to carry out tasks such as finding the Finnish word for “monkey”, reserving a library book, and searching for a low price on a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web.” (http://en.wikipedia.org/wiki/Semantic_Web).

To make this work, the W3C (World Wide Web Consortium) has developed standards such as RDF (Resource Description Framework, a schema for describing properties of data objects) and SPARQL (SPARQL Protocol and RDF Query Language, http://www.w3.org/TR/rdf-sparql-query/) that extend the semantics that can be applied to Web-delivered content.
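For a flavor of what those standards provide, here is a small sketch (invented data; Python with the rdflib library) that stores facts as RDF subject-predicate-object triples and queries them with SPARQL by meaning rather than by page structure:

```python
# Facts live as triples; the SPARQL query asks about the property itself
# (net income), not about where the number happened to appear on a page.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:acme   ex:netIncome 125000 .
ex:globex ex:netIncome  98000 .
""", format="turtle")

results = g.query("""
PREFIX ex: <http://example.org/>
SELECT ?company ?income
WHERE { ?company ex:netIncome ?income }
ORDER BY DESC(?income)
""")

for company, income in results:
    print(company, income)
```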

We have been doing semantic data since the beginning of SGML, and later with XML, just not always exposing these semantics to the Web. So, if we know how to apply semantic markup to content, how come we don’t see a lot of semantic markup on the Web today? I think what is needed is a method for expressing and understanding the intended semantics that goes beyond what current standards allow.

A W3C XML schema is a set of rules that describes the relationships between content elements. It can be written in a way that is very generic or format-oriented (e.g., HTML) or very structure-oriented (e.g., DocBook, DITA). Maybe we should explore how to go even further and make our markup languages very semantically oriented by defining elements, for instance, like <weight> and <postal_code>.
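Here is a sketch (hypothetical schema and element names; Python with lxml) of what such a semantically oriented vocabulary might look like, with meaningful element names and data typing enforced by the schema:

```python
# Element names carry meaning (<weight>, <postal_code>) and the schema
# adds data typing, so nonsense values are rejected at validation time.
from lxml import etree

xsd = etree.XMLSchema(etree.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="package">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="weight" type="xs:decimal"/>
        <xs:element name="postal_code">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="[0-9]{5}"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

good = etree.fromstring("<package><weight>2.5</weight><postal_code>02139</postal_code></package>")
bad = etree.fromstring("<package><weight>heavy</weight><postal_code>Boston</postal_code></package>")

print(xsd.validate(good))  # True
print(xsd.validate(bad))   # False -- the names are meaningful and the values are typed
```

Of course, as the next paragraph argues, even this only gives us meaningful names and types, not a shared definition of what those names mean.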

Consider, though, that the schema in use can tell us the names of semantically defined elements, but not necessarily their meaning. I can tell you something about a piece of data by using the <income> tag, but how, in a schema, can I tell you it is a net <income> calculated using the guidelines of the US Internal Revenue Service, and therefore suitable for eFiling my tax return? For that matter, one system might use the element type name <net_income> while another might use <inc>. Obviously an industry standard like XBRL (eXtensible Business Reporting Language) can help standardize vocabularies for element type names, but this cannot be the whole solution or XBRL use would be more widespread. (Note: no criticism of XBRL is intended; I am just using it as an example of how difficult the problem is.)

Also, consider the tools in use to consume Web content. Only in recent years did browsers add XML processing support, in the form of the ability to read DTDs and transform content using XSLT. Even so, this merely allows you to read, validate and format non-HTML markup, not truly understand the content’s meaning. And if everyone uses their own schemas to define the data they publish on the Web, we could end up with a veritable “Tower of Babel” with many similar, but not fully interoperable, data models.
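To illustrate the limits of that pipeline, here is a minimal sketch (invented markup; Python with lxml standing in for a browser’s XSLT engine) in which custom tags are transformed into formatted HTML without anything in the process knowing what the data means:

```python
# The stylesheet only maps tags to tags and values to formatted text --
# nothing in the transform "understands" what an <income> figure represents.
from lxml import etree

xslt = etree.XSLT(etree.fromstring("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/statement">
    <html><body>
      <p>Income: <xsl:value-of select="income"/></p>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
"""))

doc = etree.fromstring("<statement><income>125000</income></statement>")
print(str(xslt(doc)))
```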

The Semantic Web may someday provide seamless integration and interpretation of heterogeneous data. Tools such as RDF/SPARQL, as well as microformats (embedding small, specialized, predefined element fragments in a standard format such as HTML), metadata, syndication tools and formats, industry vocabularies, powerful processing tools like XQuery, and other specifications can improve our ability to treat heterogeneous markup as if it were more homogeneous. But even these approaches address only part of the bigger problem. How will we know that elements labeled <net_income> and <inc> are the same and should be handled as such? How do we express these semantic definitions in a processable form? How do we know they are identical, or at least close enough to be treated as essentially the same thing?
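A crude sketch (invented names; Python with lxml) of what treating them “as the same” actually requires: someone still has to author, in a processable form, the crosswalk asserting that <net_income> and <inc> denote the same concept, and that assertion, not the parsing, is the hard part:

```python
# The crosswalk itself is the hard part -- it has to be authored and agreed upon
# before heterogeneous documents can be processed uniformly.
from lxml import etree

CANONICAL = {"net_income": "net_income", "inc": "net_income", "netIncome": "net_income"}

def extract_net_income(xml_text):
    """Return the net income value regardless of which local tag name was used."""
    doc = etree.fromstring(xml_text)
    for element in doc.iter():
        if CANONICAL.get(element.tag) == "net_income":
            return float(element.text)
    return None

print(extract_net_income("<filing><net_income>125000</net_income></filing>"))
print(extract_net_income("<report><inc>98000</inc></report>"))
```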

This, defining semantics effectively and broadly, is a conundrum faced by many industry standard schema developers and system integrators working with XML content. I think the Semantic Web will require more than schemas and XML-aware search tools to reach its full potential in intelligent data and the applications that process it. What is probably needed is a concerted effort to build semantic data and tools that can process it, including browsing, data storage, search, and classification tools. There is some interesting work being done in the Technical Architecture Group (TAG) at the W3C to address these issues as part of Tim Berners-Lee’s vision of the Semantic Web (see for a recent paper on the subject).
Meanwhile, we have Web 2.0 social networking tools to keep us busy and amused while we wait.

DataDirect Announces New Release of XML Data Integration Suite

DataDirect Technologies, an operating company of Progress Software Corporation (NASDAQ: PRGS), announced the latest release of the DataDirect Data Integration Suite, featuring new versions of its XML-based component technologies for data integration in traditional and service-oriented environments. Designed to meet the data transformation and aggregation needs of developers, the DataDirect Data Integration Suite contains the latest product releases of DataDirect XQuery, DataDirect XML Converters (Java and .NET) and Stylus Studio in one installation.

DataDirect XQuery is an XQuery processor that enables developers to access and query XML, relational data, Web services, EDI, legacy data, or a combination of data sources. New to version 4.0 is full support for the XQuery Update Facility (XUF), an extension of the XQuery language that allows changes to be made to data manipulated inside the XQuery. Now developers can more easily update individual XML documents, XML streams, and file collections from within their XQuery applications. The product also includes the ability to update and create Zip files, thereby supporting the OpenOffice XML format.

The latest release of the DataDirect XML Converters is compatible with Microsoft BizTalk Server 2006 and is integrated in the Microsoft BizTalk development environment. For healthcare organizations needing to comply with the X12 electronic data interchange (EDI) standards and the latest Health Insurance Portability and Accountability Act (HIPAA) 5010 transaction definitions, the DataDirect XML Converters now include support for the HIPAA EDI dialects, including 004010A1, 005010 and 005010A1 messages.

Stylus Studio 2009 has a new EDI-to-XML module that works with DataDirect XML Converters in an interactive way. Users can now load EDI documents to view contents, test conversions, create customizations and preview XML. http://www.datadirect.com

eZ Systems Updates eZ Components

eZ Systems announced the release of eZ Components version 2008.2. This is the seventh major version of eZ Components, a general-purpose PHP library of over 40 components used independently or together for PHP application development. The latest versions of eZ Publish are also based on eZ Components. With eZ Components, developers can concentrate on solving customer-specific needs. The eZ Components tool set provides key application functionality, such as caching, authentication, database interaction, templates, graphs, and much more.

Main improvements in this release include more features for the Document and Webdav components. The Document component, which enables you to convert documents between different formats, was already able to convert ReST to XHTML and DocBook. In this release, more formats are implemented, such as three different wiki formats (Confluence, Creole and DokuWiki) and the eZ Publish XML formats, as well as reading XHTML and writing ReST. The wiki parser can easily be extended for other wiki formats. The Webdav component now supports authentication and authorization, as well as the integration of authentication mechanisms into existing systems. In addition, it supports shared and exclusive write locks, even with custom storage back-ends.

The main new development of the eZ Components 2008.2 release is the MvcTools component, which implements the tools for a framework. Instead of dictating the structure of the application, it provides a dispatcher, two request parsers (one for HTTP and one for email messages through the existing Mail component), two routing methods, two view handlers (one through plain PHP scripts and one through the Template component), and a response writer for HTTP. http://ezcomponents.org

Mark Cuban on XBRL

While XBRL (eXtensible Business Reporting Language) has been in use on a voluntary basis for a while, the long slow road to making it a requirement ended this past December with the SEC’s announcement officially mandating it for large public companies (requirements for smaller companies will be phased in). We have argued for years that, as important as XBRL is from a regulatory point of view, its benefit for internal corporate and inter-company financial operations is reason enough to adopt it.

Given the current mess in the financial markets, XBRL has even more potential. Mark Cuban suggests using XBRL to help track the bailout money. Sounds like a great idea, and hopefully others will think of additional uses of this already-existing tool.

Thanks for the tweet Andrew!


© 2024 The Gilbane Advisor
