Curated content for content, computing, and digital experience professionals

Day: January 29, 2007

The PDF ISO Standard

Much is being made today of Adobe Systems' announcement that “it intends to release the full Portable Document Format (PDF) 1.7 specification to AIIM, the Enterprise Content Management Association, for the purpose of publication by the International Organization for Standardization (ISO).”

The main hubbub surrounds the contention of several bloggers that this represents another attack by Adobe on Microsoft and its recently released XPS format, “the PDF killer.” Quite probably so. It’s a subject worth examining, although not superficially.

For today I’d like to consider what it means to become an ISO standard. I think of this as the equivalent of getting a lifetime achievement award from The Academy of Motion Picture Arts and Sciences (The Oscars). It means you were pretty good, but you’re now almost dead.

As of December 31, 2005, there were 15,649 published ISO standards, with 1,240 released in that year alone. Under the heading of electronics, information technology and telecommunications, there were 2,447 published standards. How many does your organization conform to? If this impresses you, remember to celebrate World Standards Day on October 14! And for even more fun, there’s the new isomemory game (http://www.iso.org/iso/en/commcentre/isomemory/startpage.html#). I hear it’s fun for the whole family!

You can’t read the published standards on the ISO site without giving them a chunk of cash first. That says something in itself; I’m just not sure what. But you can see listings of the bodies buried in the ISO graveyard. For example, ISO 12639:2004 is the TIFF/IT standard, once used widely in the prepress industry, but no longer a player. You can, however, download it for 176 Swiss francs, 8700 Yugoslav dinars, or about 140 Yankee dollars.

ISO 6804:1991 covers “rubber hoses and hose assemblies for washing-machines and dishwashers — Specification for inlet hoses.” It’s yours for 48 Swiss francs!

I could go on (and am tempted to do so).

At the same time, there are certain relevant standards that have crept into ISO. As Adobe mentions in its press release, several of the PDF siblings are now ISO standards (PDF/X, PDF/X-1, etc.). The OpenDocument Format is a standard. And so on.

So what is the significance of becoming an ISO standard when your standard is one that people actually use? Historically, none; more recently, some.

As the publishing industry has evolved into an ever-more-complex microsystem, more and more organizations (and indeed states, countries, etc.) are choosing to endorse standards that have been accepted and published by ISO.

Will more organizations use PDF if it’s an ISO standard? Probably not. That is, unless Microsoft gains real traction with XPS. There are some very high-stakes games being played against the Microsoft/Windows juggernaut, and standards have become a key weapon in the game. Adobe has played a major trump card. Microsoft: your move.

Back to Search Roots for the Enterprise – Structured Search That Is

Structured search (noun) was rooted firmly in the enterprise when print index resources (e.g. Chemical Abstracts, Index Medicus from the National Library of Medicine, GRA&I from the National Technical Information Service) became available on-line in the early 1970s. The System Development Corporation launched ORBIT, developed by a team led by Carlos Cuadra. ORBIT was a command-driven search tool accessible to professional searchers. In those days searchers were usually special librarians in corporations, large public libraries, government agencies and major universities. Using the ORBIT command language through a terminal connected by a phone line to large remote computers, librarians would type search commands to find data in specific structured fields. These remote computers held electronic versions of paper indices. Citations resulting from a query for specific chemical compounds, diseases, or government reports would contain the information needed to retrieve articles, patents or books from library shelves.

Corporations spent hundreds of thousands of dollars each year to access external, specialized, structured indices and the journals, conference proceedings, patents and government documents to which the indices pointed. Hard copy (paper or microform) was the only practical way to read content. Computer screens were not accessible to most researchers and, even if they had been, content could not be rendered on them in easily readable forms. Also, until computer storage technologies became cheap, indexing large amounts of text (full-text, or unstructured content) was not affordable.

Even with the advent of graphical interfaces, searching by non-specialists made only minor advances in the early 1980s, when library systems offered index browsing to find citations. Library users still needed to read content in hard copy. It was only in the late 1980s and early 1990s that full-text content began to be searchable by large numbers of library users on CD-ROMs. Users would go to a library computer, which held multiple CD-ROMs containing journals and other subscriptions, and use a menu to find content on the CD-ROMs by typing keywords that were matched against all of the content. This was the first routine use of full-text searching by library users.

These technologies are just memories for a few of us, and unknown to most, but they do point to the differentiation between structured and unstructured searching. Both have been around for a couple of decades, but it has taken Web search engines to put search in the hands of everyone. Only recently has frustration with retrieving buckets of unfiltered content begun pushing enterprises to reconfirm the added value of structured searching.

Technical and business users are appreciating the value of being able to search for a precise title, all documents contributed to a specific project, or all presentations delivered by the CEO in the past two years. Each of these searches requires a defined set of data points, stored with the content and retrievable with a search interface that can support the “structured” query.
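To make the distinction concrete, here is a minimal sketch in Python of a structured (fielded) query next to a plain keyword search. Everything in it is an assumption made for illustration: the `Document` fields, the sample records, and the two helper functions are hypothetical and do not describe any particular search product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical metadata fields, chosen to mirror the examples in the text.
    title: str
    project: str
    author: str
    doc_type: str
    published: date
    body: str


def structured_search(docs, **criteria):
    """Return documents whose metadata fields equal every given criterion."""
    return [d for d in docs
            if all(getattr(d, field) == value for field, value in criteria.items())]


def full_text_search(docs, keyword):
    """Return documents whose title or body mentions the keyword anywhere."""
    kw = keyword.lower()
    return [d for d in docs if kw in d.title.lower() or kw in d.body.lower()]


# Illustrative sample records (invented for this sketch).
DOCS = [
    Document("Q3 Strategy", "Apollo", "CEO", "presentation",
             date(2006, 9, 12), "Growth plans for next year."),
    Document("Apollo Test Plan", "Apollo", "J. Smith", "report",
             date(2006, 3, 1), "The CEO asked for weekly status."),
    Document("Old Keynote", "Hermes", "CEO", "presentation",
             date(2003, 5, 20), "Keynote from the CEO."),
]

# Structured: presentations authored by the CEO, narrowed to the past two years
# (counted back from January 29, 2007).
ceo_talks = [d for d in structured_search(DOCS, author="CEO", doc_type="presentation")
             if d.published >= date(2005, 1, 29)]

# Unstructured: anything that merely mentions "ceo" in its text.
ceo_mentions = full_text_search(DOCS, "ceo")

print([d.title for d in ceo_talks])
print([d.title for d in ceo_mentions])
```

In this toy data set the fielded query returns only the recent CEO presentation, while the keyword search misses it (the word never appears in its text) and instead returns a status report and an out-of-date keynote. That is the gap that stored metadata fields are meant to close.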

Yes, librarians have been here before but, just now, the rest of the organization is learning how they managed to get such good search results all along. Structured searching is now a lot simpler than it was in the 1970s. It is only one aspect of enterprise search, but it is an important requirement for most enterprise users when they need reliable and clearly defined search results. And, by the way, Carlos is still around, building systems for enterprises to manage and search their critical proprietary content.

FAST Introduces Business Intelligence Built on Search

Fast Search & Transfer (OSEAX: FAST.OL) (FAST) unveiled the FAST Adaptive Information Warehouse (AIW), a new approach that lets users capitalize on their entire universe of information to make better informed decisions for competitive advantage. Built on a search platform, FAST AIW integrates an end-to-end framework of products that unifies search and Business Intelligence. FAST AIW puts the Business Intelligence solutions on top of the search platform to integrate and orchestrate all of the information needed to make BI more effective. Users can directly search and navigate Business Intelligence data in an ad-hoc manner and view relevant, usable information without the need for predefined report creation. The FAST AIW platform includes FAST Radar, a Web-based Business Intelligence portal and tool that brings actionable information and statistical analysis to decision-makers throughout the organization by means of a familiar search and navigation interface. FAST Radar provides insights into data through personal, flexible dashboards that move intelligence in the enterprise from IT and business analysts to every business user. Also included is the FAST Data Cleansing Solution, which provides up-to-the-minute access to all information, structured and unstructured, regardless of its source or location. It uses linguistics to improve data quality, enabling organizations to match, merge, and cleanse data automatically. The FAST AIW platform, including FAST Data Cleansing and FAST Radar, is available immediately. FAST Data Cleansing and FAST Radar may also be purchased as individual products. http://www.fastsearch.com

Adobe to Release PDF for Industry Standardization

Adobe Systems Incorporated (Nasdaq:ADBE) announced that it intends to release the full Portable Document Format (PDF) 1.7 specification to AIIM, the Enterprise Content Management Association, for the purpose of publication by the International Organization for Standardization (ISO). PDF has become a de facto global standard since Adobe published the complete PDF specification in 1993. Since 1995 Adobe has participated in various working groups that develop technical specifications for publication by ISO and worked within the ISO process to deliver specialized subsets of PDF as standards for specific industries and functions. Today, PDF for Archive (PDF/A) and PDF for Exchange (PDF/X) are ISO standards, and PDF for Engineering (PDF/E) and PDF for Universal Access (PDF/UA) are proposed standards. Additionally, PDF for Healthcare (PDF/H) is an AIIM proposed Best Practice Guide. AIIM serves as the administrator for PDF/A, PDF/E, PDF/UA and PDF/H. Adobe will release the full PDF 1.7 specification as defined in the PDF Reference Manual to AIIM for the purpose of submission to ISO. The joint committee formed under AIIM will identify issues to be addressed, as well as proposed solutions, and will develop a draft document that will then be presented to a Joint Working Group of ISO for development and approval as an International Standard. AIIM holds the secretariat for the International Organization for Standardization (ISO) Technical Committee (TC) 171 and 171 SC2 for Document Management Applications, and is the administrator for the U.S. Technical Advisory Group to ISO TC 171 that represents the U.S. at international meetings. www.adobe.com/devnet/pdf/pdf_reference.html