
Why Adding Semantics to Web Data is Difficult

If you are grappling with Web 2.0 applications as part of your corporate strategy, keep in mind that Web 3.0 may be just around the corner. Some folks say a key feature of Web 3.0 is the emergence of the Semantic Web, where information on Web pages includes markup that tells you what the data is, not just how to format it using HTML (HyperText Markup Language). What is the Semantic Web? According to Wikipedia:

“Humans are capable of using the Web to carry out tasks such as finding the Finnish word for “monkey”, reserving a library book, and searching for a low price on a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web.” (http://en.wikipedia.org/wiki/Semantic_Web).

To make this work, the W3C (World Wide Web Consortium) has developed standards such as RDF (Resource Description Framework, a schema for describing properties of data objects) and SPARQL (SPARQL Protocol and RDF Query Language, http://www.w3.org/TR/rdf-sparql-query/) to extend the semantics that can be applied to Web-delivered content.
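
As a rough illustration of the idea, here is a minimal sketch, assuming the Python rdflib library and an invented example.org vocabulary (neither comes from the W3C specifications themselves): a few RDF statements describe the properties of a data object, and a SPARQL query asks for those properties by meaning rather than by page layout.

```python
# Minimal sketch of RDF + SPARQL, assuming the rdflib Python library.
# The example.org vocabulary and the data are invented for illustration.
from rdflib import Graph

turtle_data = """
@prefix ex: <http://example.org/terms/> .

<http://example.org/catalog/item42>
    ex:title  "Finnish-English Dictionary" ;
    ex:price  12.50 ;
    ex:format "DVD" .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

# The query asks about the data's properties (title, price) directly,
# instead of scraping whatever HTML happens to surround them.
results = g.query("""
    PREFIX ex: <http://example.org/terms/>
    SELECT ?title ?price
    WHERE { ?item ex:title ?title ; ex:price ?price . }
""")

for row in results:
    print(row.title, row.price)
```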

We have been doing semantic data since the beginning of SGML, and later with XML, just not always exposing these semantics to the Web. So, if we know how to apply semantic markup to content, how come we don’t see a lot of semantic markup on the Web today? I think what is needed is a method for expressing and understanding intended semantics, one that goes beyond what current standards allow.

A W3C XML schema is a set of rules that describe the relationships between content elements. It can be written in a way that is very generic or format-oriented (e.g., HTML) or very structure-oriented (e.g., DocBook, DITA). Maybe we should explore how to go even further and make our markup languages very semantically oriented by defining elements, for instance, like <weight> and <postal_code>.
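
For instance, a purely hypothetical document marked up that way might look like the fragment below, read here with Python’s standard-library ElementTree; the element names and values are invented, not drawn from any real schema.

```python
# Hypothetical "semantically oriented" markup: element names say what the
# data is (a weight, a postal code), not how to display it.
# Parsed with Python's standard-library ElementTree.
import xml.etree.ElementTree as ET

doc = """
<shipment>
  <weight unit="kg">2.4</weight>
  <postal_code country="US">02139</postal_code>
</shipment>
"""

root = ET.fromstring(doc)
weight = float(root.findtext("weight"))
postal_code = root.findtext("postal_code")
print(weight, postal_code)  # 2.4 02139
```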

Consider, though, that the schema in use can tell us the names of semantically defined elements, but not necessarily their meaning. I can tell you something about a piece of data by using the <income> tag, but how, in a schema, can I tell you it is a net <income> calculated using the guidelines of the US Internal Revenue Service, and therefore suitable for eFiling my tax return? For that matter, one system might use the element type name <net_income> while another might use <inc>. Obviously an industry standard like XBRL (eXtensible Business Reporting Language) can help standardize vocabularies for element type names, but this cannot be the whole solution or XBRL use would be more widespread. (Note: no criticism of XBRL is intended; I am just using it as an example of how difficult the problem is.)
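
In practice, the gap gets papered over with hand-maintained mappings that simply assert, with no formal backing, that two names mean the same thing. A hypothetical sketch (all element names and the “canonical” vocabulary are made up):

```python
# Hand-built glue: map each source system's element name to a canonical one.
# Nothing here is machine-verifiable; a human simply decided that
# <net_income> and <inc> are "close enough" -- which is exactly the problem.
import xml.etree.ElementTree as ET

CANONICAL_NAME = {"net_income": "net_income", "inc": "net_income"}

def extract_net_income(xml_text):
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if CANONICAL_NAME.get(element.tag) == "net_income":
            return float(element.text)
    return None

print(extract_net_income("<return><net_income>50000</net_income></return>"))
print(extract_net_income("<statement><inc>50000</inc></statement>"))
```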

Also, consider the tools in use to consume Web content. Only in recent years have browsers added XML processing support, in the form of the ability to read DTDs and transform content using XSLT. Even so, this merely allows you to read, validate, and format non-HTML tag markup, not truly understand the content’s meaning. And if everyone uses their own schemas to define the data they publish on the Web, we could end up with a veritable “Tower of Babel” with many similar, but not fully interoperable, data models.
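
To see the limitation, consider a minimal sketch using the lxml library (the same XSLT 1.0 processing a browser can apply): the stylesheet happily renders <net_income> as HTML, but nothing in the transformation knows, or needs to know, what net income actually is. The document and stylesheet here are invented.

```python
# XSLT can turn unfamiliar markup into presentable HTML, but it only
# rearranges tags; it carries no notion of what <net_income> means.
# Uses the lxml library; document and stylesheet are invented examples.
from lxml import etree

doc = etree.XML("<return><net_income>50000</net_income></return>")

stylesheet = etree.XML("""
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <p>Net income: <xsl:value-of select="/return/net_income"/></p>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(transform(doc))  # prints something like: <p>Net income: 50000</p>
```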

The Semantic Web may someday provide seamless integration and interpretation of heterogeneous data. Tools such as RDF/SPARQL, as well as microformats (embedding small, specialized, predefined element fragments in a standard format such as HTML), metadata, syndication tools and formats, industry vocabularies, powerful processing tools like XQuery, and other specifications can improve our ability to treat heterogeneous markup as if it were more homogeneous. But even these approaches address only part of the bigger problem. How will we know that elements labeled <net_income> and <inc> are the same and should be handled as such? How do we express these semantic definitions in a processable form? How do we know they are identical, or at least close enough to be treated as essentially the same thing?

This, defining semantics effectively and broadly, is a conundrum faced by many industry standard schema developers and system integrators working with XML content. I think the Semantic Web will require more than schemas and XML-aware search tools to reach its full potential of intelligent data and applications that can process it. What is probably needed is a concerted effort to build semantic data and tools that can process it, including browsing, data storage, search, and classification tools. There is some interesting work being done in the Technical Architecture Group (TAG) at the W3C to address these issues as part of Tim Berners-Lee’s vision of the Semantic Web (see the TAG for a recent paper on the subject).
Meanwhile, we have Web 2.0 social networking tools to keep us busy and amused while we wait.

Meet Gilbane: SDL GIM Chicago

We travel to the Windy City on January 21 for the next event in SDL’s series on global information management. Speakers from Fair Isaac and Garmin will share their experiences with creating, translating, managing and publishing multilingual content. Gilbane’s kick-off presentation looks at trends and best practices emerging from our research on how companies are aligning multilingual content practices with business goals and objectives.
Registration is open to anyone with an interest in managing content for global audiences.

IBM Delivers New “Social” Lotus Notes and Free Symphony Software for Macs

IBM (NYSE: IBM) announced the availability of Lotus Notes 8.5 collaboration software with social computing features for all Mac OS X Leopard-powered computers. In addition, IBM’s free Lotus Symphony document, spreadsheet and presentation software will be available later this month for the Mac. Lotus Notes 8.5 provides significant storage savings over previous versions. Notes has an intelligent storage savings feature that ensures that only one copy of an attachment is kept on the mail server, resulting in an estimated 40 percent space savings. Lotus Notes 8.5 arranges all collaboration tools on one screen in fewer clicks. This screen shows links to team rooms, instant messaging, to-do lists, calendar, Internet browsers and other tools. Social characteristics include new integration with Google, Yahoo, and hundreds of other public Internet calendars. IBM also announced new Lotus iNotes 8.5 software, which allows anyone with a Notes user license to access Notes through a Safari browser from anywhere. iNotes allows the user to integrate the Notes calendar with Google calendar and also supports most standard widgets. One example of a widget is the mapping of a street address in an e-mail note. IBM sells Lotus Notes and Domino in a variety of ways, including packaged with hardware for small and medium businesses; via a hosted service, where the software is stored on a server at IBM; and through Passport Advantage on http://www.ibm.com/lotus/notesanddomino.

DataDirect Announces New Release of XML Data Integration Suite

DataDirect Technologies, an operating company of Progress Software Corporation (NASDAQ: PRGS), announced the latest release of the DataDirect Data Integration Suite featuring new versions of its XML-based component technologies for data integration in traditional and service-oriented environments. Designed to meet the data transformation and aggregation needs of developers, the DataDirect Data Integration Suite contains the latest product releases of DataDirect XQuery, DataDirect XML Converters (Java and .NET) and Stylus Studio in one installation. DataDirect XQuery is an XQuery processor that enables developers to access and query XML, relational data, Web services, EDI, legacy data, or a combination of data sources. New to version 4.0 is full support for the XQuery Update Facility (XUF), an extension of the XQuery language that allows making changes to data manipulated inside the XQuery. Now developers can more easily update individual XML documents, XML streams, and file collections from within their XQuery applications. The product also includes the ability to update and create Zip files, thereby supporting the OpenOffice XML format. The latest release of the DataDirect XML Converters is compatible with Microsoft BizTalk Server 2006 and is integrated in the Microsoft BizTalk development environment. For healthcare organizations needing to comply with the X12 electronic data interchange (EDI) standards and the latest Health Insurance Portability and Accountability Act (HIPAA) 5010 transaction definitions, the DataDirect XML Converters now include support for the HIPAA EDI dialects, including 004010A1, 005010 and 005010A1 messages. Stylus Studio 2009 has a new EDI to XML module that works with DataDirect XML Converters in an interactive way. Users can now load EDI documents to view contents, test conversions, create customizations and preview XML. http://www.datadirect.com

What Does an Analyst Do for You?

One of the roles I have chosen for myself as Lead Analyst for Enterprise Search at the Gilbane Group is to evaluate, in broad strokes, the search marketplace for internal use at enterprises of all types. My principal audience is those within enterprises who may be involved in the selection, procurement, implementation and deployment of search technology to benefit their organizations. In this role, I am an advocate for buyers. However, when vendors pay attention to what I write, it should help them understand the buyer’s perspective. Ultimately, good vendors incorporate analyst guidance into their thinking about how to serve their customers better.

We do not hide the fact that, as industry analysts, we also consult to various content software companies. When doing so, I try to keep in mind that the market will be served best when I honestly advocate for software and service improvements that will benefit buyers. This is of value to those who sell and those who buy software. My consulting to vendors indirectly benefits both audiences.

Analysts also consult to buyers, helping them make informed decisions about technologies and business relationships. I particularly enjoy and value those experiences because what I learn about enterprise buyers’ needs and expectations can translate directly into advice to vendors. This is an honest brokering role that comes naturally because I have been a software vendor and have also been in a position to make many software procurement decisions, particularly for tools and applications used by my development and service teams. I’m always enthusiastic to be in a position to share important information about products with buyers and information about buying audiences with those who build products. This can be done effectively while preserving confidentiality on both sides and making sure that everyone gets something out of the communications.

As an analyst, I receive a lot of requests from vendors to listen to briefings on their products, by phone and Web, or to meet one-on-one with their executives. You may have noticed that I don’t write reviews of specific products, although, in a particular context, I may reference products and applications. While we understand why product vendors want analysts to pay attention to them, I don’t find briefings particularly enlightening unless I know nothing about a company and its offerings. For these types of overviews, I can usually find what I want to know on their Web site, in press releases and by poking around the Web. During briefings I want to drive the conversation toward user experiences and needs.

What I do like to do is talk to product users about their experiences with a vendor or a product. I like to know what the implementation and adoption experience is like and how their organization has been affected by product use, both benefits and drawbacks. It is not always easy to gain access to customers, but I have ways of finding them, and I also encourage readers of this blog to reach out with your stories. I am delighted to learn more through a comment on the blog, an email or a phone call. If you are willing to chat with me for a while, I will call you at your convenience.

The original topic I planned to write about this week will have to wait because, after receiving over 20 invitations to “be briefed” in the past few days, I decided it was more important to let readers know who I want to be briefed by: search technology users are my number one target. Vendors, please push your customers in this direction if you want me to pay attention. This can bring you a lot of value, too. It is a matter of trust.

eZ Systems Updates eZ Components

eZ Systems announced the release of eZ Components version 2008.2. This is the seventh major version of eZ Components, a general-purpose PHP library of over 40 components used independently or together for PHP application development. The latest versions of eZ Publish are also based on eZ Components. With eZ Components, developers can concentrate on solving customer-specific needs. The eZ Components tool set provides key application functionality, such as caching, authentication, database interaction, templates, graphs, and much more. Main improvements in this release include more features for the Document and Webdav components. The Document component, which enables you to convert documents between different formats, was already able to convert ReST to XHTML and DocBook. In this release, more formats are implemented, such as three different wiki formats (Confluence, Creole and DokuWiki) and the eZ Publish XML formats, as well as reading XHTML and writing ReST. The wiki parser can easily be extended for other wiki formats. The Webdav component now supports authentication and authorization, as well as support for integrating authentication mechanisms into existing systems. In addition, it supports shared and exclusive write locks, even with custom storage back-ends. The main new development of the eZ Components 2008.2 release is the MvcTools component. The MvcTools component implements the tools for a framework. Instead of dictating the structure of the application, it provides a dispatcher, two request parsers (one for HTTP and one for email messages through the existing Mail component), two routing methods, two view handlers (one through plain PHP scripts and one through the Template component), and a response writer for HTTP. http://ezcomponents.org
