The HTML Working Group has published a Proposed Recommendation of “HTML5.” This specification defines the 5th major revision of the core language of the World Wide Web: the Hypertext Markup Language (HTML). In this version, new features are introduced to help Web application authors, new elements are introduced based on research into prevailing authoring practices, and special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability. Comments are welcome through 14 October. Learn more about the HTML Activity.
The W3C announced today that the HTML5 definition is complete, and on schedule to be finalized in 2014. This is excellent news for the future of the open Web, and so for all of us. If you were involved in discussions about mobile development strategies at our recent conference, you’ll want to check out all the details at http://dev.w3.org/html5/decision-policy/html5-2014-plan.
Moving right along, the HTML Working Group also published the first draft of HTML 5.1 so you can see a little further down the road for planning purposes. See http://www.w3.org/TR/2012/WD-html51-20121217/.
From the W3C newsletter…
W3C published today the complete definition of the “HTML5” and “Canvas 2D” specifications. Though not yet W3C standards, these specifications are now feature complete, meaning businesses and developers have a stable target for implementation and planning. “As of today, businesses know what they can rely on for HTML5 in the coming years, and what their customers will demand,” said Jeff Jaffe, W3C CEO. HTML5 is the cornerstone of the Open Web Platform, a full programming environment for cross-platform applications with access to device capabilities; video and animations; graphics; style, typography, and other tools for digital publishing; extensive network capabilities; and more.
To reduce browser fragmentation and extend implementations to the full range of tools that consume and produce HTML, W3C now embarks on the stage of W3C standardization devoted to interoperability and testing. W3C is on schedule to finalize the HTML5 standard in 2014. In parallel, the W3C community will continue its work on next generation HTML features, including extensions to complement built-in HTML5 accessibility, responsive images, and adaptive streaming.
We’ve published a new paper on addressing large-scale integration, storage, and access of complex information. As Dale mentions in his entry over on our main blog, the paper frames the discussion in terms of challenges to Open Government initiatives. We note, though, that the exploration of obstacles to effective, efficient processing of high volumes of data and content is relevant across many industries.
We’re cross-posting here on the XML blog because the paper deals with XML content and the XML family of standards, including XQuery and XPath.
The Gilbane Beacon is available as a free download from Gilbane and from Mark Logic, sponsor of the paper.
If you are grappling with Web 2.0 applications as part of your corporate strategy, keep in mind that Web 3.0 may be just around the corner. Some folks say a key feature of Web 3.0 is the emergence of the Semantic Web where information on Web pages includes markup that tells you what the data is, not just how to format it using HTML (HyperText Markup Language). What is the Semantic Web? According to Wikipedia:
“Humans are capable of using the Web to carry out tasks such as finding the Finnish word for “monkey”, reserving a library book, and searching for a low price on a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web.” (http://en.wikipedia.org/wiki/Semantic_Web).
To make this work, the W3C (World Wide Web Consortium) has developed standards such as RDF (Resource Description Framework, a framework for describing properties of data objects) and SPARQL (SPARQL Protocol and RDF Query Language, http://www.w3.org/TR/rdf-sparql-query/) that extend the semantics that can be applied to Web-delivered content.
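To give a rough feel for the idea, here is a minimal sketch in plain Python. It is not RDF or SPARQL itself: it only imitates the shape of RDF statements (subject, predicate, object) and a wildcard-style query over them. The `ex:` names and the `match` helper are made up for illustration.

```python
# Hypothetical statements in the spirit of RDF: (subject, predicate, object).
# The "ex:" identifiers are invented for this example.
TRIPLES = [
    ("ex:book42", "ex:title", "Finnish Dictionary"),
    ("ex:book42", "ex:availableAt", "ex:cityLibrary"),
    ("ex:dvd7", "ex:title", "Some Film"),
    ("ex:dvd7", "ex:price", 9.99),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "Which things have a title?" is a pattern with only the predicate fixed.
titles = match(TRIPLES, predicate="ex:title")

# "What is the price of ex:dvd7?" fixes the subject and the predicate.
dvd_price = match(TRIPLES, subject="ex:dvd7", predicate="ex:price")
```

A real SPARQL engine does far more (joins across patterns, filters, named graphs), but the underlying notion of querying by statement pattern is similar.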
We have been working with semantic data since the beginning of SGML, and later with XML, just not always exposing these semantics to the Web. So, if we know how to apply semantic markup to content, how come we don’t see a lot of semantic markup on the Web today? I think what is needed is a method for expressing and understanding the intended semantics that goes beyond what current standards allow.
A W3C XML schema is a set of rules that describe the relationships between content elements. It can be written in a way that is very generic or format oriented (e.g., HTML) or very structure oriented (e.g., DocBook, DITA). Maybe we should explore how to go even further and make our markup languages very semantically oriented by defining elements, for instance, like <weight> and <postal_code>.
Consider, though, that the schema in use can tell us the names of semantically defined elements, but not necessarily their meaning. I can tell you something about a piece of data by using the <income> tag, but how, in a schema, can I tell you it is a net <income> calculated using the guidelines of the US Internal Revenue Service, and therefore suitable for eFiling my tax return? For that matter, one system might use the element type name <net_income> while another might use <inc>. Obviously an industry standard like XBRL (eXtensible Business Reporting Language) can help standardize vocabularies for element type names, but this cannot be the whole solution or XBRL use would be more widespread. (Note: no criticism of XBRL is intended, just using it as an example of how difficult the problem is).
Also, consider the tools in use to consume Web content. Browsers only in recent years added XML processing support in the form of the ability to read DTDs and transform content using XSLT. Even so, this merely allows you to read, validate and format non-HTML tag markup, not truly understand the content’s meaning. And if everyone uses their own schemas to define the data they publish on the Web, we could end up with a veritable “Tower of Babel” with many similar, but not fully interoperable data models.
The Semantic Web may someday provide seamless integration and interpretation of heterogeneous data. Tools such as RDF/SPARQL, as well as microformats (embedding small, specialized, predefined element fragments in a standard format such as HTML), metadata, syndication tools and formats, industry vocabularies, powerful processing tools like XQuery, and other specifications can improve our ability to treat heterogeneous markup as if it were more homogeneous. But even these approaches are addressing only part of the bigger problem. How will we know that elements labeled with <net_income> and <inc> are the same and should be handled as such? How do we express these semantic definitions in a processable form? How do we know they are identical or at least close enough to be treated as essentially the same thing?
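A small sketch makes the gap concrete. In the Python below, two documents use <net_income> and <inc> for the same figure, and the only thing that lets a program treat them alike is a mapping table someone wrote by hand. The `CONCEPTS` table, the `income.net` label, and the sample documents are all hypothetical; nothing in the markup or in a schema supplies that equivalence.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-maintained mapping from element names to a shared concept.
# A person had to decide that these two names mean the same thing.
CONCEPTS = {
    "net_income": "income.net",
    "inc": "income.net",
}


def extract(xml_text):
    """Return {concept: text} for every element whose name appears in CONCEPTS."""
    root = ET.fromstring(xml_text)
    found = {}
    for element in root.iter():
        concept = CONCEPTS.get(element.tag)
        if concept is not None:
            found[concept] = (element.text or "").strip()
    return found


# Two invented documents from two imaginary systems.
doc_a = "<return><net_income>52000</net_income></return>"
doc_b = "<filing><inc>52000</inc></filing>"

# The documents compare equal only because of the manual mapping above.
same = extract(doc_a) == extract(doc_b)
```

The mapping table is exactly the piece that does not scale: every new vocabulary needs another human decision, and the table says nothing about whether <inc> was computed under the same rules as <net_income>.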
This problem, defining semantics effectively and broadly, is a conundrum faced by many industry standard schema developers and system integrators working with XML content. I think the Semantic Web will require more than schemas and XML-aware search tools to reach its full potential in intelligent data and the applications that process it. What is probably needed is a concerted effort to build semantic data and the tools that can process it, including browsing, data storage, search, and classification tools. There is some interesting work being done in the Technical Architecture Group (TAG) at the W3C to address these issues as part of Tim Berners-Lee’s vision of the Semantic Web (see for a recent paper on the subject).
Meanwhile, we have Web 2.0 social networking tools to keep us busy and amused while we wait. </>
Looking ahead to our conference in San Francisco, there are a number of sessions related to XML and content management, as well as some broader sessions on SaaS and content management platforms. David Guenette and I are working with Frank on the Content Technologies & Strategies (CTS) track as well as the Enterprise Publishing Technology (EPT) track. At this writing, we have the following sessions on tap (and you can see the whole grid here).
CTS-1: XML Strategies for Content Management
XML is fundamental to content management in two important ways–in how the content is tagged and structured and also in how content management systems interact with each other and with other enterprise applications. This session looks at how successful organizations make the best use of XML to support critical business processes and applications.
CTS-2: Enterprise Rights Management: Best Practices & Case Studies
As content management systems proliferate, so do the requirements for better and more sophisticated protection of that content. Simply stated, traditional protection is not enough–content needs to be protected persistently throughout complex business processes. Enterprise Rights Management platforms are answering these challenges, and this session uses case studies to help explain how this technology can help you meet your requirements.
CTS-3: SaaS – Is Software as a Service Right for You?
Software as a Service is exploding. Every day brings new offerings, new approaches, and new adopters. While content management SaaS offerings were once limited to Web Content Management, there are now SaaS offerings for document management, ECM, globalization, and XML-based component content management. This session looks at the big questions about SaaS and discusses whether SaaS might be right for you.
CTS-4: Platform Pros & Cons: SharePoint vs. Oracle vs. Documentum vs. IBM
The long-predicted content management platform wars are upon us. Activity is everywhere–the introduction of SharePoint 2007, Oracle’s acquisition of Stellent, EMC’s continued aggressive acquisition strategy, and IBM’s acquisition of FileNet. Will we all end up using one of these four platforms, and if we do, would this be a good thing? This session will offer the vendor, user, and industry perspective on this dominant issue.
CTS-5: Financial Content Collaboration with XBRL & RIXML
If you follow XML in the financial services arena, you undoubtedly know about XBRL, the emerging standard for financial data reporting that is really taking hold at the SEC and the regulatory agencies of EU countries. A lesser-known but equally intriguing standard is RIXML, the Research Information Exchange Markup Language. This session looks at these standards and the implications for the lifecycle of financial content.
EPT-1: Enterprise Publishing with XML (DITA)
June 2008 marks the third anniversary of the approval of DITA 1.0 by the OASIS Technical Committee, and it is very safe to say that no XML-based publishing standard has had such rapid and far-ranging uptake. This session looks at some emerging uses of DITA while also discussing some of the positive business impact enjoyed by companies that have already adopted the standard.
EPT-2: Multi-Channel Publishing – How to Do It
Multi-channel publishing has become a mandate for nearly every organization. With the explosion in mobile devices, the mandate is becoming more complex. But along with this complexity comes opportunity to serve more users and more applications. This session offers case studies and practical advice for implementing multi-channel publishing to support your business objectives.
EPT-3: Digital Publishing Platforms: Magazines, Newspapers & eBooks
Amazon’s Kindle may be getting all of the publicity, but there is an explosion in new devices, technologies, and products for digital publishing–with implications for every traditional publishing medium. What are these new technologies, and what opportunities do they present to publishers? Hear from publishers and technologists, as well as some of the results of the Gilbane Group’s extensive research into how these technologies are reshaping the digital publishing landscape.
The World Wide Web Consortium (W3C) released the XHTML 1.0 specification as a W3C Recommendation. This new specification represents cross-industry and expert community agreement on the importance of XHTML 1.0 as a bridge to the Web of the future. A W3C Recommendation indicates that a specification is stable, contributes to Web interoperability, and has been reviewed by the W3C membership, which favors its adoption by the industry. HTML currently serves as the lingua franca for millions of people publishing hypertext on the Web. While that is the case today, the future of the Web is written in XML. XML is bringing the Web forward as an environment that better meets the needs of all its participants, allowing content creators to make structured data that can be easily processed and transformed to meet the varied needs of users and their devices.
In designing XHTML 1.0, the W3C HTML Working Group faced a number of challenges, including one capable of making or breaking the Web: how to design the next generation language for Web documents without obsoleting what’s already on the Web, and how to create a markup language that supports device-independence. The answer was to take HTML 4, and rewrite it as an XML application. The first result is XHTML 1.0. XHTML 1.0 allows authors to create Web documents that work with current HTML browsers and that may be processed by XML-enabled software as well. Authors writing XHTML use the well-known elements of HTML 4 (to mark up paragraphs, links, tables, lists, etc.), but with XML syntax, which promotes markup conformance. The benefits of XML syntax include extensibility and modularity. With HTML, authors had a fixed set of elements to use, with no variation. With XHTML 1.0, authors can mix and match known HTML 4 elements with elements from other XML languages, including those developed by W3C for multimedia (Synchronized Multimedia Integration Language – SMIL), mathematical expressions (MathML), two dimensional vector graphics (Scalable Vector Graphics – SVG), and metadata (Resource Description Framework – RDF).
W3C provides instruction and tools for making the transition from HTML 4 to XHTML 1.0. The “HTML Compatibility Guidelines” section of the XHTML 1.0 Recommendation explains how to write XHTML 1.0 that will work with nearly all current HTML browsers. W3C offers validation services for both HTML and XHTML documents. W3C’s Open Source software “Tidy” helps Web authors convert ordinary HTML 4 into XHTML and clean document markup at the same time. www.w3.org/
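As a rough illustration of what “processed by XML-enabled software” can mean in practice, the sketch below feeds a small XHTML page to Python’s standard XML parser. The sample page is invented for this example; the namespace URI is the one XHTML documents declare. The second half shows that HTML-style markup with an unclosed <br> is rejected by the same parser, which is one way XML syntax promotes markup conformance.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace declared on the root element.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# An invented, well-formed XHTML page: lowercase tags, every element closed.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>First paragraph.<br /></p>
    <p>See <a href="http://www.w3.org/">W3C</a>.</p>
  </body>
</html>"""

# A generic XML parser can read the page with no HTML-specific logic.
root = ET.fromstring(page)
links = [a.get("href") for a in root.iter("{%s}a" % XHTML_NS)]
paragraphs = root.findall(".//{%s}p" % XHTML_NS)

# HTML 4 habits such as an unclosed <br> are not well-formed XML.
try:
    ET.fromstring("<p>Unclosed break<br></p>")
    html_style_parses = True
except ET.ParseError:
    html_style_parses = False
```

Here `links` holds the single link target and `paragraphs` the two paragraph elements, while `html_style_parses` ends up false because the parser refuses the unclosed element.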