Curated for content, computing, and digital experience professionals

Category: Web technologies & information standards

Here we include topics related to information exchange standards, markup languages, supporting technologies, and industry applications.

Mark Logic Launches Cloud Services

Mark Logic Corporation announced MarkLogic Cloud Services, a new line of services that makes Mark Logic software available on Amazon Web Services. The first offering in this line is MarkLogic Server for EC2, which enables customers to use MarkLogic on a pay-by-the-hour basis on Amazon EC2, the popular elastic compute cloud platform. MarkLogic Server for EC2 consists of an Amazon Machine Image (AMI) with MarkLogic Server pre-installed; for faster and easier deployment, users can subscribe to the AMI directly from Amazon Web Services and pay for only the resources they need. MarkLogic Server is also now certified on two cloud infrastructures: Amazon EC2, where customers can deploy MarkLogic Server on infrastructure offered by Amazon Web Services, and the VMware virtualization platform, which enables customers to implement clouds on self-managed hardware. http://www.marklogic.com/

Publishing Perspective 2010

By Ted Treanor, Senior Publishing Consultant

Publishing predictions for 2010 abound. As a digital publishing pioneer and visionary, Ted Treanor has long been ahead of the curve, with a unique vantage point on what’s in store for the industry. At this tipping point, the convergence of print and digital publishing has hit the mainstream. Let us know what you think of these predictions.

Let’s see if 13 predictions will be lucky for publishing.

  1. New eReading devices will proliferate. The market is responding like the California gold rush: not only will new companies launch in 2010, but big electronics firms will introduce products of their own. CES will be a haven for digital reading, which will astound everyone.
  2. Pricing experimentation will take center stage.
  3. Digital sales channels, both retail and distribution, will grow rapidly.
  4. The ePub standard (IDPF.org) will strengthen as an international industry standard. ePub will compete with PDF to be the top format for commercial content.
  5. The big surprise this year will be the number of large, recognized companies that strategically target the digital publishing, eReading, and content space. At least one major communications infrastructure company (possibly wireless) will stake a claim through a publishing partnership. Other prime segments will be computer manufacturers and printer manufacturers.
  6. Trade associations will scramble to stay relevant in their attempt to lead members through this time of convergence of print and digital.
  7. Content workflow using XML technologies will become standard for single-source production of multiple print and digital editions (see the sketch after this list).
  8. Publishers will attempt to build direct relationships with their reader customers… not very successfully in 2010.
  9. Technology and services companies will further enable authors in self-publishing and in reaching their sales goals. At least one big-name author will experiment with self-publishing in 2010.
  10. eCatalogs will become a standard tool for selling content to booksellers, librarians, etc.
  11. Digital galleys will gain in popularity.
  12. E-content will be grafted into print in innovative ways.
  13. New ebook data reports and ebook directories will become ‘must-have’ resources. Gilbane Group has a series of three publishing transformation reports planned for 2010.
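As a minimal sketch of prediction 7, assume a single XML source file feeds every edition, with one stylesheet per output. The file and element names below are hypothetical, and a print pipeline would swap in an XSL-FO stylesheet in place of the XHTML one.

    <!-- chapter.xml: the single source for all editions -->
    <chapter>
      <title>Convergence</title>
      <para>Print and digital now share one workflow.</para>
    </chapter>

    <!-- to-xhtml.xsl: produces the web edition from the same source -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="chapter">
        <html>
          <body>
            <h1><xsl:value-of select="title"/></h1>
            <xsl:for-each select="para">
              <p><xsl:apply-templates/></p>
            </xsl:for-each>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>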

Follow me on Twitter: @ePubDr

Focusing on the “Content” in Content Management

The growth in web-centric communication has created a major focus on content management, web content management, component content management, and so on. This interest is driven primarily by increasing demand for rich, interactive, accessible information products delivered via the Web. The focus is not misplaced, but it may be missing part of the point. To be specific, in our focus on the “management” part of CM, we may be missing the first word in the phrase… “Content.”

It’s true that applying increasing amounts of computer and brain power to the processes of preparing and delivering the kind of information today’s users demand can improve those products. But it does so within limits set, and at costs generated, by the content “raw material” it gets from content providers. In many cases, the content available to web product development processes is so structurally crude that it requires major clean-up and enhancement before it can adequately participate in classification and delivery. As the focus on elegant Web delivery increases, barring real changes in the condition of this raw content, the cost of enhancement is likely to grow proportionally, straining organizations’ ability to support it.

The answer may lie in an increased focus on the processes and tools used to create the original content. We know that the original creator of most content knows the most about how it should be logically structured and about the best way to classify it for search and retrieval. The trouble is that, in most cases, we provide no means of capturing what the creator knows about his or her intellectual product. Moreover, because many creators have never been able to fully populate the metadata needed to classify and deliver their content, in past eras professional catalogers were employed to complete this final step. In today’s world, however, we have virtually eliminated the cataloger, assuming instead that the prodigious computer power available to us can derive the needed classification and structure from the content itself. That approach can and does work, but it will require better raw material if it is to achieve the level of effectiveness needed to keep the Web from becoming a virtual haystack in which finding the needle is more good luck than good measure. Native XML editors in place of today’s visually oriented word processors; spreadsheets, graphics, and other media forms with content-specific XML underneath them; increased use of native XML databases; and a host of rich content-centric resources are all part of this content evolution.
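To make the “raw material” point concrete, here is an illustrative contrast using invented, hypothetical element names: the first fragment records only how the content should look, while the second captures what its creator actually knows about it.

    <!-- presentational markup: structure must be reverse-engineered later -->
    <p><b>Aspirin</b>, 325 mg, twice daily.</p>

    <!-- semantic markup: the creator's knowledge is captured at the source -->
    <dosage>
      <drug>Aspirin</drug>
      <amount unit="mg">325</amount>
      <frequency>twice daily</frequency>
    </dosage>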

Most important, however, may be promulgating across society the realization that creating content involves more than making it look good on the screen, and that the creator shares in that responsibility. This won’t be an easy or quick process; it will more likely require generations than years. But if we don’t begin soon, we may end up with a Web 3.0 or 4.0 or 5.0 trying to deliver content that isn’t even yet 1.0.

Syncro Soft Updates Oxygen XML Editor and XML Author

Syncro Soft Ltd announced the immediate availability of version 11.1 of its XML Editor and XML Author. Oxygen combines content-author features, such as the CSS-driven visual XML editor, with a fully featured XML development environment. It has ready-to-use support for the main document frameworks (DITA, DocBook, TEI, and XHTML) and also includes support for all XML Schema languages, XSLT/XQuery debuggers, a WSDL analyzer, XML databases, XML diff and merge, a Subversion client, and more. Version 11.1 of <oXygen/> XML Editor improves the XML authoring capabilities, the support for XML development, and a number of core features. Visual XML authoring now uses schema information to provide intelligent editing actions that help keep the document valid and make for a better editing experience. A new compact representation of tags and quick up/down navigation improve ergonomics and usability. <oXygen/> can use any XQJ-compliant XQuery processor for XQuery transformations; different error levels and external references can be specified for Schematron messages; and XProc support was improved with better editing and execution. The XML format-and-indent operation can use DTD/schema information to provide better formatting, and find-and-replace is now XML-aware, accepting XPath filtering to delimit the search scope. Starting with version 11.1, the diff and merge support from oXygen is also available as a separate application, oXygen XML Diff. Oxygen XML Editor and XSLT Debugger is available immediately in three editions: a multi-platform Academic/Personal license costs USD 64.00 (including a one-year support and maintenance pack), a multi-platform Professional license costs USD 349.00, and a multi-platform Enterprise license costs USD 449.00. Oxygen XML Author is available immediately in two editions: a multi-platform Professional license costs USD 199.00 and a multi-platform Enterprise license costs USD 269.00. http://www.oxygenxml.com, http://www.syncrosvnclient.com
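For readers unfamiliar with the Schematron feature mentioned above, here is a minimal, hypothetical rule file; the role attribute is one common way to attach an error level such as “error” or “warn” to a message, and the element names being checked are invented for illustration.

    <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
      <sch:pattern>
        <sch:rule context="chapter">
          <!-- hard failure: every chapter needs a title -->
          <sch:assert test="title" role="error">
            A chapter must have a title.
          </sch:assert>
          <!-- soft advisory: flag unusually long chapters -->
          <sch:report test="count(para) > 100" role="warn">
            Chapters over 100 paragraphs are hard to review.
          </sch:report>
        </sch:rule>
      </sch:pattern>
    </sch:schema>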

W3C Publishes Drafts of XQuery 1.1, XPath 2.1

The World Wide Web Consortium (W3C) has published new working drafts of XQuery 1.1, XPath 2.1, and supporting documents. As part of work on XSLT 2.1 and XQuery 1.1, the XQuery and XSL Working Groups have published First Public Working Drafts of “XQuery and XPath Data Model 1.1,” “XPath and XQuery Functions and Operators 1.1,” “XSLT and XQuery Serialization 1.1,” and “XPath 2.1.” In addition, the XQuery Working Group has updated drafts of “XQuery 1.1: An XML Query Language,” “XQueryX 1.1,” and “XQuery 1.1 Requirements.” http://www.w3.org/News/2009#entry-8682
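As a reminder of what this family of specifications looks like in practice, here is a minimal FLWOR query in XQuery 1.0 syntax; the document name and element names are hypothetical.

    for $book in doc("catalog.xml")//book
    where $book/price < 30
    order by $book/title
    return <inexpensive>{ $book/title/text() }</inexpensive>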

W3C XML Schema Definition Language (XSD) 1.1 Last Call Draft Published

The W3C (World Wide Web Consortium) XML Schema Working Group has published Last Call Working Drafts of “W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures” and “Part 2: Datatypes.” The former specifies the XML Schema Definition Language, which offers facilities for describing the structure and constraining the contents of XML documents, including those which exploit the XML Namespace facility. The schema language, which is itself represented in an XML vocabulary and uses namespaces, substantially reconstructs and considerably extends the capabilities found in XML document type definitions (DTDs). The latter defines facilities for defining datatypes to be used in XML Schemas as well as other XML specifications. Comments are welcome through 31 December. Learn more about the Extensible Markup Language (XML) Activity. http://www.w3.org/TR/2009/WD-xmlschema11-1-20091203/
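One capability XSD 1.1 adds over XSD 1.0 is assertions, which let a schema constrain content with an XPath test that a DTD or XSD 1.0 grammar cannot express. A minimal sketch, with hypothetical element names:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="range">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="min" type="xs:integer"/>
            <xs:element name="max" type="xs:integer"/>
          </xs:sequence>
          <!-- XSD 1.1 assertion: valid only when min <= max -->
          <xs:assert test="min le max"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>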

Inmedius Releases iConvert for Conversion of Complex Technical Documents

Inmedius, Inc. announced the general release of iConvert, a comprehensive environment for converting documents into structured eXtensible Markup Language (XML). The software supports conversion from legacy paper, Microsoft Word, or PDF files. iConvert also comes pre-configured for XML conversion of original S1000D, 40051B, and ATA documents, and supports any Document Type Definition (DTD) or XML schema. iConvert synchronizes the original document with the converted XML document in a multi-pane, on-screen display; this approach should allow continuous fine-tuning of document conversion rules for increasingly automated transfer. iConvert’s user environment and workflow design guide the user through the XML conversion process, while providing a visual inspection of the original document synchronized with the configured XML output. During this step, the end user should be able to drag and drop both unconverted pieces of data and content that has been transformed properly. User-defined rules files applied in the original conversion are then updated, allowing a second pass with increased accuracy. http://inmedius.com/

What is Smart Content?

At Gilbane we talk of “Smart Content,” “Structured Content,” and “Unstructured Content.” We will be discussing these ideas in a seminar entitled “Managing Smart Content” at the Gilbane Conference next week in Boston. Below I share some ideas about these types of content and what they enable and require in terms of processes and systems.

When you add meaning to content, you make it “smart” enough for computers to do some interesting things. Organizing, searching, processing, and discovery are greatly improved, which also increases the value of the data. Structured content allows some, but fewer, processes to be automated or simplified, and unstructured content enables very little to be streamlined and requires the most ongoing human intervention.

Most content is not very smart. In fact, most content is unstructured and usually more difficult to process automatically. Think flat text files, HTML without all the end tags, etc. Unstructured content is more difficult for computers to interpret and understand than structured content because of the incompleteness and ambiguity inherent in it. Unstructured content usually requires humans to decipher the structure and the meaning, or even to apply formatting for display rendering.
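For example, a single line of flat text (the book is invented for illustration) carries information a human can read but a program must guess at, with no marked boundaries between title, author, and date:

    Content Strategy, by Jane Doe, published March 2010.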

The next level up toward smart content is structured content. This includes well-formed XML documents, content compliant with a schema, or even relational (RDBMS) databases. Some of the intelligence is included in the content: element (or field) boundaries are clearly demarcated, and element names mean something to the users and systems that consume the information. Automatic processing of structured content includes reorganizing, breaking into components, rendering for print or display, and other processes streamlined by the structured content data models in use.
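The same information as structured content might look like the hypothetical fragment below: the boundaries are explicit and the element names are meaningful, so a program can reliably pull out the author or the date without guessing.

    <book>
      <title>Content Strategy</title>
      <author>Jane Doe</author>
      <published>2010-03-01</published>
    </book>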

[Figure: Smart Content diagram]

Finally, smart content is structured content that also includes the semantic meaning of the information. The semantics can take a variety of forms, such as RDFa attributes applied to structured elements, or even semantically named elements. However it is done, the meaning is available to both humans and computers to process.
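A sketch of the same fragment made smart with RDFa attributes; the Dublin Core vocabulary is one real option, while the surrounding element names remain hypothetical. The property attributes tie each value to a shared, machine-readable meaning.

    <book xmlns:dc="http://purl.org/dc/terms/">
      <title property="dc:title">Content Strategy</title>
      <author property="dc:creator">Jane Doe</author>
      <published property="dc:date">2010-03-01</published>
    </book>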

Smart content enables highly reusable content components and powerful automated dynamic document assembly. Searching can be enhanced because the metadata and the semantics embedded in the content provide more clues about what the data is about, where it came from, and how it is related to other content. Smart content enables very robust, valuable content ecosystems.

Deciding which level of rigor is needed for a specific set of content requires understanding the business drivers it is intended to meet. The more structure and intelligence you add to content, the more complicated and expensive the system development and the content creation and management processes may become. More intelligence requires more investment, but that investment may be justified by the benefits achieved.

I think it is useful if the XML and content management (CMS) communities use consistent terms when talking about the rigor of their data models and the benefits they hope to achieve with them. Hopefully, these three terms (smart content, structured content, and unstructured content) ring true and can be used productively to differentiate content and application types.

