Just spent an excessive number of hours perusing the XML and related community groups on various Web community sites. There are several social community tools and sites, but even just LinkedIn (http://www.linkedin.com/) seems to have dozens of community groups that at least mention XML in their descriptions, and several have it prominent in their names and logos. I joined several. Let's see what happens. Stay tuned...
February 2009 Archives
As part of our Gilbane Onsite Technology Strategy Workshop Series, we are happy to announce a new workshop, Implementing DITA.
Course Description
DITA, the Darwin Information Typing Architecture, is an emerging standard for content creation, management, and distribution. How does DITA differ from other XML applications? Will it work for my vertical industry’s content? From technical documentation to training manuals, from scientific papers to statutory publishing, DITA addresses one of the most challenging aspects of XML implementation: developing a data model that can be used and shared with information partners. Even so, DITA implementation requires effective process, software, and content management strategies to achieve the benefits promised by the DITA business case: cost-effective, reusable content. This seminar will familiarize you with DITA concepts and terminology and describe business benefits, implementation challenges, and best practices for adopting DITA. How DITA enables key business processes will be explored, including content management, formatting & publishing, multi-lingual localization, and reusable open content. Attendees will be able to participate in developing an effective DITA content management strategy.
Audience
This is an introductory course suitable for anyone looking to better understand the DITA standard, terminology, processes, benefits, and best practices. A basic understanding of computer processing applications and production processes is helpful. Familiarity with XML concepts and publishing is helpful, but not required. No programming experience is required.
Topics Covered
The Business Drivers for DITA Adoption
DITA Concepts and Terminology
The DITA Content Model
Organizing Content with DITA Maps
Processing, Storing & Publishing DITA Content
DITA Creation, Management & Processing Tools
Multi-lingual Publishing with DITA
Extending DITA to work with Other Data Standards
Best Practices & Pitfalls for DITA Implementation
For more information and to customize a workshop just for your organization, please contact Ralph Marto by email or at +617.497.9443 x117
If you don't know Project Gutenberg--and you should--it's well worth spending your time over there to familiarize yourself with its contents and the way it has gone about creating the collection.
I keep track of it through the RSS feed of recently added books, which is updated nightly. That's where I find out about new books like The Pecan and its Culture, published in 1906, which includes the photo shown at left.
On their own, the one image and the one title are perhaps not so interesting or so significant (though I for one love these little snapshots of Americana, especially such primary material). What is significant of course is the mass nature of the digitization, and the care with which it is undertaken. I compare this care with the sometimes abysmal scanning work being done by Google (and with much more fanfare). The fruits of Project Gutenberg are much more openly available, much easier to access, and much easier to migrate to reading devices like the Kindle.
So as we look at all the eBook and digitization efforts underway today, let's not forget Project Gutenberg.
O'Reilly's Tools of Change conference in New York City this week was highly successful, both inside and outside the walls of the Marriott Marquis. The sessions were energetic, well-attended, and--on the whole--full of excellent insight and ideas about the digital trends taking a firm hold of nearly all sectors of the publishing business. Outside the walls, especially on Twitter, online communities were humming with news and commentary on the conference. (You almost could have followed the entire conference just by following the #toc hash tag at Twitter and accessing the online copies of the presentations.)
But if you had done that, you would have missed the fun of being there. There were some superb keynotes and some excellent general sessions. Notable among the keynotes were Tim O'Reilly himself, Neelan Choksi from Lexcycle (Stanza), and Cory Doctorow. The general sessions covered a fairly broad spectrum of topics but were heavy on eBooks and community. Because of my own and my clients' interests, I spent most of my time in the eBook sessions. The session eBooks I: Business Models and Strategy was content-rich. To begin with, you heard straight from senior people at major publishers with significant eBook efforts (Kenneth Brooks from Cengage Learning, Leslie Hulse from Harper Collins Publishers, and Cynthia Cleto from Springer Science+Business Media). Along with their insight, the speakers--and moderator Michael Smith from IDPF--assembled an incredibly valuable wiki of eBook business and technical material to back up their talk. I also really enjoyed a talk from Gavin Bell of Nature, The Long Tail Needs Community, where he made a number of thoughtful points about how publishers need to think longer and harder about how reading engages and changes people and specifically how a publisher can build community around those changes and activities.
There were a few soft spots in the schedule. Jeff Jarvis' keynote, What Would Google do with Publishing?, was more about plugging his new book (What Would Google Do?) than anything else, but was also weirdly out of date, even though the book is hot off the presses, with 20th century points like "The link changes everything" and "If you're not searchable, you won't be found." (Publishers are often, somewhat unfairly, accused of being Luddite, but they are not that Luddite.) There were also a couple of technical speakers who didn't seem to make the necessary business connections to the technical points they were making, which would have been helpful to those members of the audience who were less technical and more publishing-product and -process oriented. But these small weaknesses were easily outshone by the many high points, the terrific overall energy, and the clear enthusiasm of the attendees.
One question I have for the O'Reilly folks is to ask how they will keep the energy going. They have a nascent Tools of Change community site. Perhaps they could enlist some paid community managers to seed and moderate conversations, and also tie community activities to other O'Reilly products such as the books and other live and online events.
O'Reilly has very quickly established a very strong conference and an equally strong brand around the conference. With the publishing industry so engulfed in digital change now, I have to think this kind of conference and community can only continue to grow.
Yesterday the big stimulus bill cleared the conference committee that resolves the Senate and House versions. If you remember your civics, that means it will be likely to pass in the chambers and then be signed into law by the president.
Included in the bill are billions of dollars for digitizing important information such as medical records or government information. Wow! That is a lot of investment! The thinking is that inaccessible information locked in paper or proprietary formats costs us billions each year in productivity. Wow! That's a lot of waste! Also, that access to the information could spawn billions of dollars of new products and services, and therefore income and tax revenue. Wow! That's a lot of growth!
Many agencies and offices have striven to expose useful official information and reports at the federal and state level. Even so, there is a lot of data still locked away, incomplete, or in difficult-to-use forms. A while ago a Senate official told me that they do not maintain a single, complete, accurate, official copy of the US Statutes internally. Even if this is no longer true, the public often relies on the "trusted" versions that are available only through paid online services. Many other data types, like many medical records, only exist in paper.
There are a lot of challenges, such as security and privacy issues, even intellectual property rights issues. But there are a lot of opportunities too. There are thousands of data sources that could be tapped into that are currently locked in paper or proprietary formats.
I don't think the benefits will come at the expense of commercial services already selling this publicly owned information as some may fear. These online sites provide a service, often emphasizing timeliness or value adds like integrating useful data from different sources, in exchange for their fees. I think a combination of free government open data resources and delivery tools, plus innovative commercial products will emerge. Maybe some easily obtained data may become commoditized, but new ways of accessing and integrating information will emerge. The big information services probably have more to fear from startups than from free government applications and data.
As it happens, I saw a demo yesterday of a tool that took all the activity of a state legislature and unified it under one portal. This allows people to track a bill and all related activity in a single place. For free! The bill working its way through both chambers is connected to related hearing agendas and minutes, which are connected to schedules, with status and other information captured in a concise dashboard-like screen format (there are other services you can pay for which fund the site). Each information component came from a different office and was originally in its own specialized format. What we were really looking at was a custom data integration application done with AJAX technology integrating heterogeneous data in a unified view. Very powerful, and yet scalable. The key to its success was strong integration of data, the connections that were used to tie the information together. The vendor collected and filtered the data, converted it to a common format, and added the linkage and relationship information to provide an integrated view into the data. All source data is stored separately and maintained by different offices. Five years ago it would have been a lot more difficult to create the service. Technology has advanced, and the data are increasingly available in manageable forms.
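For the technically curious, here is a rough sketch of that collect, convert, and link pattern. I did not see the vendor's code, so everything here is an assumption made for illustration: the three source formats, the field names, the bill number, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical source data: each office publishes in its own shape.
bills = [{"bill_no": "HB-101", "title": "Water Quality Act", "status": "In committee"}]
hearings = [("HB-101", "2009-02-10", "Natural Resources Committee")]
minutes = ["HB-101|2009-02-10|Testimony heard; vote deferred"]


def normalize_bill(rec):
    """Convert a clerk's-office record (a dict) to the common format."""
    return {"bill": rec["bill_no"], "kind": "bill", "date": None,
            "text": f'{rec["title"]} ({rec["status"]})'}


def normalize_hearing(rec):
    """Convert a scheduling-office record (a tuple) to the common format."""
    bill, date, committee = rec
    return {"bill": bill, "kind": "hearing", "date": date, "text": committee}


def normalize_minutes(line):
    """Convert a committee-minutes record (a delimited string) to the common format."""
    bill, date, text = line.split("|", 2)
    return {"bill": bill, "kind": "minutes", "date": date, "text": text}


def build_dashboard(bills, hearings, minutes):
    """Link every normalized item to its bill number, the shared key."""
    items = ([normalize_bill(b) for b in bills]
             + [normalize_hearing(h) for h in hearings]
             + [normalize_minutes(m) for m in minutes])
    view = defaultdict(list)
    for item in items:
        view[item["bill"]].append(item)
    return dict(view)


dashboard = build_dashboard(bills, hearings, minutes)
for item in dashboard["HB-101"]:
    print(item["kind"], item["date"], item["text"])
```

The source data stays where it is and in whatever shape each office prefers. Only the normalized copies and the shared key live in the integrated view, which is the part that makes the single-screen tracking possible.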
The government produces a lot of information that affects us daily that we, as taxpayers and citizens, actually own, but have limited or no access to. These include statutes and regulations, court cases, census data, scientific data and research, agricultural reports, SEC filings, FDA drug information, taxpayer publications, forms, patent information, health guidelines, etc., etc., etc. The list is really long. I am not even scratching the surface! It also includes more interactive and real-time data, such as geological and water data, weather information, and the status of regulation and legislation changes (like reporting on the progress of the stimulus bill as it worked its way through both chambers). All of these can be made more current, expanded for more coverage, integrated with related materials, and validated for accuracy. There are also new opportunities to open up the process of using forums and social media tools for collecting feedback from constituents and experts (like the demo mentioned above). Social media tools may give people an avenue to express their ideas to their elected officials, and also serve as a collection tool to gather raw data that can be analyzed for trends and statistics, which in turn becomes new government data that we can use.
IMHO, this investment in open government data is a powerful catalyst that could actually create or change many jobs or business models. If done well, it could provide significant positive returns, streamline government, open access to more information, and enable new and interesting products and applications.
An old colleague of mine from more than a dozen years ago found me on LinkedIn today. And within five minutes we got caught up after a gap of several years. I know, reestablishing lost connections happens all the time on social media sites. I just get a kick out of it every time it happens. But this is the XML blog, not the social media one, so...
My colleague works at a company that has been using SGML & XML technology for more than 15 years. Their data is still in SGML. They feel they can always export to XML and do not plan to migrate their content and applications to XML any time soon. The funny thing was that he was slightly embarrassed about still being in SGML.
Wait a minute! There is no reason to think SGML is dead and has to be replaced. Not in general. Maybe for specific applications a business case supports the upgrade, but it doesn't have to every time. Not yet.
I know of several organizations that still manage data in the SGML they developed years ago. Early adopters' systems, like those of several big publishers, some state and federal government applications, and financial systems, were developed when there was only one choice. SGML, like XML, is a structured format. They are very, very similar. One format can be used to create the other very easily. These organizations have already sunk their investment into developing the SGML system and data, as well as training their users in its use. The incremental benefits of moving to XML do not support the costs of the migration. Not yet.
This brings up my main point, that structured data can be managed in many forms. These include XML, SGML, XHTML, databases, and probably other forms. The data may be structured and follow rules for hierarchy, occurrence, data typing, etc., but not be managed as XML, only exported as XML when needed. My personal opinion is that XML stored in databases provides one of the best combinations of structured content management features, but different business needs suggest a variety of approaches may be suitable. Flat files stored in folders and formatted in old school SGML might still be enough and not warrant migration. Then again, it depends on the environment and the business objectives.
When XML first came out, someone coined the phrase that SGML stood for "Sounds Good, Maybe Later" because it was more expensive and difficult to implement. XML is more Web aware and is somewhat more clearly defined, and therefore tools operate more consistently. Many organizations that felt SGML could not be justified were able to later justify migrating to XML. Others migrated right away to take advantage of the new tools or related standards. XML also eliminates some features of SGML that never seemed to work right. It also demands well-formed data, which reduces ambiguity and simplifies a few things. And tools have come a long way and are much more numerous, as expected.
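To make the well-formedness point concrete, here is a small sketch using the XML parser in Python's standard library. The sample markup is made up for illustration. The first string omits end tags, which SGML can allow when the DTD permits it; the second closes every element, as XML requires.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: the same content, with and without end tags.
sgml_style = "<doc><para>First paragraph<para>Second paragraph</doc>"
xml_style = "<doc><para>First paragraph</para><para>Second paragraph</para></doc>"

print(is_well_formed(sgml_style))
print(is_well_formed(xml_style))
```

An XML parser never has to consult a DTD to work out where an element ends, which is a big part of why XML tools behave more consistently than their SGML predecessors did.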
XML is definitely more successful in terms of number and range of applications and XML adoption is an easier case to make today than SGML was back in the day. But many existing SGML applications still have legs. I would not suggest that a new application start off with SGML today, but I might modify the old saying to "Sounds Good, Migrate Later".
So, when is it a good idea to migrate from SGML to XML? There are many tools available that do things with XML data better than they do with other structured forms. Many XML tools support SGML as well, but DBMS systems can now manage content as an XML data type and use XPath to address XML nodes in processing. Wikis and other tools can produce XML content and utilize other standards based on XML, but not SGML that I am aware of. If you want to take advantage of features of Web or XML tools, you might want to start planning your migration. But if your system is operational and stable, the benefits might not yet justify the investment and disruption from migrating. Not yet!
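As a taste of what XPath-style processing looks like once content is in XML, here is a minimal sketch in Python. The element and attribute names are hypothetical, and the standard library's ElementTree module supports only a subset of XPath, so treat this as an illustration rather than a picture of what a database's XML data type offers.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: a manual with sections in different review states.
doc = ET.fromstring(
    "<manual>"
    "<section status='final'><title>Installation</title></section>"
    "<section status='draft'><title>Troubleshooting</title></section>"
    "<section status='draft'><title>Upgrading</title></section>"
    "</manual>"
)

# Select the titles of all sections still marked as draft.
draft_titles = [t.text for t in doc.findall(".//section[@status='draft']/title")]
print(draft_titles)
```

Queries like this one are the kind of feature that can tip the business case toward migration, since they come almost for free with XML tooling.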
UK-based publishing consultant Paul Coyne asked a good question on LinkedIn: Can e-books ever support a secondary (second-hand) market?
I love books. And eBooks. However, many of my books are second hand from booksellers, car-boot sales and friends. How important is this secondary market to books and can ebooks ever really go mainstream without a secondary market? BTW I have no clue how this would work!
I offered the following thoughts...
Great question. The secondary market is incredibly important to the buyer of course, and perhaps a blessing and a curse to the publisher--a blessing because it creates more value in the buyer's mind and a curse because it slows and eliminates some sales in markets like college and school book publishing.
One of the great ongoing questions about eBooks is price point. There is a growing feeling they should be very inexpensive compared to their print counterparts, both because of the perception they are less costly to produce and the reality that there is no current secondary market. Thus you see Amazon trying to get all Kindle books under $10 (US).
I still like the idea of superdistribution for digital products. By my crude definition (some authoritative links in a moment), a buyer of an eBook would be able to pass along the eBook and gain something from the eventual use of it by another user. Think of it as me getting a small commission when someone I pass it along to ends up buying it. I guess you could also think about it as a kind of viral sales model.
See also:
A decent Wikipedia entry on superdistribution.
An old but well written Wired magazine article on superdistribution.
We covered this in a DRM book I cowrote with Bill Rosenblatt and Steve Mooney.
There is a new feed URL you should use for this blog - http://gilbane.com/xml/atom.xml. It is not actually new, but it will be the feed we maintain going forward. The feed some of you are using - http://feeds.gilbane.com/xmlblog - is being phased out.