
Year: 2009

Asbru Web Content Management v7.0 released

The Asbru Web Content Management system v7.0 for .NET, PHP and JSP/Java has been released. This version includes:

  • Functionality to check the server settings required or used by the web content management system. By default, the initial system check verifies required configuration files and folders and file create/write permissions; additional custom system check scripts can be added to check and report on any other server settings of interest, both in general and for specific local setups (see the sketch after this announcement).
  • Functionality to automatically analyse your existing website files (HTML files, Dreamweaver templates, images and other files) and import them into the web content management system, for migration from an existing “static” HTML file-based website to one managed by Asbru Web Content Management. Dreamweaver templates are identified and converted to templates in the system, and defined “editable regions” are identified and converted to content classes/elements.
  • Functionality to define your own custom website settings, with special codes to use these settings on your website pages and templates as well as in your website style sheets and scripts.
  • “Content format” functionality for simple text content items and for exact control over HTML code details for special requirements.
  • Support for multiple style sheets per page and template.
  • Selective database backup and export of specific types of website data.

http://asbrusoft.com/
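The custom system check scripts themselves are product- and platform-specific; the minimal sketch below is purely illustrative (Python, with hypothetical paths, not Asbru’s actual script format or configuration layout) and shows the kind of checks such a script performs: confirming that required configuration files exist and that content folders are writable.

```python
# Illustrative only: a generic server "system check" of the kind described above.
# The paths below are hypothetical, not Asbru's actual configuration layout.
import os

REQUIRED_FILES = ["config/settings.xml"]        # hypothetical required config file
REQUIRED_WRITABLE_DIRS = ["content", "cache"]   # hypothetical folders needing write access

def run_system_check():
    results = []
    for path in REQUIRED_FILES:
        ok = os.path.isfile(path)
        results.append((f"required file: {path}", ok))
    for path in REQUIRED_WRITABLE_DIRS:
        ok = os.path.isdir(path) and os.access(path, os.W_OK)
        results.append((f"writable folder: {path}", ok))
    return results

if __name__ == "__main__":
    for check, ok in run_system_check():
        print(f"{'PASS' if ok else 'FAIL'}  {check}")
```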

Webinar Wednesday: 5 Predictions for Publishers in 2009

Please join me for a webinar sponsored by Mark Logic on Wednesday 2/18/09 at 2pm EST. I’ll be covering my five top predictions for 2009 (and beyond). The predictions come largely from a forthcoming research study, "Digital Platforms and Technologies for Book Publishers: Implementations Beyond eBook," that Bill Trippe and I are writing. Here are the predictions:

  1. The Domain Strikes Back – Traditional publishers leverage their domain expertise to create premium, authoritative digital products that trump free and informed internet content.
  2. Discoverability Overcomes Paranoia – Publishers realize the value in being discovered online, as research shows that readers do buy whole books and subscriptions based on excerpts and previews.
  3. Custom, Custom, Custom – XML technology enables publishers to cost-effectively create custom products, a trend that has rapidly accelerated in the last six to nine months, especially in the educational textbook segment.
  4. Communities Count – and will exert greater influence on digital publishing strategies, as providers engage readers to help build not only their brands but also their products.
  5. Print on Demand – increases in production quality and cost-effectiveness, leading to larger runs, more short-run custom products and deeper backlists.

I look forward to your questions and comments! Register today at http://bit.ly/WApEW

Apples and Oranges: The SaaS Dialog

Most buyers of content technologies understand the key differences between acquiring a content solution as a service and licensing software for installation on servers behind their firewall. Less well understood, however, is the impact of those differences on the acquisition process. With SaaS, you’re not "buying" technology, as with licensed software; you’re entering a services agreement to access a business solution that includes software, applications, and infrastructure. The value proposition is very different, as is the basis for evaluating its fit with the organization’s needs.

The current worldwide economic situation is causing many organizations to take a serious look at SaaS offerings as a strategy for continuing to move forward with critical business initiatives. Our latest Gilbane Beacon was developed to help companies evaluate SaaS fairly, with the goal of helping our readers find the best solution, regardless of technology delivery model. Communicating SaaS WCM Value: A Guide to Understanding the Business Case for Software-as-a-Service Solutions for Web Content Management explains why SaaS and licensed software are apples and oranges. The paper identifies the issues that matter–and those that don’t–when considering SaaS solutions for web content management. Available for download on our site.

The Marvels of Project Gutenberg

[Photo: from The Pecan and its Culture, 1906]

If you don’t know Project Gutenberg–and you should–it’s well worth spending your time over there to familiarize yourself with its contents and the way it has gone about creating the collection.

I keep track of it through the RSS feed of recently added books, which is updated nightly. That’s where I find out about new books like The Pecan and its Culture, published in 1906, which includes the photo shown at left.
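If you want to do the same programmatically, here is a minimal sketch using Python’s feedparser library; the feed URL is an assumption based on Project Gutenberg’s published feeds, so confirm the current address on gutenberg.org.

```python
# Sketch: poll Project Gutenberg's "recently added books" feed.
# Requires: pip install feedparser
# The feed URL below is an assumption; confirm the current address on gutenberg.org.
import feedparser

FEED_URL = "https://www.gutenberg.org/cache/epub/feeds/today.rss"

def recent_books(limit=10):
    feed = feedparser.parse(FEED_URL)
    # Each entry carries at least a title and a link to the book's catalog page.
    return [(entry.title, entry.link) for entry in feed.entries[:limit]]

if __name__ == "__main__":
    for title, link in recent_books():
        print(f"{title}\n  {link}")
```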

On their own, the one image and the one title are perhaps not so interesting or so significant (though I for one love these little snapshots of Americana, especially such primary material). What is significant, of course, is the mass nature of the digitization and the care with which it is undertaken. I compare this care with the sometimes abysmal scanning work being done by Google (and with much more fanfare). The fruits of Project Gutenberg are much more openly available, much easier to access, and much easier to migrate to reading devices like the Kindle.

So as we look at all the eBook and digitization efforts underway today, let’s not forget Project Gutenberg.

Winds of Change at Tools of Change

O’Reilly’s Tools of Change conference in New York City this week was highly successful, both inside and outside the walls of the Marriott Marquis. The sessions were energetic, well-attended, and–on the whole–full of excellent insight and ideas about the digital trends taking a firm hold of nearly all sectors of the publishing business. Outside the walls, especially on Twitter, online communities were humming with news and commentary on the conference. (You almost could have followed the entire conference just by following the #toc hashtag on Twitter and accessing the online copies of the presentations.)

But if you had done that, you would have missed the fun of being there. There were some superb keynotes and some excellent general sessions. Notable among the keynotes were Tim O’Reilly himself, Neelan Choksi from Lexcycle (Stanza), and Cory Doctorow. The general sessions covered a fairly broad spectrum of topics but were heavy on eBooks and community. Because of my own and my clients’ interests, I spent most of my time in the eBook sessions. The session eBooks I: Business Models and Strategy was content-rich. To begin with, you heard straight from senior people at major publishers with significant eBook efforts (Kenneth Brooks from Cengage Learning, Leslie Hulse from Harper Collins Publishers, and Cynthia Cleto from Springer Science+Business Media). Along with their insight, the speakers–and moderator Michael Smith from IDPF–assembled an incredibly valuable wiki of eBook business and technical material to back up their talk. I also really enjoyed a talk from Gavin Bell of Nature, The Long Tail Needs Community, in which he made a number of thoughtful points about how publishers need to think longer and harder about how reading engages and changes people, and specifically how a publisher can build community around those changes and activities.

There were a few soft spots in the schedule. Jeff Jarvis’ keynote, What Would Google do with Publishing?, was more about plugging his new book (What Would Google Do?) than anything else, but it was also weirdly out of date, even though the book is hot off the presses, with 20th-century points like “The link changes everything” and “If you’re not searchable, you won’t be found.” (Publishers are often, somewhat unfairly, accused of being Luddite, but they are not that Luddite.) There were also a couple of technical speakers who didn’t seem to make the necessary business connections to the technical points they were making, which would have been helpful to those members of the audience who were less technical and more publishing-product and -process oriented. But these small weaknesses were easily outshone by the many high points, the terrific overall energy, and the clear enthusiasm of the attendees.

One question I have for the O’Reilly folks is how they will keep the energy going. They have a nascent Tools of Change community site. Perhaps they could enlist some paid community managers to seed and moderate conversations, and also tie community activities to other O’Reilly products such as the books and other live and online events.

O’Reilly has quickly established a very strong conference and an equally strong brand around it. With the publishing industry so engulfed in digital change now, I have to think this kind of conference and community can only continue to grow.

On Stimulating Open Data Initiatives

Yesterday the big stimulus bill cleared the conference committee that resolves the Senate and House versions. If you remember your civics, that means it is likely to pass in both chambers and then be signed into law by the president.

Included in the bill are billions of dollars for digitizing important information such as medical records or government information. Wow! That is a lot of investment! The thinking is that inaccessible information locked in paper or proprietary formats costs us billions each year in productivity. Wow! That’s a lot of waste! Also, access to the information could spawn billions of dollars of new products and services, and therefore income and tax revenue. Wow! That’s a lot of growth!

Many agencies and offices have striven to expose useful official information and reports at the federal and state level. Even so, there is a lot of data still locked away, incomplete, or in difficult-to-use forms. A Senate official once told me that they do not maintain a single, complete, accurate, official copy of the US Statutes internally. Even if this is no longer true, the public often relies on the “trusted” versions that are available only through paid online services. Many other data types, like many medical records, exist only on paper.

There are a lot of challenges, such as security and privacy issues, even intellectual property rights issues. But there are a lot of opportunities too. There are thousands of data sources that could be tapped into that are currently locked in paper or proprietary formats.

I don’t think the benefits will come at the expense of commercial services already selling this publicly owned information, as some may fear. These online sites provide a service, often emphasizing timeliness or value-adds like integrating useful data from different sources, in exchange for their fees. I think a combination of free government open data resources and delivery tools plus innovative commercial products will emerge. Some easily obtained data may become commoditized, but new ways of accessing and integrating information will emerge. The big information services probably have more to fear from startups than from free government applications and data.

As it happens, I saw a demo yesterday of a tool that took all the activity of a state legislature and unified it under one portal. This allows people to track a bill and all related activity in a single place. For free! The bill working its way through both chambers is connected to related hearing agendas and minutes, which are connected to schedules, with status and other information captured in a concise dashboard-like screen format (there are other services you can pay for which fund the site). Each information component came from a different office and was originally in its own specialized format. What we were really looking at was a custom data integration application, built with AJAX technology, that integrates heterogeneous data in a unified view. Very powerful, and yet scalable. The key to its success was strong integration of data: the connections used to tie the information together. The vendor collected and filtered the data, converted it to a common format, and added the linkage and relationship information to provide an integrated view into the data. All source data is stored separately and maintained by the different offices. Five years ago it would have been a lot more difficult to create the service. Technology has advanced, and the data are increasingly available in manageable forms.
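As a rough illustration of that integration step (a hypothetical sketch, not the vendor’s actual implementation), the example below normalizes bill, hearing, and schedule records from different offices into a common structure keyed by bill number, which is essentially what a unified dashboard view requires.

```python
# Hypothetical sketch: unify bill, hearing, and schedule records from different
# offices into one view keyed by bill number. Field names are invented for illustration.
from collections import defaultdict

bills = [{"bill_no": "S.123", "title": "Open Data Act", "status": "In committee"}]
hearings = [{"bill": "S.123", "committee": "Judiciary", "date": "2009-03-02"}]
schedules = [{"bill_id": "S.123", "chamber": "Senate", "scheduled": "2009-03-10"}]

def unify(bills, hearings, schedules):
    view = defaultdict(lambda: {"bill": None, "hearings": [], "schedule": []})
    for b in bills:
        view[b["bill_no"]]["bill"] = b                 # canonical bill record
    for h in hearings:
        view[h["bill"]]["hearings"].append(h)          # link hearings by bill number
    for s in schedules:
        view[s["bill_id"]]["schedule"].append(s)       # link schedule items by bill number
    return dict(view)

if __name__ == "__main__":
    for bill_no, record in unify(bills, hearings, schedules).items():
        print(bill_no, record["bill"]["status"], len(record["hearings"]), "hearing(s)")
```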

The government produces a lot of information that affects us daily and that we, as taxpayers and citizens, actually own but have limited or no access to. This includes statutes and regulations, court cases, census data, scientific data and research, agricultural reports, SEC filings, FDA drug information, taxpayer publications, forms, patent information, health guidelines, and much more. The list is really long. I am not even scratching the surface! It also includes more interactive and real-time data, such as geological and water data, weather information, and the status of regulation and legislation changes (like reporting on the progress of the stimulus bill as it worked its way through both chambers). All of these can be made more current, expanded for more coverage, integrated with related materials, and validated for accuracy. There are also new opportunities to open up the process by using forums and social media tools to collect feedback from constituents and experts (like the demo mentioned above). Social media tools can both give people an avenue to express their ideas to their elected officials and serve as a collection tool to gather raw data that can be analyzed for trends and statistics, which in turn becomes new government data that we can use.

IMHO, this investment in open government data is a powerful catalyst that could actually create or change many jobs and business models. If done well, it could provide significant positive returns, streamline government, open access to more information, and enable new and interesting products and applications.

DPCI Announces Partnership with Mark Logic to Deliver XML-Based Content Publishing Solutions

DPCI, a provider of integrated technology solutions for organizations that need to publish content to Web, print, and mobile channels, announced that it has partnered with Mark Logic Corporation to deliver XML-based content publishing solutions. The company’s product, MarkLogic Server, allows customers to store, manage, search, and dynamically deliver content. Addressing the growing need for XML-based content management systems, DPCI and Mark Logic have been collaborating on several projects, including one that required integration with Amazon’s Kindle reading device. Built specifically for content, MarkLogic Server provides a single solution for search and content delivery that allows customers to build digital content products: from task-sensitive online content delivery applications that place content in users’ workflows to digital asset distribution systems that automate content delivery, and from custom publishing applications that maximize content re-use and repurposing to content assembly solutions that integrate content. http://www.marklogic.com, http://www.databasepublish.com
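For a flavor of what XML-centric content search and reuse involves (a generic sketch only, using the Python standard library, and not MarkLogic Server’s actual query API), the example below searches a small set of XML documents for a term and extracts just the matching sections for reuse in another channel.

```python
# Generic illustration of XML content search and extraction; not MarkLogic's API.
# Uses only the Python standard library; the document structure is invented for illustration.
import xml.etree.ElementTree as ET

docs = [
    "<article><title>Kindle Delivery</title><section>Publishing to the Kindle "
    "requires a conversion step.</section></article>",
    "<article><title>Print Workflow</title><section>Offset print runs are "
    "scheduled quarterly.</section></article>",
]

def matching_sections(xml_docs, term):
    """Return (title, section text) pairs whose section text contains the term."""
    hits = []
    for raw in xml_docs:
        root = ET.fromstring(raw)
        title = root.findtext("title")
        for section in root.iter("section"):
            if term.lower() in (section.text or "").lower():
                hits.append((title, section.text))
    return hits

if __name__ == "__main__":
    for title, text in matching_sections(docs, "kindle"):
        print(title, "->", text)
```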

Native Database Search vs. Commercial Search Engines

This topic is a bit random: a short response to a question that popped up recently from a reader seeking technical research on the subject. Since none was available in the Gilbane library of studies, I decided to offer some practical suggestions instead.

The focus is on an enterprise with a substantive amount of content aggregated from a diverse universe of industry-specific information, and what to do about searching it. If the information has been parsed and stored in an RDBMS, is it not better to leverage the SQL query engine native to the RDBMS? Typical database engines might be DB2, MS Access, MS SQL, MySQL, Oracle, or Progress Software.

To be clear, I am not a developer, but I worked closely with software engineers for 20 years when I owned a software company. We worked with several DBMS products; three of them supported SQL queries, and the application we invented and supported was a forerunner of today’s content management systems, with a variety of retrieval (search) interfaces. The retrievable content our product supported was limited to metadata plus abstracts of up to two or three pages in length; the typical database sizes of our customers ranged from 250,000 to a couple of million records.

This is small potatoes compared to what search engines typically traverse and index today, but scale was always an issue, and we were well aware of the limitations of SQL engines in supporting contextual searching, phrase searching, and complex Boolean queries. It was essential that indexes be built in real time as records were added, whether manually through screen forms or through batch loads. The engine needed to support explicit adjacency (phrase) searching as well as keywords anywhere in a field, in a record, or in a set. Saving and re-purposing results, storing search strategies, narrowing large sets incrementally, and browsing indexes of terminology (taxonomy navigation) to select unique terms for a Boolean “and” or “or” query were all part of the application. When our original text-based DBMS vendor went belly-up, we spent a couple of years test-driving numerous RDBMS products to find one that would support the types of searches our customers expected. We settled on Progress Software, primarily because of its support for search and its experience as an OEM supplier to application software vendors like us. Development time was minimized because of good application building tools and index building utilities.
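For readers who want to see what native database full-text search looks like today, here is a minimal sketch using SQLite’s FTS5 extension (assuming your SQLite build includes it) to run a phrase query and a Boolean query over a small metadata-plus-abstract table; the richer features described above, such as saved strategies and taxonomy browsing, still have to be built on top.

```python
# Sketch: phrase and Boolean queries with a database's native full-text index.
# Assumes the sqlite3 build includes the FTS5 extension (most modern builds do).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, abstract)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("Pecan Culture", "Notes on orchard management and content of early bulletins."),
        ("WCM Guide", "Web content management systems and search integration."),
    ],
)

# Phrase (adjacency) query: terms must appear next to each other, in order.
phrase_hits = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ?", ('"content management"',)
).fetchall()

# Boolean query: FTS5 supports AND, OR, and NOT operators (NOT excludes the right-hand term).
boolean_hits = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ?", ("search NOT orchard",)
).fetchall()

print("phrase:", phrase_hits)
print("boolean:", boolean_hits)
```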

So, what does that have to do with the original question, native RDBMS search vs. standalone enterprise search? Based on discussions with, and observations of, developers trying to optimize search for special applications using generic search tools for database retrieval, I would make the following observations. Search is very hard, and advanced search, including concept searching, Boolean operations, and text analytics, is harder still. Developers of enterprise search solutions have grappled with and solved search problems that need to be supported in environments where content is dynamically changing and growing, where different user interfaces are needed for diverse audiences and types of queries, and where query results require a variety of display formats. Also, in e-commerce applications, interfaces require routine screen facelifts that are best supported by specialized tools for that purpose.

Then you need to consider all these development requirements; they do not come out-of-the-box with SQL search:

  • Full text indexes and database field or metadata indexes require independent development efforts for each database application that needs to be queried.
  • Security databases must be developed to match each application where individual access to specific database elements (records or rows) is required.
  • Natural language queries require integration with taxonomies, thesauri, or ontologies; this means software development independent of the native search tools.
  • Interfaces must be developed for search engine administrators to make routine updates to taxonomies and thesauri, retrieval and results ranking algorithms, adjustments to include/exclude target content in the databases. These content management tasks require substantive content knowledge but should not require programming expertise and must be very efficient to execute.
  • Social features that support interaction among users and personalization options must be built.
  • Connectors need to be built to federate search across other content repositories that are non-native and may even be outside the enterprise.

Any one of these efforts is a multi-person and perpetual activity. The sheer scale of the development tasks militates against trying to sustain state-of-the-art search in-house with the relatively minimalist tools provided in most RDBMS suites. The job is never done, and in-depth search expertise is hard to come by. Software companies that specialize in search for enterprises are also diverse in what they offer and in the vertical markets they support well. Bottom line: identify your business needs and find the search vendor that matches your problem with a solution they will continue to support with regular updates and services. Finally, search performance and processing speed are another huge factor to consider, and for this you need some serious technical assessment. If the target application is going to be a big revenue generator with heavy loads and heavy processing, do not overlook this step: run benchmarks to prove the performance and scalability.
