Curated for content, computing, and digital experience professionals

Year: 2010

What an Analyst Needs to Do What We Do

Semantic Software Technologies: Landscape of High Value Applications for the Enterprise is now posted for you to download for free; please do so. The topic is one I’ve followed for many years, and I was convinced that the information about it needed to be captured in a single study because the number of players and technologies had expanded beyond my capacity for mental organization.

As a librarian, it was useful to employ a genre of publications known as the “bibliography of bibliographies” on any given topic when starting a research project. As an analyst, gathering the baskets of emails, reports, and publications on the industry I follow serves a similar purpose. Without filtering and sifting all this content, it had become overwhelming to understand and comment on the individual components of the semantic landscape.

Regarding the process of report development, it is important for readers to understand how analysts research and review products and companies. Our first goal is to avoid bias toward one vendor or another. Finding users of products and understanding the basis for their use and experiences is paramount in the research and discovery process. With software as complex as semantic applications, we do not have the luxury of routine hands-on experience, testing real applications of dozens of products for comparison.

The most desirable contacts for learning about any product are customers with direct experience using the application. Sometimes we gain access to customers through vendor introductions but we also try very hard to get users to speak to us through surveys and interviews, often anonymously so that they do not jeopardize their relationship with a vendor. We want these discussions to be frank.

To get a complete picture of any product, I go through numerous iterations of looking at a company through its own printed and online information, published independent reviews and analysis, customer comments and direct interviews with employees, users, former users, etc. Finally, I like to share what I have learned with vendors themselves to validate conclusions and give them an opportunity to correct facts or clarify product usage and market positioning.

One of the most rewarding, interesting and productive aspects of research in a relatively young industry like semantic technologies is having direct access to innovators and seminal thinkers. Communicating with pioneers of new software who are seeking the best way to package, deploy and commercialize their offerings is exciting. There are many more potential products than those that actually find commercial success, but the process for getting from idea to buyer adoption is always a story worth hearing and learning from.

I receive direct and indirect comments from readers about this blog. What I don’t see enough of is posted commentary about the content. Perhaps you don’t want to share your thoughts publicly but any experiences or ideas that you want to share with me are welcomed. You’ll find my direct email contact information through Gilbane.com and you can reach me on Twitter at lwmtech. My research depends on getting input from all types of users and developers of content software applications, so, please raise your hand and comment or volunteer to talk.

Book Publishers: Stick to Your Knitting

A Blueprint for Book Publishing Transformation: Seven Essential Processes to Re-Invent Publishing, the latest study from The Gilbane Group’s Publishing Practice, is due out any day now. One thing that sets the study apart from other ebook-oriented efforts is that Blueprint describes technologies, processes, markets, and other strategic considerations from the book publisher’s perspective. From the Executive Summary of our upcoming study:

For publishers and their technology and service partners, the challenge of the next few years will be to invest wisely in technology and process improvement while simultaneously being aggressive about pursuing new business models.

The message here is that book publishers really need to “stick to their knitting,” or, as we put it in the study:

The book publisher should be what it has always best been about—discovering, improving, and making public good and even great books.  But what has changed for book publishers is the radically different world in which they interact today, and that is the world of bits and bytes: digital content, digital communication, digital commerce.

If done right, today’s efforts toward digital publishing processes will “future proof” the publisher, because today’s efforts done right are aimed at adding value to the content in media neutral, forwardly compatible forms.

A central part of the “If done right” message is that book publishers should still focus on what publishers do best with content, but that XML workflow has become essential to both print and digital publishing success. Here’s an interesting finding from Blueprint:

Nearly 48% of respondents say they use either an “XML-First” or “XML-Early” workflow. We define an XML-First workflow as one where XML is used from the start, from manuscript through production, and an “XML-Early” workflow as one where authors use a word processor and the manuscript is then converted to XML.
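As a rough illustration of the “XML-Early” pattern, the conversion step can be sketched as follows. The manuscript content, element names, and structure here are invented for illustration, not drawn from the study:

```python
import xml.etree.ElementTree as ET

# Hypothetical manuscript: (heading, paragraphs) pairs exported from a
# word processor, about to be converted to XML ("XML-Early" workflow).
manuscript = [
    ("Installing the Client", ["Download the installer.", "Run setup."]),
    ("Troubleshooting", ["Check the log file."]),
]

def to_xml(sections):
    """Convert word-processor sections into simple topic-style XML."""
    book = ET.Element("book")
    for title, paras in sections:
        topic = ET.SubElement(book, "topic")
        ET.SubElement(topic, "title").text = title
        body = ET.SubElement(topic, "body")
        for p in paras:
            ET.SubElement(body, "p").text = p
    return book

root = to_xml(manuscript)
print(ET.tostring(root, encoding="unicode"))
```

In an “XML-First” workflow, by contrast, authors would work in the XML structure from the start and this conversion step would not exist.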

Tomorrow, Aptara and The Gilbane Group are presenting a webinar, eBooks, Apps and Print? How to Effectively Produce it All Together, featuring me and Bret Freeman, Digital Publishing Strategist at Aptara. The webinar takes place on Tuesday, September 28, 2010, at 11 a.m. EST, and you can register here.

Sophia Launches Sophia Search for Intelligent Enterprise Search and Contextual Discovery

Sophia, a provider of contextually aware enterprise search solutions, announced Sophia Search, a new search solution that uses a semiotics-based linguistic model to identify intrinsic terms, phrases, and relationships within unstructured content so that the content can be recovered, consolidated, and leveraged. Sophia Search is designed to minimize compliance risk and reduce the cost of storing and managing enterprise information. It delivers a “three-dimensional” solution to discover, consolidate, and optimize enterprise data, regardless of data type or domain. Sophia Search helps organizations manage and analyze critical information by discovering the themes and intrinsic relationships behind their information, without taxonomies or ontologies, so that more relevant information may be discovered. By identifying both duplicates and near duplicates, Sophia Search allows organizations to consolidate information effectively and minimize storage and management costs. Sophia Search features a patented Contextual Discovery Engine (CDE) based on the linguistic model of semiotics, the science of how humans understand the meaning of information in context. Sophia Search is available now to both customers and partners. Pricing starts at $30,000. http://www.sophiasearch.com/

Smart Content and the Pull of Search Engine Optimization

One of the conclusions of our report Smart Content in the Enterprise (forthcoming next week) is that a little bit of enrichment goes a long way. It’s important to build on your XML infrastructure, enrich your content incrementally (to the extent that your business environment can support), and expect to iterate over time.

Consider what happened at Citrix, reported in our case study Optimizing the Customer Experience at Citrix: Restructuring Documentation and Training for Web Delivery. The company had adopted DITA for structured publishing several years ago. Yet just repurposing the content in product manuals for print and electronic distribution, and publishing the same information as HTML and PDF documents, did not change the customer experience.

A few years ago, Citrix information specialists had a key insight: customers expected to find support information by googling the web. To be sure, there was a lot of content about various Citrix products out in cyberspace, but very little of it came directly from Citrix. Consequently the most popular solutions available via web-wide searching were not always reliable, and the detailed information from Citrix (buried in their own manuals) was rarely found.

What did Citrix do? Despite limited resources, the documentation group began to add search metadata to the product manuals. With DITA, there was already a predefined structure for topics, used to define sections, chapters, and manuals. Authors and editors could simply include additional tagged metadata that identified and classified the contents – and thus expose the information to Google and other web-wide search engines.
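A minimal sketch of the kind of step described here, assuming a DITA-style `<prolog>`/`<metadata>`/`<keywords>` structure; the topic content and keyword terms below are invented, not taken from the Citrix case study:

```python
import xml.etree.ElementTree as ET

# A minimal DITA-style topic; element names follow the DITA convention
# of <prolog>/<metadata>/<keywords>, but the content is invented.
topic_xml = """<topic id="configure-feature">
  <title>Configuring the Feature</title>
  <body><p>Steps to configure the feature.</p></body>
</topic>"""

def add_keywords(topic_src, terms):
    """Insert search keywords into a topic's prolog so that web-wide
    search engines can pick them up when the topic is published."""
    topic = ET.fromstring(topic_src)
    prolog = ET.Element("prolog")
    metadata = ET.SubElement(prolog, "metadata")
    keywords = ET.SubElement(metadata, "keywords")
    for term in terms:
        ET.SubElement(keywords, "keyword").text = term
    # DITA places <prolog> after <title> and before <body>.
    topic.insert(1, prolog)
    return ET.tostring(topic, encoding="unicode")

print(add_keywords(topic_xml, ["configuration", "setup", "support"]))
```

The point of the sketch is how small the authoring burden is: the topic structure already exists, and the metadata is purely additive.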

Nor was there a lot of time or many resources for up-front design and detailed analysis. To paraphrase a perceptive information architect we interviewed, “Getting started was a lot like throwing the stuff against a wall to see what sticks.” At first tags simply summarized existing chapter and section headings. Significantly, this was a good enough place to start.

Specifically, once Citrix was able to join the online conversation with its customers, it was also able to begin tracking popular search terms. Then over time and with successive product releases, the documentation group was able to add additional tagged metadata and provide ever more focused (and granular) content components.

What does this mean for developing smart content and leveraging the benefits of XML tagging? Certainly the more precise your content enrichment, the more findable your information is going to be. When considering the business benefits of search engine optimization, the quality of your tagging can always improve over time. But as a simple value proposition, getting started is the critical first step.

Revenge of the ECM nerds


For those of you who aren’t familiar with who I am, I am the Marketing Specialist for Gilbane, more specifically the man behind the various social media curtains. One of my favorite parts of social media is the meme, defined as “a unit of cultural ideas, symbols or practices, which can be transmitted from one mind to another through writing, speech, gestures, rituals or other imitable phenomena.” The most famous example of a meme, almost synonymous with the internet now, is Lolcatz. One of the great pleasures I have managing the Gilbane accounts is the unique community. Defying stereotypes of computer geeks, the online CMS community has proven to be composed of a plethora of creative, witty, clever, and simply funny individuals spanning timezones, continents, and native languages. Earlier this year, we were treated to CMSHaikus, which I was happy to preserve in an ebook (the .pdf originally had YouTube videos embedded in it, but these have since been blocked due to a security patch). This time around, @Adriaanbloem took another meme and spun it with his own angle.


The tweets that followed were a mixture of angst, disappointment, frustration, and front-line experience, but most importantly humor! The sarcasm runs rampant, and the jabs are taken at brands, vendors, scripting languages, developers, each other, and consulting agencies (although the “Godfather” and the agency bearing his name still seem to command respect as of this writing).

The engine seems to have plenty of meme steam left in it, but when it’s gone you can read the #CMSRetraction archive, or better yet follow the participants and become part of the quirky CMS Twitterati. If I missed you on the list, drop me a line (@gilbane or @tallbonez) and I will be sure to add you!

Repurposing Content vs. Creating Multipurpose Content

In our recently completed research on Smart Content in the Enterprise we explored how organizations are taking advantage of XML’s benefits throughout the enterprise and not just in the documentation department. Our findings include several key issues that leading-edge XML implementers are addressing, including new delivery requirements, new ways of creating and managing content, and the use of standards to create rich, interoperable content. In our case studies we examined how some are breaking out of the documentation department silo and enabling others inside or even outside the organization to contribute and collaborate on content. Some are even using crowdsourcing and social publishing to allow consumers of the information to annotate it and participate in its development. We found that expectations for content creation and management have changed significantly and we need to think about how we organize and manage our data to support these new requirements. One key finding of the research is that organizations are taking a different approach to repurposing their content, a more proactive approach that might better be called “multipurposing”.

In the XML world we have been talking about repurposing content for decades. Repurposing content usually means content that is created for one type of use is reorganized, converted, transformed, etc. for another use. Many organizations have successfully deployed XML systems that optimize delivery in multiple formats using what is often referred to as a Single Source Publishing (SSP) process where a single source of content is created and transformed into all desired deliverable formats (e.g., HTML, PDF, etc.).

Traditional delivery of content in the form of documents, whether in HTML or PDF, can be very limiting to users who want to search across multiple documents, reorganize document content into a form that is useful to the particular task at hand, or share portions with collaborators. As the functionality on Web sites and mobile devices becomes more sophisticated, new ways of delivering content are needed to take advantage of these capabilities. Dynamic assembly of content into custom views can be optimized with delivery of content components instead of whole documents. Powerful search features can be enhanced with metadata and other forms of content enrichment.
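Dynamic assembly of components, as opposed to delivery of whole documents, can be sketched in a few lines. The components, tags, and matching rule below are purely illustrative assumptions, not a description of any product discussed in the study:

```python
# A sketch of dynamic assembly: instead of shipping a whole document,
# select only the components whose metadata matches the reader's task,
# then build a custom view from those components.
components = [
    {"id": "t1", "tags": {"install", "windows"}, "body": "Install on Windows..."},
    {"id": "t2", "tags": {"install", "linux"},   "body": "Install on Linux..."},
    {"id": "t3", "tags": {"troubleshoot"},       "body": "If startup fails..."},
]

def assemble(task_tags):
    """Return a custom 'document': components tagged for the task."""
    return [c["body"] for c in components if task_tags <= c["tags"]]

print(assemble({"install", "linux"}))  # only the Linux install topic
```

The same component store can feed faceted search, custom views, and whole-document publishing, which is the core of the multipurposing argument.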

SSP and repurposing content traditionally focus on the content creation, authoring, management and workflow steps up to delivery. In order for organizations to keep up with the potential of delivery systems and the emerging expectations of users, it behooves us to take a broader view of requirements for content systems and the underlying data model. Developers need to expand the scope of activities they evaluate and plan for when designing the system and the underlying data model. They should consider what metadata might improve faceted searching or dynamic assembly. In doing so they can identify the multiple purposes the content is destined for throughout the ecosystem in which it is created, managed and consumed.

Multipurpose content is designed with additional functionality in mind including faceted search, distributed collaboration and annotation, localization and translation, indexing, and even provisioning and other supply chain transactions. In short, multipurposing content focuses on the bigger picture to meet a broader set of business drivers throughout the enterprise, and even beyond to the needs of the information consumers.

It is easy to get carried away with data modeling, and an overly complex data model usually requires more development, maintenance, and training than would otherwise be required to meet a set of business needs. You definitely want to avoid using specific processing terminology when naming elements (e.g., specific formatting, element names that describe processing actions instead of defining the role of the content). You can still create data models that address the broader range of activities without using specific commands or actions. Knowing a chunk of text is a “definition” instead of an “error message” is useful and far easier to reinterpret for other uses than an “h2” element name or an attribute for display="yes". Breaking chapters into individual topics eases custom, dynamic assembly. Adding keywords and other enrichment can improve search results and the active management of the content. In short, multipurpose data models can and should be comprehensive and remain device agnostic to meet enterprise requirements for the content.
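To make the element-naming point concrete, here is a small sketch; the markup and terms are invented. Semantic names like "definition" let a downstream process select content by role, which a presentational name like "h2" cannot support:

```python
import xml.etree.ElementTree as ET

# The same chunk of text could be tagged presentationally ("h2", "p")
# or semantically ("definition", "error-message"). Semantic names make
# the content's role machine-readable.
semantic_src = """<topic>
  <definition term="cache">A store of recently used data.</definition>
  <error-message code="E42">Cache overflow.</error-message>
</topic>"""

topic = ET.fromstring(semantic_src)

# Build a glossary from definitions only; trivial with semantic names,
# guesswork if everything were tagged <h2> or <p>.
glossary = {d.get("term"): d.text for d in topic.iter("definition")}
print(glossary)  # {'cache': 'A store of recently used data.'}
```

The glossary is just one reuse; the same "definition" elements could equally feed a search facet or a tooltip, without any change to the source markup.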

The difference between repurposing content and multipurpose content is a matter of degree and scope, and requires generic, agnostic components and element names. But most of all, multipurposing requires understanding the requirements of all processes in the desired enterprise environment up front when designing a system to make sure the model is sufficient to deliver designed outcomes and capabilities. Otherwise repurposing content will continue to be done as an afterthought process and possibly limit the usefulness of the content for some applications.

Early Access to Gilbane’s XML Report

If you’ve been reading our recent posts on Gilbane’s new research on XML adoption, you might be wondering how to get the report in advance of its availability from Gilbane later this month.

Smart Content in the Enterprise: How Next Generation XML Applications Deliver New Value to Multiple Stakeholders is currently offered by several of the study sponsors: IBM, JustSystems, MarkLogic, MindTouch, Ovitas, Quark, and SDL.

We’ll also be discussing our research in real time during a webinar hosted by SDL on November 4. Look for details within the next few weeks.

Leveraging Two Decades of Computational Linguistics for Semantic Search

Over the past three months I have had the pleasure of speaking with Kathleen Dahlgren, founder of Cognition, several times. I first learned about Cognition at the Boston Infonortics Search Engines meeting in 2009. That introduction led me to a closer look several months later when researching auto-categorization software. I was impressed with the comprehensive English language semantic net they had doggedly built over a 20+ year period.

A semantic net is a map of language that explicitly defines the many relationships among words and phrases. It might be as simple as a map of a small geographical locale and all named entities within it, or as complex as the entire base language of English, with every concept mapped to illustrate all the ways that any one term is related to other terms, as illustrated in this tiny subset. Dr. Dahlgren and her team are among the few to have created a comprehensive semantic net for English.

In 2003, Dr. Dahlgren established Cognition as a software company to commercialize its semantic net, designing software to apply it to semantic search applications. As the Gilbane Group launched its new research on Semantic Software Technologies, Cognition signed on as a study co-sponsor and we engaged in several discussions with them that rounded out their history in this new marketplace. It was illustrative of pioneering in any new software domain.

Early adopters are key contributors to any software development. It is notable that Cognition has attracted experts in fields as diverse as medical research, legal e-discovery and Web semantic search. This gives the company valuable feedback for their commercial development. In any highly technical discipline, it is challenging and exciting to find subject experts knowledgeable enough to contribute to product evolution, and Cognition is learning from client experts where the best opportunities for growth lie.

Recent interviews with Cognition executives, and those of other sponsors, gave me the opportunity to get their reactions to my conclusions about this industry. These were the more interesting thoughts that came from Cognition after they had reviewed the Gilbane report:

  • Feedback from current clients and attendees at 2010 conferences, where Dr. Dahlgren was a featured speaker, confirms escalating awareness of the field; she feels that “This is the year of Semantics.” It is catching the imagination of IT folks who understand the diverse and important business problems to which semantic technology can be applied.
  • In addition to a significant upswing in semantics applied in life sciences, publishing, law and energy, Cognition sees specific opportunities for growth in risk assessment and risk management. Using semantics to detect signals, content salience, and measures of relevance is critical where the quantity of data and textual content is too voluminous for human filtering. There is not much evidence that financial services, banking and insurance are embracing semantic technologies yet, but semantics could dramatically improve their business intelligence, and Cognition is well positioned to support them with its already tested tools.
  • Enterprise semantic search will begin to overcome the poor reputation that traditional “string search” has suffered. There is growing recognition among IT professionals that in the enterprise 80% of the queries are unique; these cannot be interpreted based on popularity or social commentary. Determining relevance or accuracy of retrieved results depends on the types of software algorithms that apply computational linguistics, not pattern matching or statistical models.

In Dr. Dahlgren’s view, there is no question that a team approach to deploying semantic enterprise search is required. This means that IT professionals will work side-by-side with subject matter experts, search experts and vocabulary specialists to gain the best advantage from semantic search engines.

The unique language aspects of an enterprise content domain are as important as the software a company employs. The Cognition baseline semantic net, out-of-the-box, will always give reliable and better results than traditional string search engines. However, it gives top performance when enhanced with enterprise language, embedding all the ways that subject experts talk about their topical domain, jargon, acronyms, code phrases, etc.

With elements of its software already embedded in some notable commercial applications like Bing, Cognition is positioned for delivering excellent semantic search for an enterprise. They are taking on opportunities in areas like risk management that have been slow to adopt semantic tools. They will deliver software to these customers together with services and expertise to coach their clients through the implementation, deployment and maintenance essential to successful use. The enthusiasm expressed to me by Kathleen Dahlgren about semantics confirms what I also heard from Cognition clients. They are confident that the technology coupled with thoughtful guidance from their support services will be the true value-added for any enterprise semantic search application using Cognition.

The free download of the Gilbane study and deep-dive on Cognition was announced on their Web site at this page.


© 2024 The Gilbane Advisor
