
Category: Semantic technologies

Our coverage of semantic technologies goes back to the early 90s, when search engines designed for querying structured data in databases were looking to add support for searching unstructured and semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyzes the challenge as it stood then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


Google Executive to Provide Opening Keynote Address on Search Quality at Upcoming Gilbane San Francisco Conference

The Gilbane Group and Lighthouse Seminars announced that Udi Manber, a Google Vice President of Engineering, will kick off the annual Gilbane San Francisco conference on June 18th at 8:30am with a discussion of Google’s search quality and continued innovation. Now in its fourth year, the conference has rapidly gained a reputation as a forum that brings together vendor-neutral industry experts to share and debate the latest information technology experiences, research, trends, and insights. The conference takes place June 18-20 at the Westin Market Hotel in San Francisco.

Gilbane San Francisco helps attendees move beyond the mainstream content technologies they are familiar with to enhanced “2.0” versions, which can open up new business opportunities, keep customers engaged, and improve internal communication and collaboration. The 2008 event will have its usual collection of information and content technology experts, including practitioners, technologists, business strategists, consultants, and leading analysts from a variety of market and technology research firms.

Topics to be covered in depth at Gilbane San Francisco include: Web Content Management (WCM); Enterprise Search, Text Analytics, and Semantic Technologies; Collaboration, Enterprise Wikis & Blogs; “Enterprise 2.0” Technologies & Social Media; Content Globalization & Localization; XML Content Strategies; Enterprise Content Management (ECM); Enterprise Rights Management (ERM); and Publishing Technology & Best Practices. Details on the Google keynote session, as well as other keynotes and conference breakout sessions, can be found at http://gilbanesf.com/conference-grid.html

Only Humans can Ensure the Value of Search in Your Enterprise

While considering what is most important in selecting the search tools for any given enterprise application, I took a few minutes off to look at the New York Times. This article, He Wrote 200,000 Books (but Computers Did Some of the Work), by Noam Cohen, gave me an idea about how to compare Internet search with enterprise search.

A staple of librarians’ reference and research arsenal has been a category of reference material called “bibliographies of bibliographies.” These works, specific to a subject domain, are aimed at a usually scholarly audience to bring a vast amount of content into focus for the researcher. Judging from the article, that is what Mr. Parker’s artificial intelligence is doing for the average person who needs general information about a topic. According to at least one reader, the results are hardly scholarly.

This article points out several things about computerized searching:

  • It does a very good job of finding a lot of information easily.
  • Generalized Internet searching retrieves only publicly accessible, free-for-consumption, content.
  • Publicly available content is not universally vetted for accuracy, authoritativeness, trustworthiness, or comprehensiveness, even though it may be all of these things.
  • Vast amounts of accurate, authoritative, trustworthy, and comprehensive content do exist in electronic formats that search algorithms used by Mr. Parker or the rest of us on the Internet will never see. That is because the content is behind the firewall or accessible only through permission (e.g. subscription, need-to-know). None of his published books will serve up that content.

Another concept that librarians and scholars understand is that of primary source material. It is original content, developed (written, recorded) by human beings as a result of thought, new analysis of existing content, bench science, or engineering. It is often judged, vetted, approved, or otherwise deemed worthy of the primary source label by peers in the workplace, professional societies, or professional publishers of scholarly journals. It is often the substance of what gets republished as secondary and tertiary sources (e.g. review articles, bibliographies, books).

We all need secondary and tertiary sources to do our work, learn new things, and understand our work and our world better. However, advances in technology, business operations, and innovation depend on sharing primary source material in thoughtfully constructed domains within our businesses, healthcare organizations, and non-profits. A patient’s laboratory data or mechanical device test data that spark the creation of primary source content need surrounding context to be properly understood and assessed for value and relevance.

To be valuable, enterprise search needs to deliver context, relevance, opportunities for analysis and evaluation, and retrieval modes that give the best results for any user seeking valid content. There is a lot that computerized enterprise search can do to facilitate this type of research, but that is not the whole story. There must still be real people who select the most appropriate search product for that enterprise and that defined business case. They must also decide which content the search engine should index based on its value, how it can be secured with proper authentication, how it should be categorized, and so on. To throw a computer search application at any retrieval need without human oversight is a waste of capital. It will result in disappointment, cynicism, and skepticism about the value of automating search, because the resulting output will be no better than Mr. Parker’s books.

Semantic Technologies and our CTO Blog

We host a number of blogs, some more active than others. One of the least active (although it still gets a surprising amount of traffic) has been our CTO blog. However, I am happy to say that Colin Britton started blogging on semantic technologies yesterday. As co-founder and CTO of Metatomix, he led the development of a commercial product based on RDF, a not very well understood W3C semantic web standard. Colin’s first post on the CTO blog starts a series that will help shed a little more light on semantic technologies and their practical applications.

Some of you know that I remain skeptical of the new world “Semantic Web” vision, but I do think semantic technologies are important and have a lot to offer, and Colin will help you see why. Check out his first post and let him know what you think about semantic technologies and what you would like to know about.

Introduction to Semantic Technology

Ten years ago I came to believe that a metadata approach to managing enterprise information was a valid way to go. The various structures, relationships, and complexities of IT systems led to disjointed information. By relating the information elements to each other, rather than trying to synchronize the information, we _might_ stand a chance.

At the same time, a new set of standards was emerging to describe, relate, and query a new information model based on metadata. These became known as the Semantic Web, outlined in a 2001 Scientific American article (http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21 ).

Fast forward to 2008 – where are we with this vision? Part of me is thrilled, another part disappointed. We have adoption of these standards and this approach in everyday information management situations. Major software companies and startups alike are implementing Semantic Technology in their offerings and products. However, I am disappointed that we still find it hard to communicate what this semantic technology means and how valuable it is. Most technologists I meet glaze over at the mention of the Semantic Web or any of its standards, yet when asked if they think RSS is significant, they praise its contributions.

Over a series of posts on this blog, I would like to explain, share, and show some of the value of Semantic Technology and why you should be looking at it.

Let’s start with what Semantic Technology is and what standards define its openness. To quote Wikipedia: “In software, semantic technology encodes meanings separately from data and content files, and separately from application code.” This abstraction is a core tenet of, and a core value provided by, a semantic approach to information management. The idea that our database or programming patterns do not restrict the form or boundaries of our information is a large shift from traditional IT solutions. Likewise, the idea that our business logic should not be tied to the code that implements it, nor to the information it operates on, is delivered through this semantic representation. So firstly, ABSTRACTION is a key characteristic.

The benefit of this is that systems, machines, solutions – whatever term you wish to use – can interact with each other: share, understand, and reason, without having been explicitly programmed to understand each other.

With this you also get to better manage CHANGE. Your content and systems can evolve, with the changes managed through the Semantic Technology layer.
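
To make the abstraction and change points concrete, here is a minimal sketch using Python and the rdflib library (the namespace, identifiers, and properties are invented for illustration, not taken from any particular product):

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")  # hypothetical vocabulary
    g = Graph()

    # Each fact is a self-describing statement (triple); no table
    # schema fixes its shape in advance.
    g.add((EX.order17, EX.customer, EX.alice))
    g.add((EX.order17, EX.total, Literal(120.00)))

    # A new requirement arrives: record the sales channel. In a
    # relational design this would mean a schema migration; here it
    # is simply one more statement.
    g.add((EX.order17, EX.channel, Literal("web")))

Because nothing outside the graph constrains what statements may appear, the data can evolve without breaking the code that stores it.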

So what makes up Semantic Technology? One sees the word attached to a number of solutions and technologies – are they all created equal?

In my view, a technology can only truly claim to be semantic if it is based on and implements the standards laid out through the World Wide Web Consortium (W3C) standards process: http://www.w3.org/2001/sw/

The vision of the Semantic Web and the standards required to support it continue to expand, but the anchor standards have been laid out for a while.

RDF – The model and syntax for describing information. It is important to understand that the RDF standards define multiple things: the model (or data model), the syntax (how it is written/serialized), and the formal semantics (the logic described by the use of RDF). In 2004, the original RDF specification was revised and published as six separate documents, each covering an important area of the standard.
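
As a small illustration of the split between model and syntax, here is a sketch using rdflib (the product names are made up); the same set of triples can be serialized in more than one concrete syntax:

    from rdflib import Graph, Literal, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/")
    g = Graph()
    g.bind("ex", EX)

    # The model: statements of the form (subject, predicate, object).
    g.add((EX.product42, RDF.type, EX.Product))
    g.add((EX.product42, RDFS.label, Literal("Widget, model 42")))

    # The syntax: one model, several serializations.
    print(g.serialize(format="xml"))     # RDF/XML
    print(g.serialize(format="turtle"))  # Turtle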

RDF-S – Provides a typing system for RDF and the basic constructs for expressing ontologies and relationships within the metadata structure.
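
For instance (again sketched with rdflib and hypothetical classes), RDF-S lets you declare classes, subclass relationships, and the domain and range of properties:

    from rdflib import Graph, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/")
    g = Graph()

    # Classes and a subclass relationship.
    g.add((EX.Product, RDF.type, RDFS.Class))
    g.add((EX.Camera, RDF.type, RDFS.Class))
    g.add((EX.Camera, RDFS.subClassOf, EX.Product))

    # A property with a declared domain and range.
    g.add((EX.manufacturedBy, RDF.type, RDF.Property))
    g.add((EX.manufacturedBy, RDFS.domain, EX.Product))
    g.add((EX.manufacturedBy, RDFS.range, EX.Company))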

OWL – To quote the W3C, OWL “facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF-S by providing additional vocabulary along with a formal semantics.”
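
Two examples of that additional vocabulary, sketched with rdflib and invented identifiers: owl:sameAs asserts that two identifiers denote the same thing, and owl:inverseOf relates a property to its inverse:

    from rdflib import Graph, Namespace, OWL

    EX = Namespace("http://example.org/")
    g = Graph()

    # Two identifiers that denote the same real-world entity.
    g.add((EX.acme, OWL.sameAs, EX.acmeCorporation))

    # A property and its inverse; a reasoner can infer statements
    # in one direction from statements in the other.
    g.add((EX.manufactures, OWL.inverseOf, EX.manufacturedBy))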

SPARQL – While everyone with a Semantic Technology solution invented their own query language (why wasn’t there one in the first place!), SPARQL, pronounced “sparkle,” is the W3C standardization of one. It is HUGE for Semantic Technology and makes all the effort with the other three standards worthwhile.
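
To close the loop, here is a sketch of a SPARQL SELECT query run over a small rdflib graph (again with made-up names), in place of the vendor-specific query languages that came before it:

    from rdflib import Graph, Literal, Namespace, RDF

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.product42, RDF.type, EX.Product))
    g.add((EX.product42, EX.price, Literal(19.99)))

    # A standard SPARQL SELECT instead of a proprietary query language.
    results = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?product ?price
        WHERE {
            ?product a ex:Product ;
                     ex:price ?price .
        }
    """)
    for row in results:
        print(row.product, row.price)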

These standards are quite a pile to sift through, and understanding the capabilities embodied in them takes significant effort, but it is the role of technologists in this arena to remove the need for you to understand them. It is our job to provide tools, solutions, and capabilities that leverage these standards, bringing semantic technology to life and delivering the power defined within them.

But that is the subject of another post. So what does this all mean in real life? In my next post I will lay out a concrete example using product information.


Parsing the Enterprise Search Landscape

Steve Arnold’s Beyond Search report has finally launched and is ready for purchase. Reviewing it gave me a different perspective on how to look at the array of 83 search companies I am juggling in my upcoming report, Enterprise Search Markets and Applications. For example, technological differentiators can channel your decisions about must-haves and have-nots in your system selection. Steve codifies these considerations and details 15 technology tips that will help you frame them.

We are getting ready for the third Gilbane Conference in which “search” has been a significant part of the presentation landscape, in San Francisco, June 17-20. Six sessions will be filled with case studies and enlightening “how-to-do-it-better” guidance from search experts with significant hands-on experience in the field. I will be conducting a workshop immediately after the conference, How to Successfully Adopt and Deploy Search. Presentations by speakers and the workshop will focus on users’ experiences and guidance for evaluating, buying, and implementing search. Viewing search from a usage perspective begs a different set of classification criteria for divvying up the products.

In February, Business Trends published an interview I gave them in December, Revving up Search Engines in the Enterprise. There probably isn’t much new in it for those who routinely follow this topic but if you are trying to find ways to explain what it is, why and how to get started, you might find some ideas for opening the discussion with others in your business setting. The intended audience is those who don’t normally wallow in search jargon. This interview pretty much covers the what, why, who, and when to jump into procuring search tools for the enterprise.

For my report, I have been very pleased with the discussions I’ve had with a couple dozen people immersed in evaluating and implementing search for their organizations. Hearing them describe their experiences suggests other ways to organize the potpourri of search products and how buyers should approach their selection. With over eighty products, we have a challenge in how to parse the domain. I am segmenting the market space along multiple dimensions, from the content type being targeted by “search” to the packaging models the vendors offer. By laying out a simple “ontology” of concepts surrounding the search product domain, I hope to clarify why there are so many ways of grouping the tools and products being offered. If vendors read the report to decide which buckets they belong in for marketing, and buyers are able to sort out the type of product they need, the report will have achieved one positive outcome. In the meantime, read Frank Gilbane’s take on the whole topic of “enterprise” tacked onto any group of products.

As serendipity would have it, a colleague from the Boston KM Forum, Marc Solomon, just wrote a blog post on a new way of thinking about the business of classifying anything, “Word Algebra.” And guess who gave him the inspiration: Mr. Search himself, Steve Arnold. As a former indexer and taxonomist I appreciate this positioning of applied classification. Thinking about why we search gives us a good idea of how to parse content for consumption. Our parameters for search selection must be driven by that WHY?

IBM Labs Announces ProAct, Text Analytics for Call Centers

Researchers at IBM’s India Research Laboratory have developed software technology that uses sophisticated math algorithms to extract and deliver business insights hidden within the information gathered by companies during customer service calls and other interactions. The new business intelligence technology, called ProAct, is a text analytics tool that automates previously manual analysis and evaluation of customer service calls and provides insight to help companies assess and improve their performance. ProAct provides an integrated analysis of structured information, such as agent and product databases, and unstructured data, such as email, call logs, and call transcriptions, to identify reasons for dissatisfaction, agent performance issues, and typical product problems.

Based on the Unstructured Information Management Architecture (UIMA) framework that IBM contributed to the open source Apache Software Foundation in 2006, the ProAct technology was initially developed as a service engagement. Now the new algorithms are being packaged in software and deployed with many IBM call center customers around the world.

UIMA is an open source software framework that helps organizations build new analysis technologies to gain more insight from their unstructured information by discovering relationships, identifying patterns, and predicting outcomes. IBM uses UIMA to enable text analysis, extraction, and concept search capabilities in other parts of its portfolio of enterprise search software products, including OmniFind Enterprise Edition, OmniFind Analytics Edition, and OmniFind Yahoo! Edition. http://www.research.ibm.com/irl/

Enterprise Whatever

As many of you know, we will be publishing a new report by Stephen Arnold in the next few weeks. The title, Beyond Search: What to do When Your Enterprise Search System Doesn’t Work, begs the question of whether there is such a thing as “enterprise search”. The title of Lynda’s consulting practice blog “Enterprise Search Practice Blog”, begs the same question. In the case of content management, a similar question is begged by AIIM – “The Enterprise Content Management Association” (ECM) and the recent AIIM conference.

The debate about whether “enterprise fill-in-your-favorite-software-application” makes any sense at all is not new. The terms “Enterprise Document Management” (EDM) and “Enterprise Resource Planning” (ERP) were first used in the 80s, and, at least in the case of EDM, were just as controversial. We have Documentum to thank for both EDM and ECM. Documentum’s original mission was to be the Oracle of documents, so EDM probably seemed like an appropriate term to use. Quickly however, the term was appropriated by marketing pros from many vendors, as well as analysts looking for a new category of reports and research to sell, and conference organizers keeping current with the latest buzzwords (I don’t exclude us from this kind of activity!). It was also naively misused by many enterprise IT (as opposed to “personal IT” I suppose) professionals, and business managers who were excited by such a possibility.

ECM evolved when the competition between the established EDM vendors and the fast-growing web content management vendors reached a point where both saw they couldn’t avoid each other (for market cap as well as user requirement reasons). Soon, any vendor with a product to manage any kind of information that existed outside of (or sometimes even in) a relational database was an “ECM” vendor. This is what led AIIM to adopt, and try to define and lay claim to, the term – it would cover all of the records management and scanner vendors who were their existing constituents, and allow them to appeal to the newer web content management vendors and practitioners as well.

We used to cover the question “Is there any such thing as ECM?” in our analyst panels at our conferences, and usually there would be some disagreement among the analysts participating, but our mainly enterprise IT audience largely became savvy enough to realize it was a non-issue.

Why is it a non-issue?

Mainly because the term has almost no useful meaning. Nobody puts all their enterprise content in a single ECM repository. It doesn’t even make sense to use the same vendor’s products across all departments, even in small organizations – that is why there is such a large variety of vendors with wildly different functionality at ECM events such as AIIM. The most you can assume when you hear “ECM vendor” is that they probably support more than one type of content management application, and that they might scale to some degree.

There are many who think it not unreasonable to have a single “enterprise search” application for all enterprise content. If you are new to search technology this is understandable, since you may think simple word or phrase search should work across repositories. But, of course, it is not at all that simple, and if you want to know why, see Stephen’s blog or Lynda’s blog, among others. Both Steve and Lynda are uncomfortable with “enterprise search”. Steve prefers the term “behind the firewall search”. Lynda sticks with the term but with a slightly different definition, although I don’t think they disagree at all on how the term is misused and misinterpreted.

Why use “Enterprise … Whatever” terms at all?

There is only one reason, and that is that buyers and users of technology use these terms as a shortcut, sometimes naively, but also sometimes with full understanding. There is just no getting around the barrier of actual language use. Clearly, using the shortcut is only the first step in communicating – more dialog is required for meaningful understanding.
