Curated for content, computing, and digital experience professionals

Category: Semantic technologies

Our coverage of semantic technologies goes back to the early 90s, when search engines focused on structured data in databases were looking to provide support for searching unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyzes the challenge as it stood back then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


Only Humans can Ensure the Value of Search in Your Enterprise

While considering what is most important in selecting the search tools for any given enterprise application, I took a few minutes off to look at the New York Times. This article, He Wrote 200,000 Books (but Computers Did Some of the Work), by Noam Cohen, gave me an idea about how to compare Internet search with enterprise search.

A staple of librarians’ reference and research arsenal has been a category of reference material called “bibliographies of bibliographies.” These works, specific to a subject domain, are aimed at a usually scholarly audience to bring a vast amount of content into focus for the researcher. Judging from the article, that is what Mr. Parker’s artificial intelligence is doing for the average person who needs general information about a topic. According to at least one reader, the results are hardly scholarly.

This article points out several things about computerized searching:

  • It does a very good job of finding a lot of information easily.
  • Generalized Internet searching retrieves only publicly accessible, free-for-consumption, content.
  • Publicly available content is not universally vetted for accuracy, authoritativeness, trustworthiness, or comprehensiveness, even though it may be all of these things.
  • Vast amounts of accurate, authoritative, trustworthy, and comprehensive content do exist in electronic formats that the search algorithms used by Mr. Parker or the rest of us on the Internet will never see. That is because it is behind the firewall or accessible only through permission (e.g. subscription, need-to-know). None of his published books will serve up that content.

Another concept that librarians and scholars understand is that of primary source material. It is original content, developed (written, recorded) by human beings as a result of thought, new analysis of existing content, bench science, or engineering. It is often judged, vetted, approved or otherwise deemed worthy of the primary source label by peers in the workplace, professional societies or professional publishers of scholarly journals. It is often the substance of what gets republished as secondary and tertiary sources (e.g. review articles, bibliographies, books).

We all need secondary and tertiary sources to do our work, learn new things, and understand our work and our world better. However, advances in technology, business operations, and innovation depend on sharing primary source material in thoughtfully constructed domains in our enterprises, whether businesses, healthcare organizations, or non-profits. A patient’s laboratory results or a mechanical device’s test data that spark the creation of primary source content need surrounding context to be properly understood and assessed for value and relevance.

To be valuable, enterprise search needs to deliver context, relevance, opportunities for analysis and evaluation, and retrieval modes that give the best results for any user seeking valid content. There is a lot that computerized enterprise search can do to facilitate this type of research, but that is not the whole story. There must still be real people who select the most appropriate search product for that enterprise and that defined business case. They must also decide which content should be indexed by the search engine based on its value, what can be secured with proper authentication, how it should be categorized, and so on. To throw a computer search application at any retrieval need without human oversight is a waste of capital. It will result in disappointment, cynicism, and skepticism about the value of automating search because the resulting output will be no better than Mr. Parker’s books.

Semantic Technologies and our CTO Blog

We host a number of blogs, some more active than others. One of the least active (although it still gets a surprising amount of traffic) has been our CTO blog. However, I am happy to say that Colin Britton started blogging on semantic technologies yesterday. As a co-founder and CTO of Metatomix he led the development of a commercial product based on RDF – a not very well understood W3C semantic web standard. Colin’s first post on the CTO blog starts a series that will help shed a little more light on semantic technologies and their practical applications.

Some of you know that I remain skeptical of the new world “Semantic Web” vision, but I do think semantic technologies are important and have a lot to offer, and Colin will help you see why. Check out his first post and let him know what you think about semantic technologies and what you would like to know about.

Introduction to Semantic Technology

Ten years ago I believed that a metadata approach to managing enterprise information was a valid way to go. The various structures, relationships, and complexities of IT systems led to disjointed information. By relating the information elements to each other, rather than synchronizing the information together, we _might_ stand a chance.

At the same time a new set of standards was emerging: standards to describe, relate, and query a new information model based on metadata. These became known as the Semantic Web, outlined in a 2001 Scientific American article (http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21).

Fast forward to 2008 – where are we with this vision? Some part of me is thrilled, another part disappointed. We have adoption of these standards and this approach in everyday information management situations. Major software companies and startups alike are implementing Semantic Technology in their offerings and products. However, I am disappointed that we still find it hard to communicate what this semantic technology means and how valuable it is. Most technologists I meet glaze over at the mention of the Semantic Web or any of its standards, yet when asked if they think RSS is significant, praise its contributions.

Over a series of posts to this blog, I would like to try to explain, share, and show some of the value of Semantic Technology and why one should be looking at it.

Let’s start with what Semantic Technology is and what standards define its openness. To quote Wikipedia: “In software, semantic technology encodes meanings separately from data and content files, and separately from application code.” This abstraction is a core tenet of, and a value provided by, a Semantic approach to information management. The idea that our database or programming patterns do not restrict the form or boundaries of our information is a large shift from traditional IT solutions. The idea that our business logic should not be tied to the code that implements it, nor to the information that it operates on, is all provided through this semantic representation. So firstly, ABSTRACTION is a key definition.

The benefit of this is that systems, machines, solutions (whatever term you wish to use) can interact with each other: share, understand, and reason without having been explicitly programmed to understand each other.

With this you get to better manage CHANGE. Your content and systems can evolve or change, with the changes managed through the Semantic Technology layer.

So what makes up Semantic Technology? One sees the word in a number of solutions and technologies, but are they all created equal?

In my view, Semantic Technology can only truly claim to be so if it is based on, and implements, the standards laid out through the World Wide Web Consortium (W3C) standards process: http://www.w3.org/2001/sw/

The vision of the Semantic Web and the standards required to support it continue to expand, but the anchor standards have been laid out for a while.

RDF – The model and syntax for describing information. It is important to understand that the RDF standard actually defines multiple things: the model (or data model), the syntax (how it is written or serialized), and the formal semantics (the logic described by the use of RDF). In 2004, the original RDF specification was revised and published as six separate documents, each covering an important area of the standard.

RDF-S – Provides a typing system for RDF and the basic constructs for expressing ontologies and relationships within the metadata structure.
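To make the model-versus-syntax point concrete, here is a minimal sketch using the open source rdflib Python library (my choice for illustration; the standards themselves are language-neutral). The product catalog and the http://example.org namespace are hypothetical. It states a few RDF triples, types one resource with an RDF-S class, and writes the same model out in two different syntaxes.

    from rdflib import Graph, Literal, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/catalog#")  # hypothetical namespace

    g = Graph()
    g.bind("ex", EX)

    # RDF-S typing: Widget is a class, and gadget42 is an instance of it.
    g.add((EX.Widget, RDF.type, RDFS.Class))
    g.add((EX.gadget42, RDF.type, EX.Widget))

    # Plain RDF statements -- (subject, predicate, object) triples.
    g.add((EX.gadget42, RDFS.label, Literal("Gadget 42")))
    g.add((EX.gadget42, EX.listPrice, Literal(19.95)))

    # The model is independent of its syntax: the very same triples can
    # be serialized as Turtle or as RDF/XML.
    print(g.serialize(format="turtle"))
    print(g.serialize(format="xml"))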

OWL – To quote the W3C paper, this facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF-S by providing additional vocabulary along with a formal semantics.
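As a small, hypothetical illustration of that extra vocabulary (my own example, not drawn from the W3C paper), the Turtle snippet below uses OWL to say things RDF-S cannot: that two independently defined classes are equivalent, and that two identifiers name the same individual.

    from rdflib import Graph

    OWL_SNIPPET = """
    @prefix ex:   <http://example.org/catalog#> .
    @prefix acme: <http://example.org/acme#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .

    # Two vocabularies, built separately, declared interchangeable.
    ex:Widget a owl:Class ;
        owl:equivalentClass acme:Product .

    # ...and two identifiers declared to denote the same thing.
    ex:gadget42 owl:sameAs acme:sku42 .
    """

    g = Graph()
    g.parse(data=OWL_SNIPPET, format="turtle")
    print(len(g), "triples loaded")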

SPARQL – While everyone with a Semantic Technology solution invented their own query language (why was there never one in the first place!), SPARQL, pronounced “sparkle,” is the W3C standardization of one. It is HUGE for Semantic Technology and makes all the effort with the other three standards worthwhile.
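Here is a minimal sketch of what that looks like in practice, again using rdflib and the same hypothetical product data: notice that the SPARQL query pattern mirrors the triple structure of the data itself.

    from rdflib import Graph

    DATA = """
    @prefix ex:   <http://example.org/catalog#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:gadget42 a ex:Widget ;
        rdfs:label "Gadget 42" ;
        ex:listPrice 19.95 .

    ex:gadget43 a ex:Widget ;
        rdfs:label "Gadget 43" ;
        ex:listPrice 24.50 .
    """

    g = Graph()
    g.parse(data=DATA, format="turtle")

    # Find every Widget, with its label and price. The WHERE clause is
    # itself just a set of triple patterns containing variables.
    QUERY = """
    PREFIX ex:   <http://example.org/catalog#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label ?price WHERE {
        ?item a ex:Widget ;
              rdfs:label ?label ;
              ex:listPrice ?price .
    }
    """
    for row in g.query(QUERY):
        print(row.label, row.price)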

These standards are quite a pile to sift through, and understanding the capabilities embodied in them takes significant effort, but it is the role of technologists in this arena to remove the need for you to understand them. It is our job to provide tools, solutions, and capabilities that leverage these standards, bringing semantic technology to life and delivering the power defined within them.

But that is the subject of another post. So what does this all mean in real life? In my next post I will lay out a concrete example using product information.

Parsing the Enterprise Search Landscape

Steve Arnold’s Beyond Search report is finally launched and ready for purchase. Reviewing it gave me a different perspective on how to look at the array of 83 search companies I am juggling in my upcoming report, Enterprise Search Markets and Applications. For example, technological differentiators can channel your decisions about must-haves and have-nots in your system selection. Steve codifies these considerations and details 15 technology tips to help you frame them.

We are getting ready for the third Gilbane Conference in which “search” has been a significant part of the presentation landscape, in San Francisco, June 17–20. Six sessions will be filled with case studies and enlightening “how-to-do-it-better” guidance from search experts with significant “hands-on” experience in the field. I will be conducting a workshop immediately after the conference, How to Successfully Adopt and Deploy Search. Presentations by speakers and the workshop will focus on users’ experiences and guidance for evaluating, buying, and implementing search. Viewing search from a usage perspective begs a different set of classification criteria for divvying up the products.

In February, Business Trends published an interview I gave them in December, Revving up Search Engines in the Enterprise. There probably isn’t much new in it for those who routinely follow this topic but if you are trying to find ways to explain what it is, why and how to get started, you might find some ideas for opening the discussion with others in your business setting. The intended audience is those who don’t normally wallow in search jargon. This interview pretty much covers the what, why, who, and when to jump into procuring search tools for the enterprise.

For my report, I have been very pleased with the discussions I’ve had with a couple dozen people immersed in evaluating and implementing search for their organizations. Hearing them describe their experiences suggests other ways to organize a potpourri of search products, and how buyers should approach their selection. With over eighty products, we have a challenge in how to parse the domain. I am segmenting the market space along multiple dimensions, from the content type being targeted by “search” to the packaging models the vendors offer. By laying out a simple “ontology” of concepts surrounding the search product domain, I hope to clarify why there are so many ways of grouping the tools and products being offered. If vendors read the report to decide which buckets they belong in for marketing, and buyers are able to sort out the type of product they need, the report will have achieved one positive outcome. In the meantime, read Frank Gilbane’s take on the whole topic of “enterprise” tacked onto any group of products.

As serendipity would have it, a colleague from the Boston KM Forum, Marc Solomon, just wrote a blog post on a new way of thinking about the business of classifying anything, “Word Algebra.” And guess who gave him the inspiration: Mr. Search himself, Steve Arnold. As a former indexer and taxonomist I appreciate this positioning of applied classification. Thinking about why we search gives us a good idea of how to parse content for consumption. Our parameters for search selection must be driven by that WHY.

IBM Labs Announces ProAct, Text Analytics for Call Centers

Researchers at IBM’s India Research Laboratory have developed software technology that uses sophisticated math algorithms to extract and deliver business insights hidden within the information gathered by companies during customer service calls and other interactions. The new business intelligence technology, called ProAct, is a text analytics tool which automates previously manual analysis and evaluation of customer service calls and provides insight to help companies assess and improve their performance. ProAct provides an integrated analysis of structured information, such as agent and product databases, and unstructured data, such as email, call logs, and call transcriptions, to identify reasons for dissatisfaction, agent performance issues, and typical product problems.

Based on the Unstructured Information Management Architecture (UIMA) framework that IBM contributed to the open source Apache Software Foundation in 2006, the ProAct technology was initially developed as a service engagement. Now the new algorithms are being packaged in software and deployed at many IBM call center customers around the world. UIMA is an open source software framework that helps organizations build new analysis technologies to gain more insight from their unstructured information by discovering relationships, identifying patterns, and predicting outcomes. IBM uses UIMA to enable text analysis, extraction, and concept search capabilities in other parts of its portfolio of enterprise search software products, including OmniFind Enterprise Edition, OmniFind Analytics Edition, and OmniFind Yahoo! Edition. http://www.research.ibm.com/irl/

Enterprise Whatever

As many of you know, we will be publishing a new report by Stephen Arnold in the next few weeks. The title, Beyond Search: What to do When Your Enterprise Search System Doesn’t Work, begs the question of whether there is such a thing as “enterprise search”. The title of Lynda’s consulting practice blog, “Enterprise Search Practice Blog”, begs the same question. In the case of content management, a similar question is begged by AIIM – “The Enterprise Content Management (ECM) Association” – and the recent AIIM conference.

The debate about whether “enterprise fill-in-your-favorite-software-application” makes any sense at all is not new. The terms “Enterprise Document Management” (EDM) and “Enterprise Resource Planning” (ERP) were first used in the 80s, and, at least in the case of EDM, were just as controversial. We have Documentum to thank for both EDM and ECM. Documentum’s original mission was to be the Oracle of documents, so EDM probably seemed like an appropriate term to use. Quickly, however, the term was appropriated by marketing pros from many vendors, as well as analysts looking for a new category of reports and research to sell, and conference organizers keeping current with the latest buzzwords (I don’t exclude us from this kind of activity!). It was also naively misused by many enterprise IT (as opposed to “personal IT” I suppose) professionals, and business managers who were excited by such a possibility.

ECM evolved when the competition between the established EDM vendors and the fast growing web content management vendors reached a point where both saw they couldn’t avoid each other (for market cap as well as user requirement reasons). Soon, any vendor with a product to manage any kind of information that existed outside of (or sometimes even in) a relational database was an “ECM” vendor. This was what led AIIM to adopt, and try to define and lay claim to, the term – it would cover all of the records management and scanner vendors who were their existing constituents, and allow them to appeal to the newer web content management vendors and practitioners as well.

We used to cover the question “Is there any such thing as ECM?” in our analyst panels at our conferences, and usually there would be some disagreement among the analysts participating, but our mainly enterprise IT audience largely became savvy enough to realize it was a non-issue.

Why is it a non-issue?

Mainly because the term has almost no useful meaning. Nobody puts all their enterprise content in a single ECM repository. It doesn’t even make sense to use the same vendor’s products across all departments, even in small organizations – that is why there is such a large variety of vendors with wildly different functionality at ECM events such as AIIM. The most that you can assume when you hear “ECM vendor” is that they probably support more than one type of content management application, and that they might scale to some degree.

There are many who think it not unreasonable to have a single “enterprise search” application for all enterprise content. If you are new to search technology this is understandable, since you may think simple word or phrase search should be able to work across repositories. But, of course, it is not at all that simple, and if you want to know why see Stephen’s blog or Lynda’s blog, among others. Both Steve and Lynda are uncomfortable with “enterprise search”. Steve prefers the term “behind the firewall search”. Lynda sticks with the term but with a slightly different definition, although I don’t think they disagree at all on how the term is misused and misinterpreted.

Why use “Enterprise … Whatever” terms at all?

There is only one reason, and that is that buyers and users of technology use these terms as a shortcut, sometimes naively, but also sometimes with full understanding. There is just no getting around the barrier of actual language use. Clearly, using the shortcut is only the first step in communicating – more dialog is required for meaningful understanding.

Enterprise Search Adopters

Maybe it is this everlasting winter of weather events, but I’m ready for some big changes across the gray landscape. Experiencing endless winter has become for me a metaphor for what I observe within some enterprises as serial adoptions of search.

As I work on my forthcoming report, Enterprise Search Markets and Applications: Capitalizing on Emerging Demand, I am interviewing people who are deeply engaged in search technologies. They are presenting a view of search deployment and implementation that reinforces my own observations, complete with benefits and disappointments. However, search in enterprises is like recurring weather events: some big, some small, but relentless in the repetitiveness of certain experiences. Early adopters often experience the euphoria of a fresh way to find stuff. Then inertia sets in, as some large subset of adopters settles into being routine but faithful users. The rest are like me with winter, looking for a really big change and more; the nitpicking begins as users cast their eyes to better options hyped by the media or by compatriots in other organizations with newer “bells and whistles.” Ah, what fickle beasts we are, as my husband will be very quick to remind me on the first hot, humid day of summer when I complain in a desultory sulk.

So, I was delighted to read this article in the New York Times, Tech’s Late Adopters Prefer the Tried and True, by Miguel Helft, on March 12. I particularly loved this comment from the article: “Laggards have a bad rap, but they are crucial in pacing the nature of change, said Paul Saffo, a technology forecaster in Silicon Valley. Innovation requires the push of early adopters and the pull of laypeople asking whether something really works. If this was a world in which only early adopters got to choose, we’d all be using CB radios and quadraphonic stereo.” It helps to put one’s quest for the next big thing into perspective.

It included another quote, from David Gans of the Well, a community in which people communicate using text-only systems: “Just because you have a nuclear-powered thing that can dry your clothes in five minutes doesn’t mean there isn’t value to hanging your clothes in the backyard and talking to your neighbors while doing it.” As one who has never owned a clothes dryer, this validated one of my own conscious decisions.

Seriously though, given all the comments collected from my interviews and my own experiences, it is really time to remind adopters, early and late, to give thought to appropriateness: what benefits us, or adversely distracts us, in the technologies we implement in our working worlds. (I’ll leave your personal technology use for you to sort out.) Taking time to think about your intentions, and about “what comes next” after getting that “must have” new search system, is something only you can control. Nobody on the selling side of a bakery will ever remind you that you don’t really neeeed another cookie.

And one more point: if you are in the market for search+, Steve Arnold does a fine job of positioning the appropriateness of each of the 24 systems he reviews in Beyond Search. It might just help you resist the superfluous and take a look at some other options instead.
