Curated for content, computing, data, information, and digital experience professionals

Category: Semantic technologies

Our coverage of semantic technologies goes back to the early 90s when search engines focused on searching structured data in databases were looking to provide support for searching unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyses the challenge back then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


Mapping Search Requirements

Last week I commented on the richness of the search marketplace. However, diversity presents the enterprise buyer with pressure to be more focused on immediate and critical search needs.

The Enterprise Search Summit is being held in New York this week. Two years ago I found it a great place to see the companies offering search products; I could easily visit them all and still attend every session in two days. This year, 2007, there were over 40 exhibitors, most offering solutions for highly differentiated enterprise search problems. Few of the offerings will serve the end-to-end needs of a large enterprise, but many would be sufficient for medium to small organizations. The two major search engine categories used to be Web content keyword searching and structured searching. Now, not only are major vendors offering solutions for different types of search requesting my attention as an analyst, but new products are being announced weekly. Newcomers include those describing their products as data mining engines, search and reporting “platforms,” business intelligence (BI) engines, and semantic and ontological search engines. This mix challenges me to determine whether a product really solves a type of enterprise search problem before I pay attention.

You, on the other hand, need to do another type of analysis before considering specific options. Classifying search categories using a faceted approach will help you narrow down the field. Here is a checklist for categorizing what content needs to be found and how:

  • Content types (e.g. HTML pages, PDFs, images)
  • Content repositories (e.g. database applications, content management systems, collaboration applications, file locations)
  • Types of search interfaces and navigation (e.g. simple search box, metadata, taxonomy)
  • Types of search (e.g. keyword, phrase, date, topical navigation)
  • Types of results presentation (e.g. aggregated, federated, normalized, citation)
  • Platforms (e.g. hosted, intranet, desktop)
  • Type of vendor (e.g. search-only, single purpose application with embedded search, software as a service or SaaS)
  • Amount of content by type
  • Number and type of users by need (personas)

Then use any tools or resources at hand to analyze the mapping results and learn who needs what type of content, in what format, and how critical it is to business requirements. Prioritizing the facets produces a multidimensional view of enterprise search requirements. This will go a long way toward narrowing down the vendor list and gives you a tool to keep discussions focused.
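As a rough illustration of what such a mapping could look like once it is written down, here is a minimal Python sketch. The facet names follow the checklist above, but the Requirement record, the 1-to-3 criticality scale, the scoring rule (criticality times number of users) and the sample entries are all assumptions made up for this example, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """One row of the requirements map (hypothetical structure)."""
    persona: str        # who needs the content
    content_type: str   # e.g. PDF, HTML page
    repository: str     # where the content lives
    search_type: str    # e.g. keyword, phrase, date
    criticality: int    # assumed scale: 1 = nice to have, 3 = critical
    user_count: int     # number of users with this need


def score(req):
    """Assumed weighting: criticality times the number of users affected."""
    return req.criticality * req.user_count


def prioritize(requirements):
    """Return the requirements ordered from highest to lowest score."""
    return sorted(requirements, key=score, reverse=True)


def facet_summary(requirements, facet):
    """Total the scores for each value of one facet, e.g. 'content_type'."""
    totals = {}
    for req in requirements:
        value = getattr(req, facet)
        totals[value] = totals.get(value, 0) + score(req)
    return totals


# Invented sample data, for illustration only.
requirements = [
    Requirement("engineer", "PDF", "content management system", "phrase", 3, 120),
    Requirement("sales", "HTML page", "intranet", "keyword", 2, 300),
    Requirement("legal", "PDF", "file share", "date", 3, 15),
]

ranked = prioritize(requirements)
by_content_type = facet_summary(requirements, "content_type")
```

Summarizing by a different facet (for example "repository" or "search_type") gives another slice of the same map, which is the multidimensional view described above.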

There are terrific options in the marketplace and they will only become richer in features and complexity. Your job is to find the most appropriate solution for the business search problem you need to solve today, at a cost that matches your budget. You also want a product that can be implemented rapidly with immediate benefit linking to a real business proposition.

Will Search Technology Ever Become a Commodity?

This week, EMC announced a collaborative research network, with this headline: New EMC Innovation Network to Harness Worldwide Tech Resources, Accelerate Information Infrastructure Innovation. Among the areas that the research network will explore are Semantic Web, search, context, and ontological views.

There is a lot to feed on in this announcement, but the most interesting aspect is the juxtaposition with other hardware giants’ forays into the world of document and content search software (e.g. IBM, Cisco), and recent efforts by software leaders Oracle and Microsoft to strengthen their offerings in the area of content and search.

One of the phrases in EMC’s announcement that struck me is the reference to “information infrastructure.” This phrase is used ubiquitously by IT folks to aggregate their hardware and network components, with the assumption that because these systems store and transport data, they are information infrastructure. We need to recognize that there are two elements missing from this infrastructure view: skilled knowledge workers (e.g. content structure architects, taxonomists, specialist librarians) and software applications for content authoring, capture, organization, and retrieval. Judging from the language of EMC’s press release, this might just be tacit recognition that hardware and networks do not make up an information infrastructure. But those of us in search and content management knew that all along; we don’t need a think tank to show us how the pieces fit together or even how to innovate to create good infrastructure. Top-notch professionals have been doing that for decades. Will this new network really reveal anything new?

EMC does not explicitly announce a plan to turn search and information infrastructure products into commodities, but it does express the desire to build “commercial products” for this market. It has already acquired a few of the software components but has yet to demonstrate a tight integration with the rest of the company. Usually innovation comes from humble roots and grows organically through the sponsorship of a large organization, self-funding, or other interested contributors. This effort to lead an innovation community to solutions for information infrastructure has the potential to spawn growth of truly innovative tools, methods and even standards for diverse needs and communities. Alternatively, it may simply be a push to bring a free-wheeling industry of multi-faceted components under central control, with the result being tools and products that serve the lowest-common-denominator users.

From a search point of view, I for one am enjoying the richness of the marketplace and how varied the product offerings are for many specialized needs. For the time being, I remain skeptical that any hardware or software giant can sustain the richness of offerings that get to the heart of particular business search needs in a universal way. Commodity search solutions are a long way off for the community of organizations I encounter.

Siderean and Inxight Federal Systems Announce Partnership to Deliver Relational Navigation to Federal Government

Siderean Software announced that it has entered a reseller agreement with Inxight Federal Systems. Effective immediately, Siderean will be added to Inxight’s GSA-approved price list. Inxight’s software structures unstructured data by “reading” text and extracting important entities, such as people, places and organizations. It also extracts facts and events involving these entities, such as travel events, purchase events, and organizational relationships. Siderean’s Seamark Navigator then builds on this newly structured data, providing a relational navigational interface that allows users to put multi-source content in context to help improve discovery, access and participation across the information flow. Seamark Navigator uses the Resource Description Framework (RDF) and Web Ontology Language (OWL) standards developed by the World Wide Web Consortium (W3C). Siderean’s Seamark Navigator will provide an important add-on to Inxight’s metadata harvesting and extraction solutions. Inxight’s government customers will now be able to leverage Siderean’s relational navigation solutions to access more relevant and timely results derived from the full context and scope of information. As users refine their searches, Siderean dynamically displays additional navigation options and gives users summaries of those items that best match search criteria. Siderean also enables users to illuminate unseen relationships between sets of information and leverage human knowledge to explore information interactively. http://www.siderean.com, http://www.inxightfedsys.com

Turning Around a Bad Enterprise Search Experience

Many organizations have experimented with a number of search engines for their enterprise content. When the search engine is deployed within the bounds of a specific content domain (e.g. a QuickPlace site), the user can assume that the content being searched is within that site. However, an organization’s intranet portal with a free-standing search box comes with a different expectation. Most people assume that search will find content anywhere in the implied domain, and most of us believe that all content belonging to that domain (e.g. a company) is searchable.

I find it surprising how many public Web sites for media organizations (publishers) don’t appear to have their site search engines pointing to all the sub-sites indicated in site maps. I know from my experience at client sites that the same is often true for enterprise searching. The reasons are numerous and diverse, commentary for another entry. However, one simple notation under or beside the search box can clarify expectations. A simple link to a “list of searchable content” will underscore the caveat or at least tip the searcher that the content is bounded in some way.

When users in an organization come to expect that they will not find, through their intranet, what they are seeking but know to exist somewhere in the enterprise, they become cynical and distrustful. Having a successful intranet portal is all about building trust and confidence that the search tool really works or “does the job.” Once that trust is broken, new attempts to change attitudes by deploying a new search engine, increasing the license to include more content, or doing better tuning to return more reliable results are not going to change minds without a lot of communication work to explain the change. I know that the average employee believes that all the content in the organization should be brought together in some form of federated search but now knows it isn’t. The result is that employees confine themselves to embedded search within specific applications and ignore any option to “search the entire intranet.”

It would be great to see comments from readers who have changed a Web site search experience from a bad scene to one with a positive traffic gain with better search results. Let us know how you did it so we can all learn.

The FAST acquisition of Convera

It has been a couple of weeks since the announcement that Fast Search & Transfer would acquire Convera’s RetrievalWare, a search technology built on the foundation of Excalibur and widely used in government enterprises.

At a recent Boston KM Forum meeting I asked Hadley Reynolds, VP & Director of the Center for Search Innovation at Fast, to comment on the acquisition. He indicated Fast’s interest in building up a stronger presence in the government sector, a difficulty for a Norwegian-based company. I remember Fast as a company launching in the U.S. with great fanfare in 2002 (http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=17223) to support FirstGov.gov, a portal to multi-agency content of the U.S. Government. That site has recently been re-launched as http://www.usa.gov/ using the Vivisimo search portal. There must be a story behind the story, which I hope to learn.

To add to the discussion, last week I moderated a session at the Gilbane San Francisco conference at which Helen Mitchell, Senior Search Strategist for Credo Systems and Workgroup Chairperson for the Convera User Group, spoke. I asked Helen before the program about her reaction to the recent announcement. She had already been in contact with Fast and received assurances that Convera Federal Users would be well supported by Fast, which wants to participate actively in conversations with the group through on-line and in-person meetings. Helen was positive about the potential for RetrievalWare users to gain from the best of Fast technology while still being supported with the unique capabilities of Convera’s semantic, faceted search.

Erik Schwartz, Director of Product Management from Convera, was also present; I encouraged him and Helen to leverage the RetrievalWare user community to make sure Fast really understands the unique and diverse needs of search within the enterprise. We are all well aware that in the rush to build up large customer bases with a solid revenue stream of maintenance, vendors are likely to sacrifice unique technologies that are highly valued by customers. A bottom-line round of pragmatic cost cutting usually determines what R&D a vendor will fund, forgoing the long-term goodwill that could accrue if they would belly up to integrating these unique features into their own platform.

Time will tell how serious Fast is about giving its new base a truly valuable customer experience. I would also note that this acquisition has been covered by a broader information management industry publication, InformationWeek. See David Gardner’s article at http://www.informationweek.com/news/showArticle.jhtml?articleID=198701793.

Search Help and Usability

Preparing for two upcoming meetings with search themes (Gilbane San Francisco and Boston KM Forum) has brought to mind many issues of search usability. At the core is the issue of search literacy. Offering some fundamental searching tips to non-professional searchers often results in a surprised reaction. For example: when seeking information about a specific topic such as “industrial engineering,” enclose it in quotes to limit the search to that phrase. Without quotes, you will get all content with “industrial” and “engineering” anywhere in the content, with no explicit relationship implied.
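To make the difference concrete, here is a small Python sketch of the two matching behaviors. It is a simplified illustration only; the tokenizer, the function names and the two sample documents are assumptions for this example, and real search engines work from indexes rather than scanning text this way.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_terms(query, document):
    """Unquoted behavior: every query word appears somewhere in the document."""
    doc_tokens = set(tokenize(document))
    return all(term in doc_tokens for term in tokenize(query))


def matches_phrase(query, document):
    """Quoted behavior: the query words appear adjacent and in order."""
    q = tokenize(query)
    d = tokenize(document)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))


# Invented sample documents.
docs = [
    "She holds a degree in industrial engineering.",
    "Software engineering practices in the industrial sector.",
]

all_terms_hits = [matches_all_terms("industrial engineering", doc) for doc in docs]
phrase_hits = [matches_phrase("industrial engineering", doc) for doc in docs]
```

Both documents satisfy the unquoted search, but only the first contains the phrase itself, which is why the unquoted result set is so much larger and noisier.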

If you are reading this you probably know that, but many do not. In order to learn what people search for on their company intranet and how they type their search requests, I spend time reading search log files. I do this for several reasons:

  • To learn terminology searchers are using to guide taxonomy building choices
  • To see the way searches are formulated, and followed up
  • To inform design decisions about how to make searching easier
  • To see what is searched but not found to inform future content inclusion
  • To view the searcher’s next step when the results are zero or huge
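A first pass over a search log along these lines might look like the following Python sketch. The log format here, a list of (query, result count) pairs, is an assumption for illustration; actual search engine logs differ by product and would need their own parsing step. The sample entries are invented.

```python
from collections import Counter


def words(query):
    """Split a query into lowercase words, ignoring quotation marks."""
    return query.lower().replace('"', " ").split()


def summarize_log(entries):
    """Summarize (query, result_count) pairs from a search log."""
    multiword = [q for q, _ in entries if len(words(q)) > 1]
    quoted = [q for q in multiword if q.count('"') >= 2]
    zero_hits = Counter(q.lower() for q, hits in entries if hits == 0)
    terms = Counter(t for q, _ in entries for t in words(q))
    return {
        # How many queries contain more than one word
        "multiword_queries": len(multiword),
        # Share of multiword queries entered as a quoted phrase
        "quoted_share": len(quoted) / len(multiword) if multiword else 0.0,
        # Queries that found nothing: candidates for content inclusion
        "zero_result_queries": zero_hits.most_common(),
        # Most frequent terms: candidates for taxonomy building
        "top_terms": terms.most_common(3),
    }


# Invented sample log entries.
log = [
    ("industrial engineering", 5400),
    ('"industrial engineering"', 37),
    ("vacation policy", 0),
    ("engin", 0),
    ("expense report form", 212),
]

summary = summarize_log(log)
```

The sketch does not cover the follow-up behavior mentioned in the last bullet; that would require session identifiers and timestamps, which this assumed format does not carry.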

Two results remain consistent: fewer than 1% of the searchers place a phrase inside quotation marks, even when there are multiple words; and words are often truncated but do not include a truncation symbol (usually an asterisk, “*”). Both reveal a probable lack of understanding of search conventions, a search literacy problem. Here are a couple of possible solutions:

  • Put into place better help and training mechanisms to help the lost find their way,

OR

  • Remove the legacy practice of forcing command language type symbols on searchers for the most common search requests

Placing punctuation around a search string is a holdover from 30 years ago when searching was done using a command language. Since only a limited number of people ever knew this syntactical format, why does it persist as the default for a phrase search for Web-based search engines?

The solution of providing a better help page and getting people to actually use it is a harder proposition. This one from McGraw-Hill for BusinessWeek Online is pretty simple with just seven tips but who reads it? I expect very few, although it could dramatically improve their search results. http://search.businessweek.com/advanced.jsp.

If you are trying to improve the search experience for your intranet, there are two resources to consult for content usability on all fronts, not just search: useit.com (Jakob Nielsen’s Web site) and Jared Spool’s UIEtips (User Interface Engineering’s free email newsletter). In the meantime, think about whether you need to demand more core search usability or tunable default options from vendors, or whether better interface design could guide searchers to better results.

Fast to Acquire Convera’s RetrievalWare Business

Fast Search & Transfer announced its agreement to purchase selected assets of Convera Corporation. Under the terms of the signed agreement, FAST will acquire the assets of Convera’s RetrievalWare business which supports a wide range of mission-critical programs at government agencies and commercial enterprises. The acquisition, priced at $23 million, will help FAST expand its presence primarily in the government markets. Convera and FAST have also announced that Convera has licensed FAST Ad Momentum, a private-label contextual advertising and monetization platform developed with the support of online publishers. FAST Ad Momentum will be integrated with Convera’s hosted vertical search solution and its Publisher Control Panel. Expected to close in the second quarter, the acquisition is limited to Convera’s RetrievalWare business. Convera will continue to trade under the NASDAQ symbol CNVR. http://www.fastsearch.com, http://www.convera.com/


© 2025 The Gilbane Advisor
