In the past few months, it is rare that I am briefed on an enterprise search product without a claim to provide “federated search.” Having worked with the original, and on one of the many review committees for it back in the early 1990s, it is a topic that always catches my attention.
Some of the history of search federation is described in this rather sketchy article at Wikipedia. However, I want clarify the original call for such a standard. It comes from the days when public access to search technologies was available primarily through library on-line catalogs in pubic and academic institutional libraries. A demand for the ability to search not only one’s local library system and network (e.g. a university often standardized on one library system to include all the holdings of a number of its own libraries), but also the holdings of other universities or major public libraries. The problem was that the data structures and protocols from one library system product to the next varied in way that made it difficult for the search engine of the first system to penetrate the database of records in another system. Records might have been meta-tagged similarly, but the way the metadata were indexed and accessible to retrieval algorithms was not possible with a translating layer between systems. Thus, the Z39.50 standard was established, originally to let one library system‘s user search from that library system into the contents of other libraries with different systems.
Ideally, results were presented to the searcher in a uniform citation format, organized to help the user easily recognize duplicated records, each marked with location and availability. Usually there was a very esoteric results presentation that could only be readily interpreted by librarians and research scholars.
Now we live in a digitized content environment in which the dissimilarities across content management systems, content repositories, publishers’ databases, and library catalogs have increased a hundred fold. The need for federating or translation layers to bring order to this metadata or metadata-less chaos has only become stronger. The ANSI standard is largely ignored by content platform vendors, thus leaving the federating solution to non-embedded search products. A buyer of search must do deep testing to determine if the enterprise search engine you have acquired actually stands up well under a load of retrieving across numerous disparate repositories. And you need a very astute and experienced searcher with expert familiarity of content in all the repositories to make an evaluation as to suitability for the circumstance in which the engine will be used.
So, let’s just recap what you need to know before you select and license a product claiming to support what you expect from search federation:
- Federated search is a process for retrieving content either serially or concurrently from multiple targeted sources that are indexed separately, and then presenting results in a unified display. You can imagine that there will be a huge variation in how well those claims might be satisfied.
- Federation is an expansion of the concept of content aggregation. It has play in a multi-domain environment of only internal sites OR a mix of internal and external sites that might include the deep (hidden) web. Across multiple domains complete federation supports at least four distinct functions:
- Integration of the results from a number of targeted searchable domains, each with its own search engine
- Disambiguation of content results when similar but non-identical pieces of content might be included
- Normalization of search results so that content from different domains is presented similarly
- Consolidation of the search operation (standardizing a query to each of the target search engines) and standardizing the results so they appear to be coming from a single search operation
In order to do this effectively and cleanly, the federating layer of software, which probably comes from a third-party like MuseGlobal, must have “connectors” that recognize the structures of all the repositories that will be targeted from the “home” search engine.
Why is this relevant? In short, because it is expected by users that when they search, all the results they are looking at represent all the content from all the repositories they believed they were searching in a format that makes sense to them. It is a very tall order for any search system to do this but when enterprise information managers are trying to meet a business manager’s or executive’s lofty expectations, anything less is viewed as the failure of enterprise search. Or else, they better set expectations lower.