Native Database Search vs. Commercial Search Engines

This topic came up at random: it is a short response to a question from a reader seeking technical research on the subject. Since none was available in the Gilbane library of studies, I decided to think through how to answer the question with some practical suggestions.

The focus is on an enterprise with a substantial amount of content aggregated from a diverse universe of industry-specific information, and on what to do about searching it. If the information has been parsed and stored in an RDBMS, is it not better to leverage the SQL query engine native to that RDBMS? Typical database engines might be DB2, MS Access, MS SQL, MySQL, Oracle or Progress Software.

To be clear, I am not a developer, but I worked closely with software engineers for 20 years when I owned a software company. We worked with several DBMS products, three of which supported SQL queries. The application we invented and supported was a forerunner of today’s content management systems, with a variety of retrieval (search) interfaces. The retrievable content our product supported was limited to metadata plus abstracts up to two or three pages in length; our customers' databases typically ranged from 250,000 to a couple of million records.

This is small potatoes compared to what search engines typically traverse and index today, but scale was always an issue, and we were well aware of the limitations of the SQL engines in supporting contextual searching, phrase searching and complex Boolean queries. It was essential that indexes be built in real time as records were added, whether manually through screen forms or through batch loads. The engine needed to support explicit adjacency (phrase) searching as well as keywords anywhere in a field, in a record, or in a set. Saving and re-purposing results, storing search strategies, narrowing large sets incrementally, and browsing indexes of terminology (taxonomy navigation) to select unique terms for a Boolean “and” or “or” query were all part of the application.

When our original text-based DBMS vendor went belly-up, we spent a couple of years test driving numerous RDBMS products to find one that would support the types of searches our customers expected. We settled on Progress Software, primarily because of its support for search and its experience as an OEM to application software vendors like us. Development time was minimized because of good application building tools and index building utilities.
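The retrieval requirements described above (indexing at insert time, phrase adjacency, and Boolean combination of keywords) can be illustrated with a minimal sketch of a positional inverted index. This is not the design of any product mentioned in this post; the class name `PositionalIndex` and its methods are hypothetical, and a real engine would add persistence, field scoping, stemming and ranking.

```python
import re
from collections import defaultdict


class PositionalIndex:
    """Toy inverted index (hypothetical): term -> {record_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, record_id, text):
        # Index at insert time so a new record is searchable immediately.
        for pos, term in enumerate(self.tokenize(text)):
            self.postings[term][record_id].append(pos)

    def keyword(self, term):
        # Set of record ids containing the term anywhere in the record.
        return set(self.postings.get(term.lower(), {}))

    def phrase(self, text):
        # Adjacency search: every term must follow the previous one directly.
        terms = self.tokenize(text)
        if not terms:
            return set()
        hits = set()
        for rec, starts in self.postings.get(terms[0], {}).items():
            for start in starts:
                if all(start + i in self.postings.get(t, {}).get(rec, ())
                       for i, t in enumerate(terms[1:], 1)):
                    hits.add(rec)
                    break
        return hits


index = PositionalIndex()
index.add(1, "Enterprise search engines index full text")
index.add(2, "The search for enterprise content")
index.add(3, "Full text retrieval in an RDBMS")

phrase_hits = index.phrase("enterprise search")                     # {1}
and_hits = index.keyword("enterprise") & index.keyword("search")    # {1, 2}
or_hits = index.keyword("rdbms") | index.keyword("engines")         # {1, 3}
```

Because each query returns a plain set of record ids, saved result sets can be narrowed incrementally with further set intersections, which mirrors the "narrowing large sets" behavior described above.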

So, what does that have to do with the original question, native RDBMS search vs. standalone enterprise search? Based on discussions with developers trying to optimize search for special applications, and on watching them use generic search tools for database retrieval, I would make the following observations. Search is very hard, and advanced search, including concept searching, Boolean operations, and text analytics, is harder still. Developers of enterprise search solutions have grappled with and solved search problems that arise in environments where content is dynamically changing and growing, where different user interfaces are needed for diverse audiences and types of queries, and where query results require a variety of display formats. Also, in e-commerce applications, interfaces require routine screen face lifts that are best supported by specialized tools for that purpose.

Then you need to consider all of the following development requirements; none of them comes out of the box with SQL search:

Full text indexes and database field or metadata indexes require independent development efforts for each database application that needs to be queried.
Security databases must be developed to match each application where individual access to specific database elements (records or rows) is required.
Natural language queries require integration with taxonomies, thesauri, or ontologies; this means software development independent of the native search tools.
Interfaces must be developed for search engine administrators to make routine updates to taxonomies and thesauri, to retrieval and results ranking algorithms, and to the rules that include or exclude target content in the databases. These content management tasks require substantive content knowledge but should not require programming expertise, and they must be very efficient to execute.
Social features that support interaction among users and personalization options must be built.
Connectors need to be built to federate search across other content repositories that are non-native and may even be outside the enterprise.
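Two of the items above, thesaurus-driven query expansion and record-level security, can be sketched in a few lines to show the kind of code that has to be written and maintained alongside a native SQL search. Everything here is an illustrative assumption: the `THESAURUS` and `RECORDS` data, the group names, and the functions `expand_query` and `search` are hypothetical, and a production system would keep the thesaurus and access rules in administrator-editable stores rather than in code.

```python
# Hypothetical thesaurus: in practice an administrator would maintain this
# through an interface, not by editing source code.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "physician": {"doctor"},
}

# Hypothetical records, each with the groups allowed to see it.
RECORDS = [
    {"id": 1, "text": "automobile repair manual", "groups": {"staff"}},
    {"id": 2, "text": "car repair invoice", "groups": {"finance"}},
    {"id": 3, "text": "car wash schedule", "groups": {"staff"}},
]


def expand_query(terms, thesaurus):
    """Return one OR-group (a set of terms) per query term; groups are ANDed."""
    groups = []
    for term in terms:
        key = term.lower()
        groups.append({key} | thesaurus.get(key, set()))
    return groups


def search(terms, user_groups, records=RECORDS, thesaurus=THESAURUS):
    """Ids of records matching every OR-group and visible to the user."""
    or_groups = expand_query(terms, thesaurus)
    hits = []
    for record in records:
        if not (record["groups"] & user_groups):
            continue  # security trimming: the user may not see this record
        words = set(record["text"].lower().split())
        if all(group & words for group in or_groups):
            hits.append(record["id"])
    return hits


staff_hits = search(["car", "repair"], {"staff"})      # [1]
finance_hits = search(["car", "repair"], {"finance"})  # [2]
```

Even at this toy scale, the expansion rules and the access rules are separate pieces of logic that must be repeated, or generalized, for each database application being searched.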

Any one of these efforts is a multi-person and perpetual activity. The sheer scale of the development tasks militates against trying to sustain state-of-the-art search in-house with the relatively minimalist tools provided in most RDBMS suites. The job is never done, and in-depth search expertise is hard to come by. Software companies that specialize in search for enterprises are also diverse in what they offer and in the vertical markets they support well. Bottom line: identify your business needs and find the search vendor that matches your problem with a solution they will continue to support with regular updates and services.

Finally, search performance and processing speed are another huge factor to consider. For this you need some serious technical assessment. If the target application is going to be a big revenue generator with heavy loads and huge processing demands, do not overlook it. Do benchmarks to prove the performance and scalability.
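As a starting point for that kind of benchmark, here is a minimal latency harness, assuming the candidate search (native SQL or a commercial engine) can be wrapped in a Python callable. The function name `benchmark` and its parameters are hypothetical, and a serious assessment would also cover concurrent load, index build time, and realistic data volumes.

```python
import math
import statistics
import time


def benchmark(search, queries, repeats=5):
    """Time each query `repeats` times and summarize latency in seconds."""
    latencies = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search(query)
            latencies.append(time.perf_counter() - start)
    latencies.sort()
    # Nearest-rank 95th percentile of the sorted latencies.
    p95_index = math.ceil(0.95 * len(latencies)) - 1
    return {
        "runs": len(latencies),
        "median_s": statistics.median(latencies),
        "p95_s": latencies[p95_index],
        "max_s": latencies[-1],
    }


def fake_search(query):
    # Stand-in for a real search call; replace with the system under test.
    return [word for word in ("alpha", "beta", "gamma") if query in word]


report = benchmark(fake_search, ["a", "be", "zz"], repeats=10)
```

Running the same query set against each candidate, with the same data loaded, gives comparable median and tail latencies rather than impressions from a vendor demo.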

2 Comments

Koen
February 9, 2009 at 5:56 am

Great answer to a few questions about enterprise search, but does enterprise search not work better in combination with document management? Or is enterprise search just an improvement on, and a helper to, document management?
Lynda Moulton
February 9, 2009 at 8:02 pm

Koen,
To answer, I need to go back to the way I define “enterprise search.” For my work, I believe that all search tools or search processes that support searching activities behind the firewall qualify as enterprise search. So, it encompasses search tools that operate on small or select domains, such as within a “document management system” domain, but also search tools and technologies that operate on multiple domains or repositories.
There are good and valid reasons to use a specific embedded search technology for application on a discrete set of content, especially if that domain has unique attributes and purposes that need to be leveraged by the searcher when they are operating within that application.
I believe that search embedded in a document management suite for use in the development of documents (manufacturing highly structured documents) could be very useful when I am a writer or editor in that environment because there are times that I would not want the distraction of content external to this domain coming into my search results. That does not mean that I would not also want enterprise search with a broader focus accessible to me as well.
There are times when strictly focused search with special functions for the application is useful, and other times when broadly applied search makes more sense, all within the enterprise. It all depends on my intent, my focus, and my feature and functional requirements at the moment. Enterprise search broadly applied could be a distraction to document management or an improvement; it just depends on the situation.
