Last week I began this entry, reconsidered how to make the point and tucked it away. Today I unearthed an article I had not gotten around to putting into my database of interesting and useful citations. Lisa Nadile, in CIO Magazine, hits the nail on the head with this statement: “Each search engine has its own top-secret algorithm to analyze this data…” This is tongue in cheek, so you need to read the whole article to get the humor. Ms. Nadile’s article is geared to Internet marketing, but the comments about search engines are just as relevant for enterprise search.
I may be an enterprise search analyst, but there are a lot of things I don’t know about the guts of current commercial search tools. Some things I could know if I were willing to spend months studying patents and expensive reports, while other things are protected as trade secrets. I will never know what is under the hood of most products. Thirty years ago I knew a lot about relatively simple concepts like b-tree indexes and hierarchical, relational, networked and associative data structures for the products I used and developed.
My focus has shifted to results and usability. My client has to be able to find all the content in their content repository or crawled site. If not, it had better be easy to discover why, and simple to take corrective action with the search engine’s administration tools, if that is where the problem lies. If the corpus of content to be searched is likely to grow to hundreds of thousands of documents, I also care about hardware resource requirements, performance (speed) and scalability. And, if you have read previous entries, you already know that I care a lot about service and business relationships with the vendor because they are crucial to long-term success. No amount of “whiz bang” technology will overcome a lousy client/vendor relationship.
Finding out what is going on under the hood with some imponderable algorithms isn’t really going to do me or my client any good when evaluating search products. Either the search tool finds stuff the way my client wants to find it, or it doesn’t. Whether it is “black art,” trade secret or “patent protected,” few of us would really understand the secret sauce anyway.