Curated for content, computing, and digital experience professionals

Category: Enterprise search & search technology

Research, analysis, and news about enterprise search and search markets, technologies, practices, and strategies, such as semantic search, intranet collaboration and workplace, ecommerce and other applications.

Before we consolidated our blogs, industry veteran Lynda Moulton authored our popular enterprise search blog. This category collects all of her posts, so her loyal readers can find them here, along with other enterprise search news and analysis.

For older, long form reports, papers, and research on these topics see our Resources page.

Enterprise Search Strategies: Cultivating High Value Domains

At the recent Gilbane Boston Conference, I was happy to hear many remarks positioning and defining “Big Data,” along with a variety of comments on it. Like so much in the marketing sphere of high tech, answers begin with technology vendors but get refined and parsed by analysts and consultants, who need to set clear expectations about the actual problem domain. It’s a good thing that we have humans to do that defining, because even the most advanced semantics would be hard pressed to give you a single useful answer.

I heard Sue Feldman of IDC give a pretty good “working definition” of big data at the Enterprise Search Summit in May 2012. To paraphrase, it was:

  • > 100 TB up to petabytes, OR
  • > 60% growth a year of unstructured and unpredictable content, OR
  • Ultra high streaming content

But we then get into debates about distinguishing data from unstructured content when a phrase like “big data” is applied to unstructured content, which knowledge strategists like me tend to place in a category of packaged information. Never mind; technology solution providers will continue to come up with catchy buzz phrases to codify the problem they are solving, whether they make semantic sense or not.

What does this have to do with enterprise search? In short, “findability” is an increasingly heavy lift due to the size and number of content repositories. We want to define quality findability as optimal relevance and recall.

A search technology era ago, publishers, libraries, and content management solution providers focused on human curation of non-database content, applying controlled vocabulary categories derived from decades of human-managed terminology lists. Automated search provided highly structured access interfaces to what we now call unstructured content. Once this model was supplanted by full-text retrieval, and new content originated in electronic formats, the proportion of un-categorized content to human-categorized content ballooned.

Hundreds of models for automatic categorization have been rolled out to try to stay ahead of the electronic onslaught. The ones that succeed do so mostly because of continued human intervention at some point in the process of making content available to be searched. From human invented search algorithms, to terminology structuring and mapping (taxonomies, thesauri, ontologies, grammar rule bases, etc.), to hybrid machine-human indexing processes, institutions seek ways to find, extract, and deliver value from mountains of content.

This brings me to a pervasive theme at the conferences I have attended this year: the synergies among text mining, text analytics, extract/transform/load (ETL), and search technologies. These synergies are being sought, employed and applied to specific findability issues in select content domains. It appears that the best results are delivered only when these criteria are first met:

  • The business need is well defined, refined and narrowed to a manageable scope. Narrowing the scope of information initiatives is the only way to understand results and gain real insight into which technologies work and which don’t.
  • The domain of high-value content is carefully selected. I have long maintained that a significant issue is the amount of redundant information that we pile up across every repository. By demanding that our search tools crawl and index all of it, we place an unrealistic burden on search technologies to rank relevance and importance.
  • Pre-processing solutions such as text mining and text analytics are applied to ferret out primary source content and eliminate re-packaged variations that lack added value.
  • Pre-processing solutions such as ETL with text mining are applied to enhance content with consistent metadata that does not have a high semantic threshold but will suffice to answer a large percentage of non-topical inquiries. An example would be finding the “paper” that “Jerry Howe” presented to the “AMA” last year.
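To make the idea of weeding out re-packaged variations concrete, here is a minimal sketch of one way a pre-processing step might do it before indexing. It is an illustration under stated assumptions, not any product’s method: documents are assumed to be Python dictionaries with hypothetical `id`, `text`, and `published` keys; three-word shingles compared by Jaccard similarity stand in for “text mining”; and the 0.8 threshold is arbitrary. Documents are visited oldest first, so the earliest version is kept as the presumed primary source.

```python
import re
from datetime import date

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def keep_primary_sources(docs, threshold=0.8, k=3):
    """Drop documents that are near-duplicates of an earlier-published document.

    docs: list of dicts with hypothetical keys 'id', 'text', 'published' (a date).
    """
    kept, kept_shingles = [], []
    for doc in sorted(docs, key=lambda d: d["published"]):  # oldest first
        s = shingles(doc["text"], k)
        if any(jaccard(s, other) >= threshold for other in kept_shingles):
            continue  # re-packaged variation of something already kept
        kept.append(doc)
        kept_shingles.append(s)
    return kept

# Illustrative data only.
docs = [
    {"id": "memo-v2", "text": "Quarterly findings on polymer coating durability in salt spray tests",
     "published": date(2012, 6, 1)},
    {"id": "memo-v1", "text": "Quarterly findings on polymer coating durability in salt spray tests",
     "published": date(2012, 3, 1)},
    {"id": "report", "text": "New supplier evaluation for adhesive resins",
     "published": date(2012, 4, 1)},
]
print([d["id"] for d in keep_primary_sources(docs)])  # ['memo-v1', 'report']
```

Pairwise comparison keeps the sketch readable; at repository scale, hashing techniques such as MinHash are the usual way to avoid comparing every document with every other.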

Business managers together with IT need to focus on eliminating redundancy by utilizing automation tools to enhance unique and high-value content with consistent metadata, thus creating solutions for special audiences needing information to solve specific business problems. By doing this we save the searcher the most time, while delivering the best answers to make the right business decisions and innovative advances. We need to stop thinking of enterprise search as a “big data,” single engine effort and instead parse it into “right data” solutions for each need.
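The “Jerry Howe” example in the criteria above needs only modest, consistent metadata rather than deep semantics. The sketch below assumes an ETL step has already normalized four hypothetical fields (`doc_type`, `author`, `venue`, `year`); the lookup is a plain fielded match with case and whitespace folding, and reading “last year” as 2011 is an assumption for a query made in 2012.

```python
from dataclasses import dataclass

@dataclass
class DocMeta:
    # Hypothetical metadata fields an ETL step might normalize across repositories.
    doc_id: str
    doc_type: str  # e.g. "paper", "memo"
    author: str
    venue: str     # organization or event the item was presented to
    year: int

def normalize(value):
    """Case-fold and collapse whitespace so 'Jerry  Howe' matches 'jerry howe'."""
    return " ".join(value.split()).casefold()

def fielded_lookup(records, **criteria):
    """Return records whose fields equal every given criterion (strings normalized)."""
    def matches(rec):
        for field, wanted in criteria.items():
            actual = getattr(rec, field)
            if isinstance(wanted, str):
                if normalize(actual) != normalize(wanted):
                    return False
            elif actual != wanted:
                return False
        return True
    return [r for r in records if matches(r)]

# Illustrative records only.
records = [
    DocMeta("d1", "paper", "Jerry Howe", "AMA", 2011),
    DocMeta("d2", "paper", "Jerry Howe", "IEEE", 2011),
    DocMeta("d3", "memo", "Jerry  Howe", "AMA", 2011),
]
last_year = 2011  # assumption: "last year" relative to a 2012 query
hits = fielded_lookup(records, doc_type="Paper", author="jerry howe", venue="AMA", year=last_year)
print([r.doc_id for r in hits])  # ['d1']
```

Nothing here understands what the paper is about; the answer comes entirely from consistent field values, which is the point of the low-semantic-threshold metadata described above.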

Enterprise Search is Never Magic

How is it that the blockbuster deals for acquiring software companies that rank highest in their market spaces seem to end up smelling bad several months in? The latest acquisition to take on taint was the subject of a Wall Street Journal article today, “HP Reports $8.8 Billion Charge on Accounting Misstatement at Autonomy.” I don’t dispute that enterprise search megastars Fast (acquired by Microsoft) and Autonomy had some terrific search algorithms and a huge presence in the enterprise market, but there is a lot more to supporting search than the algorithms.

It has been well documented over the years that surrounding support services have always been essential to deploying these two products successfully. Hundreds of system integrators and partner companies to Microsoft and Autonomy do very well making these systems deliver value that has never been attainable with out-of-the-box installations alone. It takes a team of content, search and vocabulary management specialists to deliver excellent results. For any but the largest corporations, the costs and time to achieve full implementation have rarely been justifiable.

Many fine enterprise search products deliver high value at much more reasonable cost, with more efficient packaging, shorter deployment times and lower ongoing overhead. Never to be ignored is that enterprise search must be accounted for as infrastructure. Without knowing where the accounting irregularities (also true with Fast) actually lay, I suspect that HP bought the brand and the prospective customer relationships only to discover that the real money was being made by partners and integrators, and the software itself was a loss leader. If Autonomy did not bring with it a solid service and integration operation with strong revenues and work in the pipeline, HP could not have gained what it bargained for in the purchase. I “know” nothing; these are only my hunches.

Reflecting on two articles I wrote a couple of years ago (If a Vendor Spends Enough… and Enterprise Search and Collaboration…), when Autonomy began hyping its enterprise search prowess in Information Week ads, it seems that marketing was all the magic it needed to reel in the biggest fish of all: a sale to HP.

Right Fitting Enterprise Search: Content Must Fit Like a Glove

This story brought me up short: “Future of Data: Encoded in DNA” by Robert Lee Hotz in the Wall Street Journal, Aug. 16, 2012. It describes how “…researchers encoded an entire book into the genetic molecules of DNA, the basic building block of life, and then accurately read back the text.” The article goes on to quote the project’s senior researcher, Harvard University molecular geneticist George Church, as saying, “A device the size of your thumb could store as much information as the whole Internet.” While this concept intrigues and excites me for its innovation and creative thinking, it stimulates another thought as well: stop the madness of content overload first, and force it to be managed responsibly.

While I have been sidelined from blogging for a couple of months, industry pundits have been contributing their comments, reflections and guidance on three major topics. Big Data tops the list, with analytics a close second, rounded out by contextual relevance as an ever-present content findability issue. In November, the Gilbane Boston program features a study conducted by Findwise, Enterprise Search and Findability Survey 2012, which you can now download. It underscores a disconnect between what enterprise searchers want and how search is implemented (or not) within their organizations. As I work to assemble content, remarks and readings for an upcoming graduate course on “Organizing and Accessing Information and Knowledge,” I keep reminding myself what knowledge managers need to know about content to make it accessible.

So, how would experts for our three dominant topics solve the problems illustrated in the Findwise survey report?

For starters, organizations must be more brutal with content housekeeping, or more specifically, housecleaning. As we debate whether our country is as great at innovation as in generations past, consider big data a big barrier. Human beings, even brilliant ones, can only cope with so much information in their waking working hours. I posit that we have lost the concept of primary source content, in other words content that is original, new or innovative. It is nearly impossible to home in on information that has never been articulated in print or disseminated electronically before, excluding all the stuff we have seen over and over again. Our concept of terrific search is to be able to traverse and aggregate everything “out there” with no regard for what is truly conceptually new. How much of that “big data” is really new and valuable? I am hoping that other speakers at Gilbane Boston 2012 can suggest methods for crunching through the “big” to focus search on the best, most relevant and singular primary source information.

Second, others have suggested, and I agree, that analytic tools can contribute significantly to cleansing search domains of unwanted and unnecessary detritus. Search tools that auto-categorize and cross-categorize content, whether the domain is large or small, should be employed during the launch of any new search engine to organize content for quick visual examination, showing you where metadata is wrong, mischaracterized, or poorly tagged. Think of a situation where templates are commonly used for enterprise reports and the name of the person who created the template becomes the “author” of every report. Spotting this type of problem and taking steps to remediate and cleanse metadata before deploying the search system is a fundamental practice that will contribute to better search outcomes. With thoughtful management, this type of exercise will also lead to corrective actions on the content governance side by pointing to how metadata must be handled. Analytics functions that leverage search to support cleaning up data stores are among the most practical tools now packaged with newer search products.
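The template-author problem described above is easy to surface with simple counts. This is a minimal sketch, assuming documents are dictionaries with a hypothetical `author` key; the 30% share cutoff and 20-document minimum are arbitrary illustrations, not recommended values. Any author attached to an implausibly large share of documents is flagged for human review rather than corrected automatically.

```python
from collections import Counter

def suspicious_authors(docs, max_share=0.3, min_docs=20):
    """Flag author values attached to an implausibly large share of documents.

    A share above max_share (an assumed cutoff) often means the field was
    inherited from a template rather than entered by the real author.
    Returns (author, count, share) tuples, most frequent first.
    """
    authors = Counter(d.get("author") or "<missing>" for d in docs)
    total = sum(authors.values())
    if total < min_docs:
        return []  # too few documents to judge
    return [
        (name, count, count / total)
        for name, count in authors.most_common()
        if count / total > max_share
    ]

# Illustrative data only: 18 reports inherited the template creator's name.
docs = [{"author": "Template Admin"}] * 18 + [
    {"author": "R. Patel"}, {"author": "L. Chen"}, {"author": None},
]
for name, count, share in suspicious_authors(docs):
    print(f"{name}: {count} docs ({share:.0%})")  # Template Admin: 18 docs (86%)
```

A flagged value is only a lead; someone who knows the content still has to decide whether the attribution is wrong and what the correct author should be.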

Finally, there is the issue of vocabulary management: assigning terminology that is both accurate and relevant for a specific community that needs to find content quickly, without multiple versions and without content that is just a rehash of earlier findings published by the originator. Original publication dates, source information and proper author attribution are key elements of metadata that must be in place for any content targeted for crawling and indexing. When metadata is complete and accurate, a searcher can expect the best and most relevant content to rise to the top of a results page.
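A pre-indexing gate for the metadata elements named above might look like the following sketch. The field names (`original_pub_date`, `source`, `author`) are hypothetical stand-ins for whatever a given repository uses; the idea is simply that documents with blank or missing values are held back for remediation rather than crawled as-is.

```python
# Hypothetical field names for publication date, source and author attribution.
REQUIRED_FIELDS = ("original_pub_date", "source", "author")

def missing_metadata(doc, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a document's metadata."""
    return [f for f in required if not str(doc.get(f) or "").strip()]

def ready_for_indexing(docs):
    """Split documents into (ready, held_back); held_back lists (id, missing fields)."""
    ready, held = [], []
    for doc in docs:
        gaps = missing_metadata(doc)
        if gaps:
            held.append((doc.get("id"), gaps))
        else:
            ready.append(doc)
    return ready, held

# Illustrative data only.
docs = [
    {"id": "a", "original_pub_date": "2011-09-14", "source": "R&D wiki", "author": "L. Chen"},
    {"id": "b", "original_pub_date": "", "source": "Shared drive", "author": "R. Patel"},
]
ready, held = ready_for_indexing(docs)
print([d["id"] for d in ready], held)  # ['a'] [('b', ['original_pub_date'])]
```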

I hope others in a position to do serious research (perhaps a PhD dissertation) will take up my challenge to codify how much of “big data” is really worthy of being found – again, again, and again. In the meantime, use the tools you have in the search and content management technologies to get brutal. Weed the unwanted and unnecessary content so that you can get down to the essence of what is primary, what is good, and what is needed.

Search Engines: They’ve Been Around Longer Than You Think

It dates me, as well as search technology, to acknowledge that an Information Week article by Ken North with Medlars and Twitter in its title is meaningful to me. Discussing search requires context, especially when trying to convince IT folks that doing search really well in the enterprise requires special expertise, and that this expertise is not acquired in computer science courses.

The evolution of search systems from the print indexes of the early 1900s, such as Index Medicus (the National Library of Medicine’s index to medical literature) and Chemical Abstracts, to the advent of the online Medical Literature Analysis and Retrieval System (Medlars) in the 1960s was slow. Nor have the phases of search technology evolution since the launch of Medlars been anything like warp speed. This article is highly recommended because it gives historical context to automated search while defining application and technology changes over the past 50 years. The comparison between Medlars and Twitter as search platforms is fascinating, something that would never have occurred to me to explore.

A key point of the article is the difference between a search system designed for archival content, with deeply hierarchical categorization of a specialized corpus, and a system for highly transient, terse and topically generalized content. Last month I commented on the need to have search present in your normal work applications, and this article underscores an enormous range of purposes for search. Information of a short temporal nature and scholarly research each have a place in the enterprise, but it would be a stretch to think of searching for both types via a single search interface. Wanting to know what a colleague is observing or learning at a conference is very different from researching the effects of uranium exposure on human anatomy.

What has not changed much in the world of applied search technology is why we need to find information and how it becomes accessible. The type of search done on Twitter or LinkedIn today is for information that we used to pick up from a colleague (in person or on the phone) or in industry daily or weekly news publications. That’s how we found the name of an expert, learned about the latest technologies being rolled out at a conference or got breaking news on a new space material being tested. What has changed is the method of retrieval, but not by much, and the gain in efficiency may not be that great. Today, we depend on a lot of pre-processing of information by our friends and professional colleagues, who park information where we can pick it up on the spur of the moment. That is easy for us, but someone still spends the time to put it out there where we can grab it.

On the other end of the spectrum is that rich research content that still needs to be codified and revealed to search engines with appropriate terminology so we can pursue in-depth searching to get precisely relevant and comprehensive results. Technology tools are much better at assisting us with content enhancement to get us the right and complete results, but humans still write the rules of indexing and curate the vocabularies needed for classification.
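As a small illustration of that division of labor, the sketch below applies a human-curated thesaurus automatically: variant terms found in text are mapped to preferred index terms. The two-entry medical thesaurus is invented for illustration, and whole-phrase matching on word boundaries is a deliberate simplification of real indexing rules, not how any particular system such as Medlars works.

```python
import re

# A tiny, hypothetical thesaurus: preferred term -> variant ("use for") terms.
# In practice a human vocabulary manager curates these entries.
THESAURUS = {
    "myocardial infarction": ["heart attack", "mi", "cardiac infarction"],
    "hypertension": ["high blood pressure", "htn"],
}

# Invert to variant -> preferred term, including each preferred term itself.
VARIANT_TO_PREFERRED = {
    variant: preferred
    for preferred, variants in THESAURUS.items()
    for variant in [preferred, *variants]
}

def index_terms(text):
    """Return the sorted preferred terms whose variants appear in text."""
    lowered = text.lower()
    found = set()
    for variant, preferred in VARIANT_TO_PREFERRED.items():
        # Word-boundary match so 'mi' does not match inside 'admit'.
        if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
            found.add(preferred)
    return sorted(found)

print(index_terms("Admitted after a heart attack; history of high blood pressure."))
# ['hypertension', 'myocardial infarction']
```

The code does the repetitive matching; deciding that “heart attack” should be indexed under “myocardial infarction” remains a human editorial decision.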

Fifty years is a long time, and we are still trying to improve enterprise search. Making it work better simply takes more human work.

Embedded Search in the Enterprise

We need to make a distinction between “search in the enterprise” and “enterprise-wide search.” The former is any search that exists persistently in view as we go about our primary work activities. The latter commonly assumes aggregation of all enterprise content via a single platform OR enterprise content to which everyone in the organization will have access. So many attempts at enterprise-wide search are reported to be compromised or frustrated before achieving successful outcomes that it is time to pay attention to point-of-need solutions. This is search that will smoothly satisfy routine retrieval requirements as we work.

Most of us work in a small number of applications all day. A writer will be wedded to a content creation application plus research sources, both on the web and internal to the enterprise in which the writing is being done. Finding information to support writing, whether a press release, a marketing brochure or technical documentation to accompany a technical product, requires access to appropriate content for the writer to deliver to an audience. The audience may be a business analyst, a customer’s buyer or a product user with advanced technical expertise. During any one work assignment, the writer will usually be focused on one audience and will only need a limited view of content specific to that task.

When a search takes us on a merry chase through multiple resource repositories, or through a single repository with heaps of irrelevant content and no good results, we are being forced into a mental traffic nightmare not of our own making. As this blog post by Tony Schwartz reminds us, we need time to focus and concentrate. That time enables us to work smarter and more calmly; for employers seeking to support workers with the best tools, search that works well at the point of doing an assignment is the ultimate perk. I know how frantic and fractionated my mental state becomes as I follow one fruitless web of links after another that I believe will lead me to the piece of information I need. Truthfully, I often become so absorbed in the search and the ancillary information I “discover” along the way that the target becomes secondary.

New wisdom from a host of analysts and writers suggests that embedded search is more than a trend, as is search with a specific focus or purposeful business goal. The fact that FAST is now embedded with and for SharePoint, and that its use is growing principally in that arena, illustrates the trend. But readers should also consider a large array of newer search solutions that are strong on semantic features, APIs, integration options, and connectors to a huge variety of content in other application repositories. James Martin’s CIO article, “How to Evaluate Enterprise Search,” has helpful comments from Leslie Owens of Forrester Research, and Alan Pelz-Sharpe highlights the rise of connectors in this post.

Right now two rather new search engines are on my radar screen because of their timely entrance into the marketplace. One is Q-Sensei, which has just released its version 2.0. It is an ontology-based solution very much focused on efficiently processing big data, quick deployment, and integration with content applications. The second is Cambridge Semantics with its Anzo semantic solutions for analyzing and retrieving business data. Finally, I am very excited that ISYS was acquired by Lexmark. It was an unexpected move, but ISYS deserved to be recognized for having solid connector/filter technology and a large, satisfied customer base. It will be interesting to see how a hardware vendor noted for print technology will integrate ISYS search software into its product offerings. Information retrieval belongs where work is being done.

These are just three vendors poised to change the expectations of searchers by fulfilling search needs, embedded or integrated efficiently in select business application areas. Martin White’s most recent enumeration of search vendors puts the list at about 70; they are primarily vendors with standalone search products, products that support standalone search, or search engines that complement other content applications. You will see many viable options there that are unfamiliar, but be sure to dig down to understand where each might fill a unique need in your enterprise.

When seeking solutions for search problems, you need to really understand the purpose before looking for candidate vendors. Then focus on products that have the same clarity of applicability you want. They may be embedded with a product such as Lexmark’s, or with a CAD system. The first step is to decide where and for whom you need search to be present.

Researching Enterprise Search System Integrators

A look at job postings on the Enterprise Search Engine Professionals Group on LinkedIn shows positions calling for developers with specific programming skills or knowledge of specific products. It may be a faulty assumption, but it appears that enterprises on the path to a new or upgraded search application implementation are paying less attention to the other professional skills needed on a successful team.

Knowing how to implement, tune, administer and enhance search outcomes has more to do with understanding business needs and content management than writing code. You need the expertise of content management professionals who understand the importance of (and how to leverage) metadata. You definitely need people who know how to build and maintain the controlled vocabularies that make metadata valid and valuable within the context of your organization. These professionals are not traditionally found in IT groups; they are more likely to come from a business function, or information science background, preferably with a deep knowledge of the enterprise and how it works.

Integrating content management systems (CMS), digital asset management (DAM), and taxonomy, thesaurus or ontology management with enterprise search applications means understanding much more than coding. However, a tight relationship with IT is imperative for good integration of components. In small and medium organizations it is rare to find experts across all areas, and that is where a new breed of system integrators is bringing the most value, as noted in the December 2011 post.

As promised, here are some tips for finding and qualifying the right integrator for your organization. The first step is to identify service providers to consider. Use three principal discovery techniques:

  • Web searches on explicit phrases such as “system integration providers,” “search integration,” “software,” or “software integration.”
  • Vendor listings and directories, such as those published by Information Today and AIIM, or “buyers’ guides” associated with specific product groups.
  • Conference exhibitors, and conference attendees (consultants and vendors) who may attend or present but not exhibit, at conferences focused on a content management topic.

Next, qualify those you have discovered:

  • Scour their web sites by digging into links to Case Studies, Customers, Partners, and Press Releases. Each of these may lead to information about who the vendor has done business with and for, and the nature of their engagements.
  • Test-drive any public sites they have implemented, and look at how their own web site has been implemented: how easy is it to find information on their own site?
  • Talk to people at professional meetings or in academic institutions who might know these system integrators, and learn about their relationships and the successes and failures they have experienced. Talk to vendors you trust and value that supply non-software products, and find out which companies they may have observed or encountered at their other clients. They can be a great source of “intelligence.”
  • Talk to people at their named client sites (non-referred if possible).

Five keys to purposeful and successful selection are carefully evaluating:

  • Fit for your industry and organization: cost, vertical experience, gap completion (providing competencies you lack).
  • Fit with your permanent staff: common communication behaviors, collaborative aptitude, willingness to teach, and share.
  • People who have done something as close to what you need for another organization, and will let you talk to their client before the project begins.
  • A service provider that understands the project, staging, and need for a clear exit goal (being able to clearly define what success will look like at the end of the project before they leave the scene).
  • The point made in the first paragraph about jobs for search engine professionals: scout potential service providers’ professional skill sets to be sure their staffs include people who know more than just writing code.

Armed with these few guidelines as a checklist, you are ready to begin your search for a system integrator and solutions provider that suits your organization.

Helping Enterprise Searchers Succeed

I begin 2012 with a new perspective on enterprise search, one gained purely as an observer. The venues have all been medical establishments with multiple levels of complexity and healthcare workers. As the primary caregiver for a patient, and with some medical training, I take my role as observer and patient advocate quite seriously.

As soon as the patient was on the way to the emergency room, all of his medical records, insurance cards, medications, and contact information were assembled and brought to the hospital. With numerous critical care professionals intervening, and the patient being taken for various tests over several hours, I verbally imparted information I thought was important that might not yet show up in the system. Toward the end of the emergency phase, after being told several times that they had all his records available and “in the system,” I relaxed to focus on the “next steps.”

Numerous specialists were involved in treating the medical conditions, and the first three days passed without “a crisis,” but little did we know that medication choices were beginning to cause some major problems. Apparently, some parts of the patient’s medical history were not fully considered, and once the medications caused adverse outcomes, all kinds of other problems arose.

Fortunately, I was there to verbally share knowledge that was in the patient’s medical records and get choices of medicine reversed. On several occasions, doctors’ care orders had been “overlooked” and complicating interventions were executed because the healthcare person “in the moment” took an action without “seeing” those orders. I personally watched the extensive recording of doctors’ decisions and confirmed with them changes that were being made to the patient’s care, but repeatedly had to ask why a change was not being implemented.

Observing for six to eight hours on several care floors, I can only say that time is the enemy for medical staff. When questions were raised, the answers were in the system; in other words, “search worked.” What was not available to staff was time to study the whole patient record and understand overlapping and sometimes conflicting orders about care.

It is shortsighted for any institution to believe that it can squeeze professionals to “think-fast,” “on-their-feet” for hours on end with no time to consider the massive amounts of searchable results they are able to assemble. Human beings should not be expected to sacrifice their professional integrity and work standards because their employers have put them in a constant time bind.

My family member had me, but what of patients with no one to intervene, or no one versed in medical conditions and processes? This extends to every line of business where risk is involved, from the practice of law to engineering, manufacturing, design, research and development, testing, technical documentation writing, etc.

I don’t minimize how hard it is for businesses and professional services to stay profitable and competitive when they are being pressed to leverage technology for information resource management. However, one measure that every enterprise must embrace is educating its workforce about the information technologies it employs. It is not enough to simply make a search engine interface accessible on the workstation. Every worker must be shown how to search for accurate, authoritative and complete information, and be made aware of ways to ingest and evaluate what they are finding. Finally, they must be given an alternative way to get a more complete chronicle when the results don’t match the need, even if that alternative is to seek out another human being instead of a technology.

Search experts are a professionally trained class of workers who can fill the role of trainers, particularly if they have subject matter expertise in the field where search is being deployed. The risks to any enterprise of short-changing workers by not allowing them to fully exploit and understand results produced from search may be long-term, but they are serious.

It is important to leave this entry with recognition that, due to wonderful healthcare professionals and support staff, the outcomes for the patient have been positive. People listened when I had information to share and respected my role in the process. That in no way absolves institutions and enterprises from giving their employees the autonomy and time to pay attention to all the information flooding their sphere of operation. In every field of endeavor, human beings need the time and environment to mindfully absorb, analyze and evaluate all the content available. Technology can aid but cannot carry out thoughtful professional practice.

Making Search Play Well with Content Solutions

In keynote sessions at the recent Gilbane Boston Conference, three speakers in a row made points about content management solutions that are also significant to selection and implementation of enterprise search. Here is a list of paraphrased comments.

  • Forrester analyst Stephen Powers made these observations: 1. The promise has been there for years for an E (enterprise) CM suite to do everything, but the reality is that no one vendor, even when it has all the pieces, integrates them well. 2. Be cautious about promises from vendors who claim to do it all; instead, focus on those who know how to do integration.
  • Tony Byrne of the Real Story Group observed that Google’s efforts in the enterprise frequently fail because Google doesn’t really understand “how work gets done in the enterprise.”
  • Finally, Scott Liewehr of the Gilbane Group stated that a services firm selection is more important than the content management system application selection.

Taken together, these statements may not substantiate the current state of the content management industry, but they do point to a trend. Evidence is accruing that products and product suppliers must focus on playing nicely together and working for the enterprise. Most do not do well out of the box without the help of expertise and experts.

Nominally, vendors themselves have a service division to perform this function but the burden falls on the buyer to make the “big” decisions about integration and deployment. The real solution is waiting in the wings and I am increasingly talking to these experts, system integrators. They come in all sizes and configurations; perhaps they don’t even self-identify as system integrators, but what they offer is deep expertise in a number of content software applications, including search.

Generally, the larger the operation, the more substantial the number and types of products with which they have experience. They may have expertise in a number of web content management products or e-commerce offerings. A couple of large operations that I have encountered in Gilbane engagements are Avalon Consulting and Search Technologies, which have divisions each specializing in a facet of content management, including search. You need to explore whether their strengths and expertise are a good fit with your needs.

The smaller companies specialize, such as working with several search engines plus tools to improve metadata and vocabulary management so content is more findable. Specialists in enterprise search must still have an understanding of content management systems because those are usually the source of metadata that feed high quality search. I’ve recently spoken with several small service providers whose commentaries and case work illustrate a solid and practical approach. Those you might want to look into are: Applied Relevance, Contegra Systems, Findwise, KAPS Group, Lucid Imagination, New Idea Engineering, and TNR Global.

Each of these companies has a specialty and niche, and I am not making explicit recommendations. The simple reason is that what you need and what you are already working on is unique to your enterprise. Without knowledge of your resources, special needs and goals my recommendations would be guesses. What I am sharing is the idea that you need experts who can give value when they are the right experts for your requirements.

The guidance here is to choose a search services firm that will move you efficiently and effectively along the path of systems integration. Expertise is available and you do not need to struggle alone knitting together best-of-breed components. Do your research and understand the differentiators among the companies. High touch, high integrity and commitment for the long haul should be high on your list of requirements – and of course, look for experience and expertise in deploying the technology solutions you want to use and integrate.

Next month I’ll share some tips on evaluating possible service organizations starting with techniques for doing research on the Web.


© 2024 The Gilbane Advisor
