Curated for content, computing, data, information, and digital experience professionals

Category: Enterprise search & search technology (Page 27 of 61)

Research, analysis, and news about enterprise search and search markets, technologies, practices, and strategies, such as semantic search, intranet collaboration and workplace, ecommerce and other applications.

Before we consolidated our blogs, industry veteran Lynda Moulton authored our popular enterprise search blog. This category includes all her posts and other enterprise search news and analysis. Lynda’s loyal readers can find all of Lynda’s posts collected here.

For older, long form reports, papers, and research on these topics see our Resources page.

Searching Email in the Enterprise

Last week I wrote about “personalized search” and then a chance encounter at a meeting triggered a new awareness of business behavior that makes my own personalized search a lot different than might work for others. A fellow introduced himself to me as the founder of a start-up with a product for searching email. He explained that countless nuggets of valuable information reside in email and will never be found without a product like the one his company had developed. I asked if it only retrieved emails that were resident in an email application like Outlook; he looked confused and said “yes.” I commented that I leave very little content in my email application but instead save anything with information of value in the appropriate file folders with other documents of different formats on the same topic. If an attachment is substantive, I may create a record with more metadata in my content management database so that I can use the application search engine to find information germane to projects I work on. He walked away with no comment, so I have no idea what he was thinking.

It did start me thinking about the realities of how individuals dispose of, store, categorize and manage their work related documents. My own process goes like this. My work content falls into four broad categories: products and vendors, client organizations and business contacts, topics of interest, and local infrastructure related materials. When material is not purposed for a particular project or client but may be useful for a future activity, it gets a metadata record in the database and is hyperlinked to the full-text. The same goes for useful content out on the Web.

When it comes to email, I discipline myself to dispose of all email into its appropriate folder as soon as I can. Sometimes this involves two emails, the original and my response. When the format is important I save it in the *.mht format (it used to be *.htm until I switched to Office 2007 and realized that doing so created a folder for every file saved); otherwise, I save content in *.txt format. I rename every email to include a meaningful description including topic, sender and date so that I can identify the appropriate email when viewing a folder. If there is an attachment it also gets an appropriate title and date, is stored in its native format and the associated email has “cover” in the file name; this helps associate the email and attachment. The only email that is saved in Outlook in personal folders is current activity where lots of back and forth is likely to occur until a project is concluded. Then it gets disposed of by deleting, or with the project file folders as described above. This is personal governance that takes work. Sometimes I hit a wall and fall behind on the filtering and disposing but I keep at it because it pays off in the long term.

So, why not relax and leave it all in Outlook, then let a search engine do the retrieval? Experience had revealed that most emails are labeled so poorly by senders and the content is so cryptic that to expect a search engine to retrieve it in a particular context or with the correct relevance would be impossible. I know this from the experience of having to preview dozens of emails stored in folders for projects that are active. I have decided to give myself the peace of mind that when the crunch is on, and I really need to go to that vendor file and retrieve what they sent me in March of last year, I can get it quickly in a way that no search engine could ever do. Do you realize how much correspondence you receive from business contacts using their “gmail” account with no contact information revealing their organization in the body and signed with a nickname like “Bob” and messages “like we’re releasing the new version in four weeks” or that just have a link to an important article on the web with “thought this would interest you?”

I did not have a chance to learn if my new business acquaintance had any sense of the amount of competition he has out there for email search, or what his differentiator is that makes a compelling case for a search product that only searches through email, or what happens to his product when Microsoft finally gets FAST search bundled to work with all Office products. OR, perhaps the rest of the world is storing all content in Outlook. Is this true? If so, he may have a winner.

Personalized Search in the Enterprise

This is an interesting topic for two reasons: there is enormous diversity in the ways we all think and go about finding content; personalizing a search interface without being intrusive is extremely difficult. Any technology that requires us to do activities according to someone else’s design, which bends our natural inclination, is by definition not going to be personal.

This topic comes to mind because of two unrelated pieces of content I read in the past 24 hours. The first was an email asking me about personal information management and automated tagging, and the second was an interview I read with Mike Moran, a thought leader in search and speaker at one of our Gilbane Conferences. In the interview, Mike talks about personalized search. Then Information Week referenced search personalization in an article about a patent suit against Google.

Here is my take on the many personalized search themes that have recently emerged. From dashboards to customizing results, options to focus on particular topics or types of content, socialized search to support interacting with and sharing results, to retrieving content we personally created or received (email), content we used or were named in, all might be referred to as search personalization. Getting each to work well will enhance enterprise search but….

Knowing how transient and transformative our thoughts and behaviors really are, we should focus realistically on the complexity of producing software tools and services that satisfy and enhance personal findability. We are ambiguous beings, seeking structured equilibrium in many of our activities to create efficiency and reduce anxiety, while desiring new, better, quicker and smarter devices to excite and engage us. Once we achieve a level of comfort with a method or mechanism, whether quickly or over time, we evolve and seek change. But, when change is imposed on an unprepared mind, our emotions probably override any real benefit that might be gained in productivity. Then we tend to self-sabotage the potential for operational usefulness when an uncomfortable process intrudes. Mental lack of preparedness undermines our work when a new design demands a behavioral shift that lacks connection to our current state or past experiences. How often are we just not in a frame of mind to take on something totally alien, especially with deadlines looming?

Look at the single most successful aspect of Google, minimalism in its interface. One did not need to wade through massively dense graphics scrambled with text in disordered layouts to figure out what to do when Google first appeared. The focus was immediately obvious.

I am presenting this challenge to vendors; there is a need to satisfy a huge array of personal preferences while introducing a minimal amount of change in any one release. Easy adoption requires that new products be simple. Usefulness must be quickly obvious to multiple audiences.

I am presenting this challenge to technology users; focus your appetite. Decide before shopping or adopting new tools what would bring the most immediate productivity gain and personal adoptability for maximum efficiency. Think about how defeated you feel when approaching a new release of an upgraded product that has added so many new “bells and whistles” that you are consumed with trying to rediscover all the old functions and features that gave your workflow a comfortable structure. Think carefully about how much learning and re-adjusting will be needed if you decide on technology that promises to do everything, with unlimited personalization. It may be possible, but does it really feel personally acceptable.

Semantic Search has Its Best Chance for Successes in the Enterprise

I am expecting significant growth in the semantic search market over the next five years with most of it focused on enterprise search. The reasons are pretty straightforward:

  • Semantic search is very hard and to scale it to the Web compounds the complexity.
  • Because the semantic Web is so elusive and results have been spotty with not much traction, it will be some time before it can be easily monetized.
  • Like many things that are highly complex, a good model will be to break the challenge of semantic search into smaller targeted business problems where focus is on a particular audience seeking content from a narrower domain.

I base this predication on my observation of the on-going struggle for organizations to get a strong framework in place to manage content effectively. By effectively I mean, establishing solid metadata, governance and publishing protocols that ensure that the best information knowledge workers produce is placed in range for indexing and retrieval. Sustained discipline and the people to exercise it just aren’t being employed in many enterprises to make this happen in a cohesive and comprehensive fashion. I have been discouraged by the number of well-intentioned projects I have seen flounder because organizations just can’t commit long-term or permanent human resources to the activity of content governance. Sometimes it is just on-again-off-again. What enterprises need are people with deep knowledge about the organization and how its content fits together in a logical framework for all types of knowledge workers. Instead, organizations tend to assign this job to external consultants or low-level staffers who are not well-grounded in the work of the particular enterprise. The results are predictably disappointing.

Enter semantic search technologies where there are multiple algorithmic tools available to index and retrieve content for complex and multi-faceted queries. Specialized semantic technologies are often well suited to shorter term projects for which domain specific vocabularies can be built more quickly with good results. Maintaining targeted vocabulary ontologies for a focused topic can be done with fewer human resources and a carefully bounded ontology can become an intelligent feed to a semantic search engine, helping it index with better precision and relevance.

This scenario is proposed with one caveat; enterprises must commit to having very smart people with enterprise expertise to build the ontology. Having a consultant coach the subject matter expert in method, process and maintenance guidelines for doing so is not a bad idea but the consultant has to prepare the enterprise for sustainability after exiting the scene.

The wager here is that enterprises can ramp up semantic search with a series of short, targeted projects, each of which establishes a goal of solving one business problem at a time and committing to efficient and accurate content retrieval as part of the solution. By learning what works well in each situation, intranet web retrieval will improve systematically and thoughtfully. The ramp to a better semantic Web will be paved with these interlocking pieces.

Keep an eye on these companies to provide technologies for point solutions in business critical applications: Basis Technology, Cognition Technology, Connotate, Expert Systems, Lexalytics, Linguamatics, Metatomix, Semantra, Sinequa and Temis.

It Takes Work to Get Good-to-Great Enterprise Search

It takes patience, knowledge and analysis to tell when search is really working. For the past few years I have seen a trend away from doing any “dog work” to get search solutions tweaked and tuned to ensure compliance with genuine business needs. People get cut, budgets get sliced and projects dumped because (fill the excuse) and the message gets promoted “enterprise search doesn’t work.” Here’s the secret, when enterprise search doesn’t work the chances are it’s because people aren’t working on what needs to be done. Everyone is looking for a quick fix, short cut, “no thinking required” solution.

This plays out in countless variations but the bottom line is that impatience with human processing time and the assumption that a search engine “ought to be able to” solve this problem without human intervention cripple possibilities for success faster than anything else.

It is time for search implementation teams to get realistic about the tasks that must be executed and milestones to be reached. Teams must know how they are going to measure success and reliability, then to stick with it, demanding that everyone agrees on the requirements before throwing the towel in at the first executive anecdote that the “dang thing doesn’t work.”

There are a lot of steps to getting even an out-of-the-box solution working well. But none is more important than paying attention to these:

  • Know your content
  • Know your search audience
  • Know what needs to be found and how it will be looked for
  • Know what is not being found that should be

The operative verb here is to know and to really know anything takes work, brain work, iterative, analytical and thoughtful work. When I see these reactions from IT upon setting off a search query that returns any results: “we’re done” OR “no error messages, good” OR “all these returns satisfy the query,” my reaction is:

  • How do you know the search engine was really looking in all the places it should?
  • What would your search audience be likely to look for and how would they look?
  • Who is checking to make sure these questions are being answered correctly
  • How do you know if the results are complete and comprehensive?

It is the last question that takes digging and perseverance. It is pretty simple to look at search results and see content that should not have been retrieved and figure out why it was. Then you can tune to make sure it does not happen again.

To make sure you didn’t miss something takes systematic “dog work” and you have to know the content. This means starting with a small body of content that it is possible for you to know, thoroughly. Begin with content representative of what your most valued search audience would want to find. Presumably, you have identified these people through establishing a clear business case for enterprise search. (This is not something for the IT department to do but for the business team that is vested in having search work for their goals.) Get these “alpha worker” searchers to show you how they would go about trying to find the stuff they need to get their work done every day, to share with you some of what they consider some of the most valuable documents they have worked with over the past few years. (Yes, years – you need to work with veterans of the organization whose value is well established, as well as with legacy content that is still valuable.)

Confirm that these seminal documents are in the path of the search engine for the index build; see what is retrieved when they are searched for by the seekers. Keep verifying by looking at both content and results to be sure that nothing is coming back that shouldn’t and that nothing is being missed. Then double the content with documents on similar topics that were not given to you by the searchers, even material that they likely would never have seen that might be formatted very differently, written by different authors, and more variable in type and size but still relevant. Re-run the exact searches that were done originally and see what is retrieved. Repeat in scaling increments and validate at every point. When you reach points where content is missing from results that should have been found using the searcher’s method, analyze, adjust, and repeat.

A recent project revealed to me how willing testers are to accept mediocre results when it became apparent how closely content must be scrutinized and peeled back to determine its relevance. They had no time for that and did not care how bad the results were because they had a pre-defined deadline. Adjustments may call for refinements in the query formulation that might require an API to make it more explicit, or the addition of better category metadata with rich cross-references to cover vocabulary variations. Too often this type of implementation discovery signals a reason to shut down the project because all options require human resources and more time. Before you begin, know that this level of scrutiny will be necessary to deliver good-to-great results; set that expectation for your team and management, so it will be acceptable to them when adjustments are needed for more work to be done to get it right. Just don’t blame it on the search engine – get to work, analyze and fix the problem. Only then can you let search loose on your top target audience.

Lucid Imagination and ISYS Partner on Lucene/Solr

Lucid Imagination and ISYS Search Software announced a strategic partnership. The agreement enables Lucid Imagination to provide solutions that combine its core Lucene and Solr expertise with the ISYS File Readers document filtering technology. The flexibility of the architecture allows enterprises to develop sophisticated purpose-built search solutions. By offering ISYS File Readers as part of its Lucene/Solr solutions, Lucid Imagination gives users and developers out-of-the-box capability to find and extract virtually all of the content and formats that exist in their enterprise environment. Lucid Imagination Web site serves as a knowledge portal for the Lucene community, with wide range of information, resources and  information retrieval application, LucidFind to help developers and search professionals get access to the information they need to design, build and deploy Lucene and Solr based solutions. http://www.lucidimagination.com

MuseGlobal and Specialty Systems Partner

MuseGlobal announced a partnership with Specialty Systems, Inc., a company focusing on innovative information systems solutions to Federal, State and Local Government customers. Specialty Systems, Inc. is partnering with MuseGlobal to provide the systems integration expertise to engineer law enforcement and homeland security applications built on MuseGlobal’s MuseConnect, which provides federated search and harvesting technologies, with a library of more than 6,000 pre-built source connectors. The applications resulting from this partnership will incorporate unified information access allowing structured data from database sources; semi-structured data from spreadsheets, forms and XML sources; unstructured data from web sites, documents, email; and rich media such as images, video and audio information to be accessed simultaneously from internal databases and external sources.  This information is gathered on the fly, and unified for immediate presentation to the requestor. http://www.specialtysystems.com, http://www.museglobal.com

SDL Tridion Integrates Q-go Natural Language Search into Web Content Management

SDL Tridion announced that it has partnered with Q-go to provide an integrated Natural Language Search engine within SDL Tridion’s web content management platform. The solution provides the online search environment within websites only targeted and relevant search results. Q-go’s Natural Language Search is now accessible from within the SDL Tridion web content management environment. Content editors are able to create model questions in the Q-go component of the SDL Tridion platform. This means that the most common questions pertaining to products and the website itself can be targeted and answered by web content editors, creating streamlined content and vastly increased relevance of searches. The integration also means that only one interface is needed to update the entire website, which can be done anywhere, anytime. You can find more information on the integration at the eXtensions Community of  http://www.sdltridionworld.com

Paying Attention to Enterprise Search Results

When thinking about some enterprise search use cases that require planning and implementation, presentation of search results is not often high on the list of design considerations. Learning about a new layer of software called Documill from CEO and founder, Mika Könnölä, caused me to reflect on possible applications in which his software would be a benefit.

There is one aspect of search output (results) that always makes an impression when I search. Sometimes the display is clear and obvious and other times the first thing that pops into my mind is “what the heck am I looking at” or “why did this stuff appear?” In most cases, no matter how relevant the content may end up being to my query, I usually have to plow through a lot (could be dozens) of content pieces to confirm the validity or usefulness of what is retrieved.

Admittedly, much of my searching is research or helping with a client’s intranet implementation, not just looking for a quick answer, a fact or specific document. When I am in the mode for what I call “quick and dirty” search, I can almost always frame the search statement to get the exact result I want very quickly. But when I am trying to learn about a topic new to me, broaden my understanding or collect an exhaustive corpus of material for research, sifting and validating dozens of documents by opening each and then searching within the text for the piece of the content that satisfied the query is both tedious and annoyingly slow.

That is where Documill could enrich my experience considerably for it can be layered on any number of enterprise search engines to present results in the form of precise thumbnails that show where in a document the query criterion/criteria is located. In their own words, “it enhances traditional search engine result list with graphically accurate presentation of the content.”

Here are some ideas for its application:

  • In an application developed to find specific documents from among thousands that are very similar (e.g. invoices, engineering specifications), wouldn’t it be great to see only a dozen, already opened, pages to the correct location where the data matches the query?
  • In an application of 10s of thousands of legacy documents, OCRed for metadata extraction displayable as PDFs, wouldn’t it be great to have the exact pages of the document that match the search displayed as visual images opened to read in the results page? This is especially important in technical documents of 60-100 pages where the target content might be on page 30 or 50.
  • In federated search output, when results may contain many similar documents, the immediate display of just the right pages as images ready for review will be a time-saving blessing.
  • In a situation where a large corpus of content contains photographs or graphics, such as newspaper archives, scientific and engineering drawings, an instantaneous visual of the content will sharpen access to just the right documents.

I highly recommend that you ask your search engine solution provider about incorporating Documill into your enterprise search architecture. And, if you have, please share your experiences with me through comments to this post or by reaching out for a conversation.

« Older posts Newer posts »

© 2025 The Gilbane Advisor

Theme by Anders NorenUp ↑