Types of Search: July 2009 Archives

Last week I wrote about "personalized search," and then a chance encounter at a meeting triggered a new awareness of business behavior that makes my own personalized search quite different from what might work for others. A fellow introduced himself to me as the founder of a start-up with a product for searching email. He explained that countless nuggets of valuable information reside in email and will never be found without a product like the one his company had developed. I asked if it only retrieved emails that were resident in an email application like Outlook; he looked confused and said "yes." I commented that I leave very little content in my email application but instead save anything with information of value in the appropriate file folders, alongside documents in other formats on the same topic. If an attachment is substantive, I may create a record with more metadata in my content management database so that I can use the application search engine to find information germane to projects I work on. He walked away with no comment, so I have no idea what he was thinking.

It did start me thinking about the realities of how individuals dispose of, store, categorize and manage their work-related documents. My own process goes like this. My work content falls into four broad categories: products and vendors, client organizations and business contacts, topics of interest, and local infrastructure-related materials. When material is not intended for a particular project or client but may be useful for a future activity, it gets a metadata record in the database and is hyperlinked to the full text. The same goes for useful content out on the Web.

When it comes to email, I discipline myself to dispose of all email into its appropriate folder as soon as I can. Sometimes this involves two emails, the original and my response. When the format is important I save it in the *.mht format (it used to be *.htm until I switched to Office 2007 and realized that doing so created a folder for every file saved); otherwise, I save content in *.txt format. I rename every email to include a meaningful description including topic, sender and date so that I can identify the appropriate email when viewing a folder. If there is an attachment, it also gets an appropriate title and date and is stored in its native format, and the associated email has "cover" in the file name; this helps associate the email and attachment. The only email that is saved in Outlook in personal folders is current activity where lots of back and forth is likely to occur until a project is concluded. Then it is either deleted or filed in the project file folders as described above. This is personal governance that takes work. Sometimes I hit a wall and fall behind on the filtering and disposing, but I keep at it because it pays off in the long term.
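A minimal sketch of how a naming convention like this could be automated. The field order, the separators, the ISO date format and the function names are all assumptions made for illustration; the post only says that the name includes topic, sender and date, and that an email accompanying an attachment carries "cover" in its file name.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def email_filename(topic: str, sender: str, received: date,
                   has_attachment: bool = False,
                   keep_format: bool = False) -> str:
    """Build a descriptive file name for a saved email (hypothetical scheme).

    keep_format=True picks the .mht extension (layout matters);
    otherwise the email is saved as plain .txt.
    """
    parts = [slug(topic), slug(sender), received.isoformat()]
    if has_attachment:
        # Marks this email as the "cover" note for a separately saved attachment.
        parts.append("cover")
    extension = "mht" if keep_format else "txt"
    return "_".join(parts) + "." + extension


name = email_filename("Version 5 release plan", "Bob (Acme)",
                      date(2009, 3, 12), has_attachment=True)
print(name)
```

Putting the date in ISO form means that files for the same topic and sender sort chronologically in a folder listing, which is one reasonable way to support the "what did they send me in March of last year" lookup.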

So, why not relax and leave it all in Outlook, then let a search engine do the retrieval? Experience has revealed that most emails are labeled so poorly by senders, and the content is so cryptic, that to expect a search engine to retrieve them in a particular context or with the correct relevance would be unrealistic. I know this from the experience of having to preview dozens of emails stored in folders for projects that are active. I have decided to give myself the peace of mind that when the crunch is on, and I really need to go to that vendor file and retrieve what they sent me in March of last year, I can get it quickly in a way that no search engine could ever do. Do you realize how much correspondence you receive from business contacts using their "gmail" account, with no contact information revealing their organization in the body, signed with a nickname like "Bob," and with messages like "we're releasing the new version in four weeks" or that just have a link to an important article on the web with "thought this would interest you"?

I did not have a chance to learn if my new business acquaintance had any sense of the amount of competition he has out there for email search, or what his differentiator is that makes a compelling case for a search product that only searches through email, or what happens to his product when Microsoft finally gets FAST search bundled to work with all Office products. Or perhaps the rest of the world is storing all content in Outlook. Is this true? If so, he may have a winner.

This is an interesting topic for two reasons: there is enormous diversity in the ways we all think and go about finding content, and personalizing a search interface without being intrusive is extremely difficult. Any technology that requires us to do activities according to someone else's design, bending our natural inclinations, is by definition not going to be personal.

This topic comes to mind because of two unrelated pieces of content I read in the past 24 hours. The first was an email asking me about personal information management and automated tagging, and the second was an interview I read with Mike Moran, a thought leader in search and speaker at one of our Gilbane Conferences. In the interview, Mike talks about personalized search. Then Information Week referenced search personalization in an article about a patent suit against Google.

Here is my take on the many personalized search themes that have recently emerged. Dashboards, customized results, options to focus on particular topics or types of content, socialized search to support interacting with and sharing results, retrieval of content we personally created or received (email), and retrieval of content we used or were named in: all might be referred to as search personalization. Getting each to work well will enhance enterprise search but...

Knowing how transient and transformative our thoughts and behaviors really are, we should focus realistically on the complexity of producing software tools and services that satisfy and enhance personal findability. We are ambiguous beings, seeking structured equilibrium in many of our activities to create efficiency and reduce anxiety, while desiring new, better, quicker and smarter devices to excite and engage us. Once we achieve a level of comfort with a method or mechanism, whether quickly or over time, we evolve and seek change. But, when change is imposed on an unprepared mind, our emotions probably override any real benefit that might be gained in productivity. Then we tend to self-sabotage the potential for operational usefulness when an uncomfortable process intrudes. Mental lack of preparedness undermines our work when a new design demands a behavioral shift that lacks connection to our current state or past experiences. How often are we just not in a frame of mind to take on something totally alien, especially with deadlines looming?

Look at the single most successful aspect of Google: minimalism in its interface. When Google first appeared, one did not need to wade through massively dense graphics scrambled with text in disordered layouts to figure out what to do. The focus was immediately obvious.

I am presenting this challenge to vendors: there is a need to satisfy a huge array of personal preferences while introducing a minimal amount of change in any one release. Easy adoption requires that new products be simple. Usefulness must be quickly obvious to multiple audiences.

I am presenting this challenge to technology users: focus your appetite. Decide before shopping or adopting new tools what would bring the most immediate productivity gain and personal adoptability for maximum efficiency. Think about how defeated you feel when approaching a new release of an upgraded product that has added so many new "bells and whistles" that you are consumed with trying to rediscover all the old functions and features that gave your workflow a comfortable structure. Think carefully about how much learning and re-adjusting will be needed if you decide on technology that promises to do everything, with unlimited personalization. It may be possible, but does it really feel personally acceptable?

I am expecting significant growth in the semantic search market over the next five years with most of it focused on enterprise search. The reasons are pretty straightforward:
• Semantic search is very hard, and scaling it to the Web compounds the complexity.
• Because the semantic Web is so elusive and results have been spotty with not much traction, it will be some time before it can be easily monetized.
• Like many things that are highly complex, a good model will be to break the challenge of semantic search into smaller targeted business problems where focus is on a particular audience seeking content from a narrower domain.

I base this prediction on my observation of the ongoing struggle for organizations to get a strong framework in place to manage content effectively. By effectively, I mean establishing solid metadata, governance and publishing protocols that ensure that the best information knowledge workers produce is placed in range for indexing and retrieval. Sustained discipline and the people to exercise it just aren't being employed in many enterprises to make this happen in a cohesive and comprehensive fashion. I have been discouraged by the number of well-intentioned projects I have seen flounder because organizations just can't commit long-term or permanent human resources to the activity of content governance. Sometimes it is just on-again-off-again. What enterprises need are people with deep knowledge about the organization and how its content fits together in a logical framework for all types of knowledge workers. Instead, organizations tend to assign this job to external consultants or low-level staffers who are not well-grounded in the work of the particular enterprise. The results are predictably disappointing.

Enter semantic search technologies, where there are multiple algorithmic tools available to index and retrieve content for complex and multi-faceted queries. Specialized semantic technologies are often well suited to shorter-term projects for which domain-specific vocabularies can be built more quickly with good results. Maintaining targeted vocabulary ontologies for a focused topic can be done with fewer human resources, and a carefully bounded ontology can become an intelligent feed to a semantic search engine, helping it index with better precision and relevance.
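To make the idea of a bounded vocabulary feeding an indexer concrete, here is a toy sketch, not a description of how any particular product works. The two concepts, their variant terms and the `tag_document` function are invented for illustration; a real ontology would be built by people who know the enterprise, and a real engine would match terms far more carefully than this substring test does.

```python
# A tiny, hand-built vocabulary: preferred concept -> variant terms.
# Both the concepts and the variants are hypothetical examples.
ONTOLOGY = {
    "content management": {"cms", "document management"},
    "enterprise search": {"intranet search", "site search"},
}


def tag_document(text: str, ontology: dict = ONTOLOGY) -> list:
    """Return the preferred concepts whose name or variants occur in the text.

    Matching is a naive case-insensitive substring check, so a short
    variant could also match inside a longer, unrelated word.
    """
    lowered = text.lower()
    tags = set()
    for concept, variants in ontology.items():
        if any(term in lowered for term in {concept, *variants}):
            tags.add(concept)
    return sorted(tags)


tags = tag_document("Our CMS feeds the intranet search index.")
print(tags)
```

The point of the sketch is the division of labor: the small vocabulary carries the domain knowledge, and the indexing step only has to attach the preferred concept to each document so that differently worded content can be retrieved under one term.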

This scenario is proposed with one caveat: enterprises must commit to having very smart people with enterprise expertise build the ontology. Having a consultant coach the subject matter expert in method, process and maintenance guidelines for doing so is not a bad idea, but the consultant has to prepare the enterprise for sustainability after exiting the scene.

The wager here is that enterprises can ramp up semantic search with a series of short, targeted projects, each of which establishes a goal of solving one business problem at a time and committing to efficient and accurate content retrieval as part of the solution. By learning what works well in each situation, intranet web retrieval will improve systematically and thoughtfully. The ramp to a better semantic Web will be paved with these interlocking pieces.

Keep an eye on these companies to provide technologies for point solutions in business-critical applications: Basis Technology, Cognition Technology, Connotate, Expert Systems, Lexalytics, Linguamatics, Metatomix, Semantra, Sinequa and Temis.
