Curated content for content, computing, and digital experience professionals

Author: Lynda Moulton

Search: a Term for the Long Haul, But…

There is no question that language influences marketing success; positioning software products has been a game of out-shining competitors with clever slogans and crafty coined terminology. Having been engaged with search technologies since 1974, and as the architect of a software application for enterprise content indexing and retrieval, I’ve observed how product positioning has played out in the enterprise search market over the years. When there is a new call for re-labeling “search,” the noun defining software designed for retrieving electronic content, I reflect on why and whether a different term would suffice.

Here is why a new term is not needed. For defining software algorithms that are the underpinning of finding and retrieving electronic content, regardless of native format, the noun search is efficient, to-the-point, unambiguous and direct.

We need a term that covers this category of software that will stand the test of time, as has automobile, which originated after terms too numerous to fully list had been tested: horseless buggy, self-contained power plant, car, motor vehicle, motor buggy, road engine, steam-powered wheeled vehicles, electric carriage, and motor wagon to name a few. Finally a term was coined, “automobile,” defined as a self-powered vehicle. It covered all types of self-powered “cars,” not just those pulled by another form of locomotive, as a rail car is. Like the term “search,” automobiles are often qualified by modifiers, such as “electric,” “hybrid” or “sedan” versus “station wagon.” Search may be coupled with “Web” versus “Enterprise,” or “embedded” versus “stand-alone.” In the field of software technology we need and generally understand the distinctions.

So, I continue to be mystified by rhetoric that demands a new label but I am willing to concede where we need to be more precise, and that may be what the crowd is really saying. When and where the term is applied deserves reconsideration. Technologists who build and customize search software should be able to continue with the long established lingo, but marketers and conferences or meetings to educate a great variety of search users could probably do a better job of expressing what is available to non-techies. As one speaker at Enterprise Search Europe 2013 (ESEu2013) stated and others affirmed, “search” is not a project and to that I will add, nor is it a single product. Instead it is core to a very large and diverse range of products.

Packaging Software that includes Search Technology

Vendors are obviously aware of where they need to be marketing and the need to package for their target audience. There are three key elements that have contributed to ambiguity and resulted in a lethargic reaction in the so-called enterprise search marketplace in recent years: overly complex and diffuse categorization, poor product labeling and definition, and usability and product interface design that does not reflect an understanding of the true audience for a product. What can be done to mitigate confusion?

  1. Categorizing what is being offered has to speak to the buyer and potential user. When a single product is pitched to a dozen different market categories (text mining, analytics, content management, metadata management, enterprise search, big data management, etc.) buyers are skeptical and wary of all-in-one claims. While there are software packages that incorporate many elements of a variety of software applications, diffusion ends up fracturing the buying audience into such minute numbers that a vendor does not gain real traction across the different types of needs. Recommendation: a product must be categorized according to its greatest technical strengths and the largest audience to which it will appeal. The goal is to be a strong presence in the specific marketplaces where those buyers go to seek products. When a product has outstanding capabilities for that audience, buyers will be delighted to also find additional ancillary functions and features that are already built in.
  2. Software that is built on search algorithms or that embeds search must be packaged with labeling that pays attention to a functional domain and the target audience. Clear messaging that speaks to the defined audience is the wrapper for the product. It must state what and why you have a presence in this marketplace, the role the product plays and the professional functions that will benefit from its use. Messaging is how you let the audience know that you have created tools for them.
  3. Product design requires a deep understanding of professional users and their modes of pursuing business goals. At ESEu2013 several presentations and one workshop focused on usability and design; speakers all shared a deep understanding of differences across professional users. They recognized behavioral, cultural, geographic and mode preferences as key considerations without stating explicitly that different professional groups each work in unique ways. I assert that this is where so many applications break down in design and implementation. Workflow design, look-and-feel, and product features must be very different for someone in accounting or finance versus an engineer or attorney. Highly successful software applications are generally initiated, and their development sustained, by professionals who need these tools to do their work, their way. Without deep professional knowledge embedded in product design teams, products often miss the market’s demands. Professionals bring know-how, methods and practices to their jobs, and it is not the role of software developers to change the way they go about their business by forcing new models that are counter to what is intuitive in a market segment.

Attention to better software definition leads to the next topic.

Conference and meeting themes: Search technology versus business problems to be solved

Attention to conference and meeting content was the reason for this post. Having given an argument for keeping the noun search in our vocabulary, I have also acknowledged that it is probably a failed market strategy to label, and attach messaging to, every software product with search as either enterprise search or web search. Because search is everywhere in almost every software application, we need conferences with exhibits that target more differentiated (and selective) audiences.

The days of generic all-in-one meetings like AIIM, the former National Online Meeting (Information Today’s original conference), E2, and so on may have run their course. As a failed conference attendee, I find my attention span lasts about one hour at most; I listen to no more than a half dozen exhibitor pitches before I become a wandering zombie, interested in nothing in particular because there is nothing specific to be drawn to at these mega-conferences.

I am proposing a return to professionally oriented programs that focus on audience and business needs. ESEu2013 had, among its largest cohorts, developers and software implementers. There were few potential users, buyers, content or metadata managers, or professional search experts, but these groups seek a place to learn about products without slides showing snippets of programming code. There is still a need for meetings that include the technologists, but it is difficult to attract them to a meeting that only offers sessions programmed for users, the people for whom they will develop products. How do we get them into a dialogue with the very people for whom they are developing and designing products? How can vendors exhibit and communicate their capabilities for solving a professional problem when their target professional audience is not in the room?

At Enterprise Search Europe 2013, the sessions were both diverse and enlightening but, as I noted at the conference wrap-up, each track spoke to a unique set of enterprise needs and variety of professional interests. The underlying technology, search, was the common thread, and yet each track might have been presented in a totally different meeting environment. One topic, Big Data, presents challenges that need explaining, and information seekers come to learn about products for effectively leveraging it in a number of enterprise environments. These cases need to be understood as business problems, which call for unique software applications, not just some generic search technology. Big data can be, and already is being, offered as a theme for an entire conference where emphasis on aspects of search technology is included. As previously noted, topics related to big data problems vary: data and text mining, analytics, semantic processing aka natural language processing, and federation. However, data and text mining for finance has a totally different contextual relevance than for scientists engaged in genomics or targeted drug therapy research, and each audience looks for solutions in its field.

So, let’s rethink what each meeting is about, who needs to be in the room for each business category, what products are clearly packaged for the audience and the need, and schedule programs that bring developers, implementers, buyers and users into a forum around specially packaged software applications for meaningful dialogue. All of this is said with sincere respect for my colleagues who have suggested terms that range from “beyond search” to “discovery” and “findability” as alternatives to “search.” Maybe the predominant theme of the next Enterprise Search conference should be Information Seeking: Needs, Behaviors and Applications, with tracks organized accordingly.

[NOTE: Enterprise Search Europe had excellent sessions and practical guidance. Having given a “top of mind” reaction to what we need to gain a more diverse audience in the future, my next post will be a litany of the best observations, recommendations and insights from the speakers.]

Leveraging Search in Small Enterprises

A mantra for a small firm or start-up in the 1970s, when “Big Blue” was the standard for top notch sales and selling, was “we need to out-IBM the IBMers.”

Search is just one aspect of being able to find what you need to leverage knowledge assets in your work, whether you are in a small firm, a part of a small group in a large organization or an individual consultant seeking to maximize the masses of content and information surrounding you in work.

My thoughts are inspired by a question asked by Andreas Gruber of Informations und Wissensmanagement in a recent post on the Enterprise Search Engine Professionals LinkedIn group. He posed a request for information, stating: “For enterprise search solutions for (very) small enterprises (10 to 200 employees), I find it hard to define success factors and it seems, that there are not many examples available. If you follow e.g. the critical success factors from the Martin White’s Enterprise Search book, most of them doesn’t seem to work for a small company – simply because none of them can/will investment in a search team etc.”

The upcoming Enterprise Search Europe meeting (May 14-16, 2013) in London is one focus of my attention at present. Since Martin White is the Chairman and principal organizer, Andreas’ comments resonated immediately. Concurrently, I am working on a project for a university department, which probably falls in the category of “small enterprise”. The other relevant project on my desk is a book I am co-authoring on “practical KM” and we certainly aim to appeal to the individual practitioner or groups limited by capital resources. These areas of focus challenge me to respond to Andreas’ comments because I am certain they are top of mind for many and the excellent comments already at the posting show that others have good ideas about the topic, as well.

Intangible capital is particularly significant in many small firms, in academia, and for independent consultants like me. Intensive leveraging of knowledge in the form of expertise, relationships, and processes is imperative in these domains. In most businesses, intangible capital now surpasses tangible capital in value, according to Mary Adams, founder of Smarter-Companies. Because intangible capital takes more thought and effort to identify, find or aggregate than hard assets, tools are needed to uncover, discover and pinpoint it.

Let’s take the example of expertise, an indisputable intangible asset of any professional services firm. Asking expert staff to put an explicit value on their knowledge, competencies or acumen for tackling the type of problem you need solved may give you a sense of value, but you need more. The firm or professional you want to hire must be able to back up that value by providing explicit evidence that they “know their stuff” and can produce. For you, search is a tool to lead you to public or published evidence. For the firm being asked to bid on your work, you want them to be able to produce additional evidence. Top quality firms put both human and technology search resources to work to service existing projects and clients, and to provide evidence of their qualifications when asked to retrieve relevant work or references. Search tools and content management methods are diverse and range from modest to very expensive in scope, but no firm can exist for long without technology to support the findability of its intangible capital.

To summarize, there are three principal ways that search pays off in the small-medium business (SMB) sector. Citing a few examples of each, they are:

  • Finding expertise (people): a potential client-engagement principal or team member, answers to questions to fulfill a client’s engagement, spurring development or an innovation initiative
  • Retrieving prior work: reuse of know-how in new engagements, discovery of ideas previously tabled, learning, documentation of products and processes, building a proposal, starting point for new work, protecting intellectual property for leverage, when patenting, or participating in mergers and acquisitions.
  • Creating the framework for efficiency: time and speed, reinforcing what you know, supporting PR, communications, knowledge base, portraying the scope of intellectual capital (if you are a target for acquisition), the extent of your partnerships that can expand your ability to deliver, creating new offerings (services) or products.

So, to conclude my comment on Andreas’ posting, I would assert that you can “out-IBM the IBMers” or any other large organization by employing search to leverage your knowledge, people and relationships in smart and efficient ways. Excellent content and search practices can probably reduce your total human overhead because even one or two content and search specialists plus the right technology can deliver significant efficiency in intangible asset utilization.

I hope to see conference attendees who come from that SMB community so we can continue this excellent discussion in London, next month. Ask me about how we “ate our own dog-food” (search tools) when I owned a small software firm in the early 1980s. The overhead was minimal compared to the savings in support headcount.

E-discovering Language to Launch Your Taxonomy

New enterprise initiatives, whether for implementing search solutions or beginning a new product development program, demand communication among team leaders and participants. Language matters; defining terminology for common parlance is essential to smooth progress toward initiative objectives.

Glossaries, dictionaries, taxonomies, thesauri and ontologies are all mechanisms we use routinely in education and work to clarify terms we use to engage and communicate understanding of any specialized domain. Electronic social communication added to the traditional mix of shared information (e.g. documents, databases, spreadsheets, drawings, standardized forms) makes business transactional language more complex. Couple this with the use of personal devices for capturing and storing our work content, notes, writings, correspondence, design and diagram materials and we all become content categorizing managers. Some of us are better than others at organizing and curating our piles of information resources.

As recent brain studies reveal, humans, and probably any animal with a brain, have cognitive areas with pathways and relationships among categories of grouped concepts. This reinforces our propensity for expending thought and effort to order all aspects of our lives. That we all organize differently across a huge spectrum of concepts and objects makes it wondrous that we can live and work collaboratively at all. Why, after 30+ years of marriage, do I arrange my kitchen gadget drawer according to the use or purpose of devices while my husband attempts to store the same items according to size and shape? Why do icons and graphics placed in strange locations in software applications and web pages rarely impart meaning and use to me, while others “get it” and adapt immediately?

The previous paragraph may seem to be a pointless digression from the subject of the post, but there are two points to be made here. First, we all organize both objects and information to facilitate how we navigate life, including work. Without organization that is somehow rationalized, and established according to our own rules for functioning, our lives descend into dysfunctional chaos. People who don’t organize well consistently struggle in school, work and life skills. Second, diversity of practice in organizing is a challenge for working and living with others when we need to share the same spaces and work objectives. This brings me to the very challenging task of organizing information for a website, a discrete business project, or an entire enterprise, especially when a diverse group of participants is engaged as a team.

So, let me make a few bold suggestions about where to begin with your team:

  • Establish categories of inquiry based on the existing culture of your organization and vertical industry. Avoid being inventive, clever or idiosyncratic. Find category labels that everyone understands similarly.
  • Agree on common behaviors and practices for finding by sharing openly the ways in which members of the team need to find, the kinds of information and answers that need discovering, and the conditions under which information is required. These are the basis for findability use cases. Again, begin with the usual situations and save the unusual for later insertion.
  • Start with what you have in the form of finding aids: places, language and content that are already being actively used; examine how they are organized. Solicit and gather experiences about what is good, helpful and “must have” and note interface elements and navigation aids that are not used. Harvest any existing glossaries, dictionaries, taxonomies, organization charts or other definition entities that can provide feeds to terminology lists.
  • Use every discoverable repository as a resource (including email stores, social sites, and presentations) for establishing terminology and eventually writing rules for applying terms. Research repositories that are heavily used by groups of specialists and treat them as crops of terminology to be harvested for language that is meaningful to experts. Seek or develop linguistic parsing and term extraction tools and processes to discover words and phrases that are in common use. Use histograms to determine frequency of use, then alphabetize to find similar terms that are conceptually related, and semantic net tools to group discovered terms according to conceptual relationships. Segregate initialisms, acronyms, and abbreviations for analysis and insertion into final lists, as valid terms or synonyms to valid terms.
  • Talk to the gurus and experts that are the “go-to people” for learning about a topic and use their experience to help determine the most important broad categories for information that needs to be found. Those will become your “top term” groups and facets. Think of top terms as topical in nature (e.g. radar, transportation, weapons systems) and facets as other categories by which people might want to search (e.g. company names, content types, conference titles).
  • Simplify your top terms and facets into the broadest categories for launching your initiative. You can always add more but you won’t really know where to be the most granular until you begin using tags applied to content. Then you will see what topics have the most content and require narrower topical terms to avoid having too much content piling up under a very broad category.
  • Select and authorize one individual to be the ultimate decider. Ambiguity of categorizing principles, purpose and needs is always a given due to variations in cognitive functioning. However, the earlier steps outlined here will have been based on broad agreement. When it comes to the more nuanced areas of terminology and understanding, a subject savvy and organizationally mature person with good communication skills and solid professional respect within the enterprise will be a good authority for making final decisions about language. A trusted professional will also know when a change is needed and will seek guidance when necessary.
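The term-harvesting step outlined above (extract candidate terms, rank by frequency, then alphabetize to surface related spellings) can be sketched with very simple tooling. This is a minimal illustration, not a production pipeline: a real effort would use linguistic parsing and phrase extraction, and the sample documents here are invented.

```python
from collections import Counter
import re

def harvest_terms(documents, min_count=2):
    """Count candidate single-word terms across a set of documents.
    Returns the frequent terms both ranked by frequency (histogram view)
    and alphabetized (so conceptually related spellings sort together)."""
    counts = Counter()
    for doc in documents:
        # Lowercase and split on non-letters; a production pipeline
        # would substitute a parser and multi-word phrase extraction.
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    frequent = {t: n for t, n in counts.items() if n >= min_count}
    by_frequency = sorted(frequent.items(), key=lambda kv: -kv[1])
    alphabetical = sorted(frequent)
    return by_frequency, alphabetical

docs = [
    "Radar systems and radar calibration reports",
    "Calibration of weapons systems radar",
]
by_freq, alpha = harvest_terms(docs)
print(by_freq)   # radar appears most often across both documents
print(alpha)
```

The frequency view tells you which terms your specialists actually use; the alphabetized view is where near-duplicates and variant spellings become visible for grouping into synonyms.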

Revisit the successes and failures of the applied term store routinely: survey users, review search logs, observe information retrieval bottlenecks and troll new electronic discourse and content as a source of new terminology. A recent post by taxonomy expert Heather Hedden gives more technical guidance about evaluating and sustaining your taxonomy.

Launching Your Search for Enterprise Search Fundamentals

It’s the beginning of a new year and you are tasked with responsibility for your enterprise to get top value from the organization’s information and knowledge assets. You are the IT applications specialist assigned to support individual business units with their technology requests. You might encounter situations similar to these:

  • Marketing has a major initiative to re-write all product marketing pieces.
  • Finance is grappling with two newly acquired companies whose financial reports, financial analyses, and forecasts are scattered across a number of repositories.
  • Your Legal department has a need to categorize and analyze several thousand “idea records” that came from the acquired companies in order to be prepared for future work, patenting new products.
  • Research and development is attempting to categorize, and integrate into a single system, R&D reports from an existing repository with those from the acquisitions.
  • Manufacturing requires access to all schematics for eight new products in order to refine and retool manufacturing processes and equipment in their production area.
  • Customer support demands just-in-time retrieval and accuracy to meet their contractual obligations to tier-one customers, often from field operations, or while in transit to customer sites. The latter case often requires retrieval of a single, unique piece of documentation.

All of these groups have needs which, if not met, present high risk or even exposure to lawsuits from clients or investors. You have only one specialist on staff who has had two years of experience with a single search engine, but who is currently deployed to field service operations.

Looking at just these few examples we can see that a number of search related technologies plus human activities may be required to meet the needs of these diverse constituents. From finding and assembling all financial materials across a five-year time period for all business units, to recovering scattered and unclassified emails and memos that contain potential product ideas, the initiative may be huge. A sizable quantity of content and business structural complexity may require a large scale effort just to identify all possible repositories to search. This repository-identification exercise is a problem to be solved before even thinking about the search technologies to adopt for the “finding” activity.

Beginning the development of a categorizing method and terminology to support possible “auto-categorization” might require text mining and text analysis applications to assess the topical nomenclature and entity attributes that would make a good starting point. These tools can be employed before the adoption of enterprise search applications.

Understanding all the “use-cases” for which engineers may seek schematics in their re-design and re-engineering of a manufacturing plant is essential to selecting the best search technology for them and testing it for deployment.

The bottom line is that there is a lot more to know about content and supporting its accessibility with search technology than acquiring the search application. Furthermore, the situations that demand search solutions within the enterprise are far different from Web searching for a product or general research on a new topic, and addressing them successfully requires far greater understanding of user search expectations.

To meet the full challenge of providing the technologies and infrastructure that will deliver reliable and high value information and knowledge when and where required, you must become conversant with a boatload of search related topics. So, where do you begin?

A new primer, manageable in length and logical in order, has just been published. It contains the basics you will need to understand the enterprise context for search. A substantive list of reading resources, a glossary and a vendor URL list round out the book. As the author suggests, and I concur, you should probably begin with Chapter 12, two pages that will ground you quickly in the key elements of your prospective undertaking.

What is the book? Enterprise Search (of course) by Martin White, O’Reilly Media, Inc., Sebastopol, CA. © 2013 Martin White. 192p. ISBN: 978-1-449-33044-6. Also available as an online edition at: http://my.safaribooksonline.com/book/databases/data-warehouses/9781449330439

Enterprise Search Strategies: Cultivating High Value Domains

At the recent Gilbane Boston Conference I was happy to hear the variety of remarks positioning and defining “Big Data.” Like so much in the marketing sphere of high tech, answers begin with technology vendors but get refined and parsed by analysts and consultants, who need to set clear expectations about the actual problem domain. It’s a good thing that we have humans to do that defining, because even the most advanced semantics would be hard pressed to give you a single useful answer.

I heard Sue Feldman of IDC give a pretty good “working definition” of big data at the Enterprise Search Summit in May, 2012. To paraphrase, it was:

  • > 100 TB up to petabytes, OR
  • > 60% growth a year of unstructured and unpredictable content, OR
  • Ultra high streaming content

But we then get into debates about differentiating data from unstructured content when a phrase like “big data” is applied to unstructured content, which knowledge strategists like me tend to put into a category of packaged information. But never mind; technology solution providers will continue to come up with catchy buzz phrases to codify the problem they are solving, whether it makes semantic sense or not.

What does this have to do with enterprise search? In short, “findability” is an increasingly heavy lift due to the size and number of content repositories. We want to define quality findability as optimal relevance and recall.
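Since “optimal relevance and recall” is the stated quality bar, it is worth recalling that both are measurable. The sketch below computes precision (how much of what was retrieved is relevant) and recall (how much of what is relevant was retrieved) over a tiny, invented result set:

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved)
    recall = len(hits) / len(relevant)
    return precision, recall

# 4 documents retrieved, 3 of them relevant, out of 6 relevant overall.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d1", "d2", "d4", "d7", "d8", "d9"])
print(p, r)  # 0.75 0.5
```

The tension described in this post is visible even in this toy case: as repositories balloon, holding both numbers high at once is exactly the “heavy lift.”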

A search technology era ago, publishers, libraries, and content management solution providers were focused on human curation of non-database content, applying controlled vocabulary categories derived from decades of human-managed terminology lists. Automated search provided highly structured access interfaces to what we now call unstructured content. Once this model was supplanted by full text retrieval, and new content originated in electronic formats, the proportion of human-categorized content to un-categorized content ballooned.

Hundreds of models for automatic categorization have been rolled out to try to stay ahead of the electronic onslaught. The ones that succeed do so mostly because of continued human intervention at some point in the process of making content available to be searched. From human invented search algorithms, to terminology structuring and mapping (taxonomies, thesauri, ontologies, grammar rule bases, etc.), to hybrid machine-human indexing processes, institutions seek ways to find, extract, and deliver value from mountains of content.

This brings me to a pervasive theme from the conferences I have attended this year: the synergies among text mining, text analytics, extract/transform/load (ETL), and search technologies. These are being sought, employed and applied to specific findability issues in select content domains. It appears that the best results are delivered only when these criteria are first met:

  • The business need is well defined, refined and narrowed to a manageable scope. Narrowing scope of information initiatives is the only way to understand results, and gain real insights into what technologies work and don’t work.
  • The domain of content that has high value content is carefully selected. I have long maintained that a significant issue is the amount of redundant information that we pile up across every repository. By demanding that our search tools crawl and index all of it, we are placing an unrealistic burden on search technologies to rank relevance and importance.
  • Apply pre-processing solutions such as text-mining and text analytics to ferret out primary source content and eliminate re-packaged variations that lack added value.
  • Apply pre-processing solutions such as ETL with text mining to assist with content enhancement, by applying consistent metadata that does not have a high semantic threshold but will suffice to answer a large percentage of non-topical inquiries. An example would be to find the “paper” that “Jerry Howe” presented to the “AMA” last year.
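The non-topical lookup described in the last bullet can be modeled as a plain filter over consistently applied metadata fields, with no semantic processing at all. The records, field names, and the “Jerry Howe” example below are invented for illustration:

```python
# Hypothetical content records after an ETL pass has applied
# consistent metadata (author, content type, venue, year).
records = [
    {"title": "Q3 outlook", "author": "Jerry Howe",
     "type": "memo", "venue": None, "year": 2012},
    {"title": "Pricing models in practice", "author": "Jerry Howe",
     "type": "paper", "venue": "AMA", "year": 2012},
    {"title": "Pricing models in practice", "author": "J. Howe",
     "type": "slides", "venue": "AMA", "year": 2011},
]

def find(records, **criteria):
    """Return records whose metadata matches every given field exactly.
    Consistent metadata, not semantics, does the work here."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

# "Find the paper that Jerry Howe presented to the AMA last year."
hits = find(records, author="Jerry Howe", type="paper", venue="AMA", year=2012)
print([r["title"] for r in hits])
```

Note that the 2011 slides by “J. Howe” are missed because the author value was not normalized, which is precisely why the consistency of metadata matters more than its sophistication.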

Business managers together with IT need to focus on eliminating redundancy by utilizing automation tools to enhance unique and high-value content with consistent metadata, thus creating solutions for special audiences needing information to solve specific business problems. By doing this we save the searcher the most time, while delivering the best answers to make the right business decisions and innovative advances. We need to stop thinking of enterprise search as a “big data,” single engine effort and instead parse it into “right data” solutions for each need.

Enterprise Search is Never Magic

How is it that the blockbuster deals for acquiring software companies that rank highest in their market spaces seem to end up smelling bad several months into the deals? The latest acquisition to take on taint was written about in the Wall Street Journal today, noting that HP Reports $8.8 Billion Charge on Accounting Misstatement at Autonomy. Not to dispute the fact that enterprise search megastars Fast (acquired by Microsoft) and Autonomy had some terrific search algorithms and a huge presence in the enterprise market, there is a lot more to supporting search than the algorithms.

The fact that surrounding support services have always been essential requirements for making these two products successful in deployment has been well documented over the years. Hundreds of system integrators and partner companies to Microsoft and Autonomy do very well making these systems deliver the value that has never been attainable with just out-of-the-box installations. It takes a team of content, search and vocabulary management specialists to deliver excellent results. For any but the largest corporations, the costs and time to achieve full implementation have rarely been justifiable.

Many fine enterprise search products deliver high value at much more reasonable costs, with more efficient packaging, shorter deployment times and lower ongoing overhead. Never to be ignored is that enterprise search must be accounted for as infrastructure. Without knowing where the accounting irregularities (also true with Fast) actually lay, I suspect that HP bought the brand and the prospective customer relationships only to discover that the real money was being made by partners and integrators, and the software itself was a loss leader. If Autonomy did not bring with it a solid service and integration operation with strong revenues and work in the pipeline, HP could not have gained what it bargained for in the purchase. I "know" nothing, but these are my hunches.

Reflecting back on a couple of articles (If a Vendor Spends Enough… and Enterprise Search and Collaboration…) I wrote a couple of years ago, as Autonomy began hyping its enterprise search prowess in Information Week ads, it seems that marketing is all the magic it needed to reel in the biggest fish of all – a sale to HP.

Right Fitting Enterprise Search: Content Must Fit Like a Glove

This story brought me up short: Future of Data: Encoded in DNA by Robert Lee Hotz in the Wall Street Journal, Aug. 16, 2012. It describes how “…researchers encoded an entire book into the genetic molecules of DNA, the basic building block of life, and then accurately read back the text.” The article went on to quote the project’s senior researcher, Harvard University molecular geneticist George Church, as saying, “A device the size of your thumb could store as much information as the whole Internet.” While this concept intrigues and excites me for the innovation and creative thinking, it stimulates another thought, as well. Stop the madness of content overload first – force it to be managed responsibly.

While I have been sidelined from blogging for a couple of months, industry pundits have been contributing their comments, reflections and guidance on three major topics. Big Data tops the list, with analytics a close second, rounded out by contextual relevance as an ever-present content findability issue. In November at Gilbane Boston the program features a study conducted by Findwise, Enterprise Search and Findability Survey, 2012, which you can now download. It underscores a disconnect between what enterprise searchers want and how search is implemented (or not) within their organizations. As I work to assemble content, remarks and readings for an upcoming graduate course on “Organizing and Accessing Information and Knowledge,” I keep reminding myself what knowledge managers need to know about content to make it accessible.

So, how would experts for our three dominant topics solve the problems illustrated in the Findwise survey report?

For starters, organizations must be more brutal with content housekeeping, or more specifically housecleaning. As we debate whether our country is as great at innovation as in generations past, consider big data as a big barrier. Human beings, even brilliant ones, can only cope with so much information in their waking working hours. I posit that we have lost the concept of primary source content, in other words content that is original, new or innovative. It is nearly impossible to home in on information that has never been articulated in print or electronically disseminated before, excluding all the stuff we have seen over and over again. Our concept of terrific search is to be able to traverse and aggregate everything “out there” with no regard for what is truly conceptually new. How much of that “big data” is really new and valuable? I am hoping that other speakers at Gilbane Boston 2012 can suggest methods for crunching through the “big” to focus search on the best, most relevant and singular primary source information.

Second, others have commented, and I second the idea, that analytic tools can contribute significantly to cleansing search domains of unwanted and unnecessary detritus. Search tools that auto-categorize and cross-categorize content, whether the domain is large or small, should be employed during any launch of a new search engine to organize content for quick visual examination, showing you where metadata is wrong, mis-characterized, or poorly tagged. Think of a situation where templates are commonly used for enterprise reports and the name of the person who created the template becomes the “author” of every report. Spotting this type of problem, and taking steps to remediate and cleanse metadata before deploying the search system, is a fundamental practice that will contribute to better search outcomes. With thoughtful management, this type of exercise will also lead to corrective actions on the content governance side by pointing to how metadata must be handled. Analytics functions that leverage search to support cleaning up data stores are among the most practical tools now packaged with newer search products.
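The template-author problem described above can even be spotted mechanically. A minimal sketch, with made-up data, that flags any metadata value dominating the corpus (a hint that a template default, not a real author, was indexed):

```python
from collections import Counter

# Illustrative metadata dump: one 'author' value per indexed report.
authors = (
    ["Template Admin"] * 180 +   # template creator leaked into every report
    ["J. Rivera"] * 12 +
    ["M. Chen"] * 8
)

def suspicious_values(values, share=0.5):
    """Flag any metadata value accounting for more than `share` of the
    corpus; legitimate authors rarely dominate to that degree."""
    counts = Counter(values)
    total = len(values)
    return [v for v, n in counts.items() if n / total >= share]

flags = suspicious_values(authors)
```

The threshold of 0.5 is arbitrary; the point is that a simple frequency profile makes this class of metadata error visible before deployment.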

Finally, there is the issue of vocabulary management and assigning terminology that is both accurate and relevant for a specific community that needs to find content quickly, without multiple versions and without content that is just a re-hash of earlier findings published by the originator. Original publication dates, source information and proper author attribution are key elements of metadata that must be in place for any content that is targeted for crawling and indexing. When metadata is complete and accurate, a searcher can expect the best and most relevant content to rise to the top of a results page.
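A completeness check along these lines (the field names are illustrative) can gate content before it is crawled and indexed:

```python
# Metadata fields the post calls out as essential; names are hypothetical.
REQUIRED = ("title", "publication_date", "source", "author")

def missing_fields(record, required=REQUIRED):
    """Return the required metadata fields that are absent or empty, so
    incomplete records can be repaired before crawling and indexing."""
    return [field for field in required if not record.get(field)]

record = {
    "title": "Enterprise Search Survey",
    "publication_date": "2012-11-01",
    "source": "Findwise",
    "author": "",   # attribution missing: should block indexing
}
problems = missing_fields(record)
```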

I hope others in a position to do serious research (perhaps a PhD dissertation) will take up my challenge to codify how much of “big data” is really worthy of being found – again, again, and again. In the meantime, use the tools you have in the search and content management technologies to get brutal. Weed the unwanted and unnecessary content so that you can get down to the essence of what is primary, what is good, and what is needed.

Search Engines: They’ve Been Around Longer Than You Think

It dates me, as well as search technology, to acknowledge that an article in Information Week by Ken North containing Medlars and Twitter in the title would be meaningful. Discussing search requires context, especially when trying to convince IT folks that special expertise is required to do search really well in the enterprise, and it is not something acquired in computer science courses.

Evolution of search systems from the print indexes of the early 1900s, such as Index Medicus (National Library of Medicine’s index to medical literature) and Chemical Abstracts, to the advent of the online Medical Literature Analysis and Retrieval System (Medlars) in the 1960s was slow. However, the phases of search technology evolution since the launch of Medlars have hardly been warp speed either. This article is highly recommended because it gives historical context to automated search while defining application and technology changes over the past 50 years. The comparison between Medlars and Twitter as search platforms is fascinating, something that would never have occurred to me to explore.

A key point of the article is the difference between a system of search designed for archival content with deeply hierarchical categorization for a specialized corpus versus a system of highly transient, terse and topically generalized content. Last month I commented on the need to have search present in your normal work applications, and this article underscores the enormous range of purposes for search. Information of a short temporal nature and scholarly research each have a place in the enterprise, but it would be a stretch to think of searching for both types via a single search interface. Wanting to know what a colleague is observing or learning at a conference is very different from researching the effects of uranium exposure on the human anatomy.

What has not changed much in the world of applied search technology is why we need to find information and how it becomes accessible. The type of search done in Twitter or on LinkedIn today is for information that we used to pick up from a colleague (in person or on the phone) or in industry daily or weekly news publications. That’s how we found the name of an expert, learned the latest technologies being rolled out at a conference or got breaking news on a new space material being tested. What has changed is the method of retrieval, but not by a lot, and the relative efficiency may not be that much greater. Today, we depend on a lot of pre-processing of information by our friends and professional colleagues to park information where we can pick it up on the spur of the moment – easy for us, but someone still spends the time to put it out there where we can grab it.

On the other end of the spectrum is that rich research content that still needs to be codified and revealed to search engines with appropriate terminology so we can pursue in-depth searching to get precisely relevant and comprehensive results. Technology tools are much better at assisting us with content enhancement to get us the right and complete results, but humans still write the rules of indexing and curate the vocabularies needed for classification.

Fifty years is a long time, and we are still trying to improve enterprise search. It still takes human effort to make it work better.
