Curated for content, computing, data, information, and digital experience professionals

Category: Enterprise search & search technology (Page 32 of 61)

Research, analysis, and news about enterprise search and search markets, technologies, practices, and strategies, including semantic search, intranet and workplace collaboration, ecommerce, and other applications.

Before we consolidated our blogs, industry veteran Lynda Moulton authored our popular enterprise search blog. This category collects all of her posts along with other enterprise search news and analysis, so Lynda’s loyal readers can find everything here.

For older, long form reports, papers, and research on these topics see our Resources page.

The Future of Enterprise Search

We’ve been especially focused on enterprise search this year. In addition to Lynda’s blog and our normal conference coverage, we have released two extensive reports, one authored by Lynda and one by Stephen Arnold, and Udi Manber, VP of Engineering for Search at Google, keynoted our San Francisco conference. We are continuing this focus at our upcoming Boston conference, where Prabhakar Raghavan, Head of Yahoo! Research, will provide the opening keynote.

Prabhakar’s talk is titled “The Future of Search”. The reason I added “enterprise” to the title of this post is that Prabhakar’s talk will be of special interest to enterprises because of its emphasis on complex data in databases and marked-up content repositories. Prabhakar’s background includes stints at Verity (as CTO) and IBM, so enterprise (or, if you prefer, “behind-the-firewall” or “intranet”) search requirements are not new to him.

Here is the description from the conference site:

Web content continues to grow, change, diversify, and fragment. Meanwhile, users are performing increasingly sophisticated and open-ended tasks online, connecting broadly to content and services across the Web. The simple search result page of blue text links needs to evolve to address these complex tasks, and this evolution includes a more formal understanding of user’s intent, and a deeper model of how particular pieces of Web content can help. Structured databases power a significant fraction of Web pages, and microformats and other forms of markup have been proposed as mechanisms to expose this structure. But uptake of these mechanisms remains limited, as content owners await the killer application for this technology. That application is search. If search engines can make deep use of structured information about content, provided through open standards, then search engines and site owners can together bring consumers a far richer experience. We are entering a period of massive change to enable search engines to handle more complex content. Prabhakar Raghavan, head of Yahoo! Research, will address the future of search: how search engines are becoming more sophisticated, what the breakthrough point will be for semantics on the Web and what this means for developers and publishers.

Join us on December 3rd at 8:30am at the Boston Westin Copley. Register.

Dewey Decimal Classification, Categorization, and NLP

I am surprised how often various content-organizing mechanisms on the Web are compared to the Dewey Decimal System. As a former librarian, I am disheartened to be reminded how often students were lectured on the Dewey Decimal system, apparently to the exclusion of learning about subject categorization schemes. The two complement each other, but that seems to be a secret to all but librarians.

I’ll try to share a clearer view of the model and explain why new systems of organizing content in enterprise search are quite different from the decimal model.

Classification is a good generic term for defining physical organizing systems. A unique animal or plant is distinguished by a single classification in the biological naming system. So too are books in a library. There are two principal classification systems for arranging books on the shelf in Western libraries: Dewey Decimal and Library of Congress (LC). Each uses coding (numeric for Dewey Decimal, alphanumeric for Library of Congress) to establish where a book belongs logically on a shelf, relative to the other books in the collection, according to the book’s most prominent topic. A book on nutrition for better health might be given a classification number for some aspect of nutrition or one for a health topic, but a human being has to judge which topic the book is most “about,” because the book can live in only one section of the collection. It is worth mentioning that the Dewey and LC systems are both hierarchical but with different priorities: Dewey puts broad topics like Religion, Philosophy, and Psychology at the top levels, while LC groups those topics together and includes more scientific and technical topics, such as Agriculture and Military Science, at the top of its list.

So why classify books to reside in topic order, given the labor it takes to shift collections around to make space for new books? It is for the benefit of the users, to enable “browsing” through the collection, although it may be hard to accept that the term browsing was a staple of library science decades before the internet. Library leaders established eons ago the need for a system of physical organization to help readers peruse the book collection by topic, leading from the general to the specific.

You might ask what kind of help that was for finding the book on nutrition that was classified under “health science.” This is where another system, largely hidden from the public or often made annoyingly inaccessible, comes in. It is a system of categorization in which any content, book or otherwise, can be assigned an unlimited number of categories. Wandering through the stacks, you would never suspect this secret way of finding a nugget in a book about your favorite hobby if that book was classified to live elsewhere. The standard lists of terms for describing books under multiple headings are called “subject headings,” and you had to use a library catalog to find them. Unfortunately, they contain mysterious conventions called “sub-divisions,” designed to pre-coordinate any topic with other generic topics (e.g., Handbooks, etc. and United States). Today we would call these generic subdivision terms facets: one reflects the kind of book, and the other the geographical scope the book covers.

With the marvel of the Web page, hyperlinking, and “clicking through” hierarchical lists of topics, we can narrow a search for handbooks on nutrition in the United States for better health, beginning at any facet or topic, and still come up with the book that meets all four criteria. We no longer have to be constrained by the Dewey model of browsing the physical location of our favorite topics, probably missing a lot of good stuff. But then we never did; the subject card catalog gave us a tool for finding more than we would by classification code alone. Even that, though, was a lot more tedious than navigating easily through a hierarchy of subject headings, narrowing the results by facets in a browser, and narrowing them further by yet another topical term until we find just the right piece of content.
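To make the faceted narrowing concrete, here is a minimal sketch in Python with invented records and facet names (it describes no actual catalog system): each item keeps its single classification, its shelf position, but carries several subject headings and facets, so one query can match all four criteria at once.

    # Minimal illustration of faceted narrowing (hypothetical records and facet names).
    # Each item carries one classification (its shelf position) but many categories.
    catalog = [
        {
            "title": "Eating Well: A Practical Guide",
            "classification": "613.2",            # single Dewey-style shelf position
            "subjects": {"nutrition", "health"},  # multiple subject headings
            "facets": {"form": "handbook", "place": "United States"},
        },
        {
            "title": "Food Science Fundamentals",
            "classification": "664",
            "subjects": {"food science"},
            "facets": {"form": "textbook", "place": "United States"},
        },
    ]

    def narrow(items, subjects=(), **facets):
        """Keep only items matching every requested subject and facet value."""
        results = []
        for item in items:
            if not set(subjects) <= item["subjects"]:
                continue
            if any(item["facets"].get(k) != v for k, v in facets.items()):
                continue
            results.append(item)
        return results

    # Handbooks on nutrition, in the United States, for better health:
    hits = narrow(catalog, subjects=["nutrition", "health"],
                  form="handbook", place="United States")
    print([h["title"] for h in hits])   # -> ['Eating Well: A Practical Guide']

The point is simply that the classification field stays singular while the subjects and facets multiply, which is what lets one book be found from several starting points.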

Taking the next leap, we have natural language processing (NLP) that will answer the question, “Where do I find handbooks on nutrition in the United States for better health?” That is the Holy Grail for search technology, and a long way from Mr. Dewey’s idea of browsing the collection.

MicroLink Launches MicroLink Autonomy Integration Suite for SharePoint

MicroLink announced the release of the MicroLink Autonomy Integration Suite (AIS) for SharePoint 2003/2007, which consists of six web parts that integrate Autonomy’s Intelligent Data Operating Layer (IDOL) server with Microsoft Office SharePoint Server (MOSS). This integration allows SharePoint users to leverage Autonomy’s information discovery capabilities and automated features in a unified platform. The suite’s custom web parts provide more efficient access to the search capabilities of Autonomy’s IDOL server from within SharePoint. With interfaces familiar to SharePoint users, AIS helps organizations process digital content automatically, share data, and synchronize with other data web parts. AIS comprises Search and Retrieval, Agents, Profiling, Web Channels, Clustering, and Community Collaboration; it also improves expertise search and incorporates full document-level security. Key features of AIS:

  • Federated search for SharePoint, enabling customers to index and search content across the entire enterprise, in repositories inside and outside the SharePoint environment
  • Custom web parts that expose the capabilities of Autonomy’s IDOL platform from within Microsoft’s SharePoint Portal Server
  • Data connections for each web part that allow data sharing and synchronization between parts
  • For the end user, a single interface that is consistent with the SharePoint user experience

http://www.MicroLinkllc.com
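As a conceptual illustration only (this is not the AIS or IDOL API, whose interfaces are not documented here), the sketch below shows what federated search with document-level security amounts to: fan a query out to several repositories, merge the ranked results, and trim anything the user is not entitled to see. The connector functions and result fields are invented for the example.

    # Conceptual sketch of federated search across repositories; the connector
    # functions and result fields are hypothetical, not the AIS or IDOL APIs.
    from concurrent.futures import ThreadPoolExecutor

    def search_sharepoint(query):
        # Placeholder: would call a SharePoint connector in a real deployment.
        return [{"title": "Q3 plan.docx", "score": 0.82, "source": "sharepoint",
                 "allowed": {"sales", "exec"}}]

    def search_file_share(query):
        # Placeholder: would call a file-share connector in a real deployment.
        return [{"title": "budget.xlsx", "score": 0.77, "source": "fileshare",
                 "allowed": {"finance"}}]

    def federated_search(query, user_groups, connectors):
        """Fan the query out to every repository, merge by score,
        and enforce document-level security before returning results."""
        with ThreadPoolExecutor() as pool:
            batches = pool.map(lambda fn: fn(query), connectors)
        merged = [hit for batch in batches for hit in batch
                  if hit["allowed"] & user_groups]   # drop what the user may not see
        return sorted(merged, key=lambda h: h["score"], reverse=True)

    results = federated_search("quarterly plan", {"sales"},
                               [search_sharepoint, search_file_share])
    print([r["title"] for r in results])   # -> ['Q3 plan.docx']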

Controlling Your Enterprise Search Application

When interviewing search administrators who had also been part of product selection earlier this year, I asked about surprises they had encountered. Some involved the selection process, but most related to ongoing maintenance and support. None commented on actual failures to retrieve content appropriately. That is a good thing, whether because they had already tested for it in a proof of concept during due diligence or because they were lucky.

Thinking about how product selections are made prompts me to comment on two major search product attributes that control the success or failure of search for an enterprise. One is the actual algorithms that control content indexing: what is indexed and how it is retrieved from the index (or indices). The second is the interfaces: interfaces for the population of searchers to formulate and execute searches, and interfaces for results presentation. For each aspect, buyers need to know what they can control and how best to use that control for success.

Indexing and retrieval technology is embedded in search products; the administrative options to alter search scalability, indexing, and content selection during retrieval range from limited to none. The “secret sauce” of each product is largely hidden, although it may have patented aspects you can research. Until the administrator of a system gets deeply into tuning and experimenting with significant corpora of content, it is difficult to assess the net effect of the delivered tuning options. The time to make an informed evaluation of how well a given product will retrieve your content, when searched by your intended audience, is before a purchase is made. You can’t control the underlying technology, but you can perform a proof of concept (PoC). This requires the following (a minimal scoring sketch follows the list):

  • human resources and a commitment of computing resources
  • a well-defined amount, type, and nature of content (metadata plus full text, or unstructured full text only) to give a testable sample
  • testers who are representative of all potential searchers
  • a comparison of the results from three to four systems to reveal how well each retrieves the intended content targets
  • testers who know the content, and test searches similar to what enterprise employees or customers will routinely seek
  • search logs from previously deployed search systems, if they exist; searches that routinely failed in the past should be used to test newer systems
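As a sketch of how such a proof of concept can be scored (the test queries, relevance judgments, and engine callables below are placeholders you would supply), the comparison can be reduced to a simple harness that replays the same test searches, including ones that failed on the old system, against each candidate and tallies how often the intended targets come back near the top.

    # Sketch of a proof-of-concept scoring harness; the test queries, expected
    # documents, and engine callables are placeholders supplied by your own PoC.
    def precision_at_k(retrieved, relevant, k=10):
        """Fraction of the top-k results that are known relevant documents."""
        top = retrieved[:k]
        return sum(1 for doc in top if doc in relevant) / max(len(top), 1)

    def score_engine(search_fn, test_set, k=10):
        """Replay every test query against one candidate and average the score."""
        scores = [precision_at_k(search_fn(q), relevant, k) for q, relevant in test_set]
        return sum(scores) / len(scores)

    # test_set pairs each query (including ones that failed on the old system)
    # with the document ids testers agreed should be returned.
    test_set = [
        ("nutrition handbook", {"doc-114", "doc-207"}),
        ("travel expense policy", {"doc-033"}),
    ]

    candidates = {"Engine A": lambda q: ["doc-114", "doc-550"],   # stand-ins for
                  "Engine B": lambda q: ["doc-033", "doc-207"]}   # real API calls

    for name, fn in candidates.items():
        print(name, round(score_engine(fn, test_set), 2))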

Interface technology

Unlike the embedded search technology, interfaces are something buyers can control: they can exercise design control themselves or hire a third party to produce search interfaces, and the results vary enormously. What searchers experience when they first encounter a search engine, whether a simple search box on a portal or a completely novel set of options with a search box, navigation choices, and special search forms, is within the enterprise’s control. Taking that control may be necessary if what comes “out of the box” as the default is not satisfactory. You may find, at a reasonable price, a terrific search engine that scales well, indexes metadata and full text competently, and retrieves what the audience expects, but requires a different look and feel for your users. Through an API (application programming interface), SDK (software development kit), or application connectors (e.g., Documentum, SharePoint), numerous customization options are delivered with enterprise search packages or are available as add-ons.
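To give a feel for what that customization layer looks like in practice, here is a minimal sketch of an interface layer that calls a search engine over a generic REST endpoint and renders the hits in the enterprise’s own look and feel. The endpoint URL and response fields are hypothetical stand-ins for whatever a given product’s API or SDK actually exposes.

    # Minimal sketch of a custom interface layer over a search API.
    # The endpoint URL and response fields are hypothetical; substitute whatever
    # your chosen product's API, SDK, or connector actually provides.
    import json
    from urllib.request import urlopen
    from urllib.parse import urlencode

    SEARCH_ENDPOINT = "http://search.example.internal/api/query"   # placeholder

    def run_search(terms, filters=None, page=1, page_size=10):
        params = {"q": terms, "page": page, "size": page_size}
        if filters:
            params.update(filters)        # e.g. department or content-type facets
        with urlopen(f"{SEARCH_ENDPOINT}?{urlencode(params)}") as resp:
            return json.load(resp)        # assume a JSON list of hits

    def render_results(hits):
        """Apply the enterprise's own look and feel instead of the vendor default."""
        rows = [f"<li><a href='{h['url']}'>{h['title']}</a> "
                f"<span class='snippet'>{h['snippet']}</span></li>" for h in hits]
        return "<ul class='acme-results'>" + "".join(rows) + "</ul>"

    # Example (disabled because the endpoint above is a placeholder):
    # html = render_results(run_search("travel expense policy",
    #                                  filters={"dept": "finance"}))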

In either case, human resource costs must be added to the bottom line. A large number of mature software companies and start-ups are innovating in both their indexing techniques and their interface design technologies. They are benefiting from several decades of search evolution among search experts and, now, a decade of search experience in the general population. Search product evolution is accelerating as developers leverage that knowledge of searcher experiences. You may not be able to control emerging and potentially disruptive technologies, but you can still exercise beneficial control when selecting and implementing almost any search system.

Enterprise Search: Case Studies and User Communities

While you may be wrapping up your summer vacation or preparing for a ramp up to a busy fourth quarter of business, the Gilbane team is securing the speakers for an exciting conference Dec. 2 – 4 in Boston. Evaluations of past sessions always give high marks to case studies delivered by users. We have several for the search track but would like a few more. If one of your targets for search is documents stored in SharePoint repositories, your experiences are sure to draw interest.

SharePoint is the most popular new collaboration tool for organizations with a large Microsoft application footprint, but it usually resides alongside multiple other repositories that also need to be searched. So, what search products are being used to retrieve SharePoint content plus other content? A majority of search applications provide a connector for indexing SharePoint documents, and they would not be making that available without demand. We would like to hear what SharePoint adopters are actively using for search. What are you experiencing? If you would like to participate in the Gilbane Conference and have experiences to share, I hope you will get in touch and check out the full program.

On a related note, I was surprised, during my recent research, to discover few identifiable user groups or support communities for search products. Many young companies launch and sponsor “user group meetings” to share product information, offer training, and facilitate peer-to-peer networking among their customers. It is a sign of confidence when they help customers communicate with each other, and a sign of maturity when they actively encourage those connections: it signals a willingness to open communication paths that might lead to collective product critiques, which, if well organized, can benefit users and vendors alike. Maybe some groups are operating in stealth mode, but more should be accessible to interested parties in the marketplace.

Organizing functions are difficult for users to manage on their own professional time, so having a vendor willing to facilitate and host the communication mechanisms is valuable. However, vendors sometimes need a nudge from customers to open up the prospect of such a group. If you would value participating in a network of others using your selected product, I suggest taking the initiative by approaching your customer account representative.

Communities for sharing tips about any technology are important, but so is mutual guidance to help others become more successful with a product’s process management and governance issues. User groups can give valuable feedback to their vendors and spur creativity and efficiency in product usage. Finally, as an analyst I would much rather hear straight talk about product experiences from active users than a filtered version from a company representative. So, please, reach out to your peers and share your story at every opportunity you can. Volunteer to speak at conferences and participate in user groups. The benefits are numerous, the most important being the formation of a strong collective voice.

Welcome Fred Dalrymple

Fred is our newest contributor, and he has already posted his first blog entry. Fred pokes at the challenging tension in search between intent and context, especially over time as context (or intent) changes. Lynda has also posted about intent, and the subject came up in discussions of search quality around Udi Manber’s talk at our conference this past June.

Fred brings the welcome perspective of a serious software developer and will be blogging on a few different topics, so he may be posting here or on one of our other blogs. Welcome Fred!

Researching Search with Intent Firmly in Control

I have hit on intent before, and the latest member of the Gilbane blog team, Fred Dalrymple, has joined the theme with his entry this week. Welcome, Fred! You have given me an opening for an already planned topic: how to conduct research for enterprise search tools, the ones that go beyond the search box. Actually, this guidance is appropriate for the selection of any technology application.

Getting intent solidly defined is important for many reasons, most of them related to solving a business problem and to the expected outcomes. Knowing what these are will give you the framework for isolating likely candidates efficiently. A second critical reason for having strong intent is to stave off project scope creep. As a former vendor, and now a consultant, I see this play out repeatedly as product research ensues. Weak backbones among selection team members, or a flimsy business case, leave openings for vendors to promote additional features, which often distract from what is really needed.

So, armed with the right skeleton, a strong framework, a core scaffolding, you are ready to approach your research systematically. Four paths are open to a study team; I recommend using all of them, in overlapping passes. Discovery about products, product performance in real-world scenarios, vendors’ business relationships with their clients, and the user community you will be joining are all targets that need to be exploited.

Discovering an online user community that has raised a potential problem with a vendor or product should drive you back to do more research, to uncover genuine limitations or to determine whether the user brought the problem on themselves through an inappropriate implementation. Iteration in technology research requires perseverance and patience. A comment on each research path might be helpful:

  • Online research – This requires creativity and the most persistence to verify and validate what you find. I am amazed at how superficially many people read. We may be taught that good business writing states clearly in the first paragraph what follows and ends with a solid summary, but most content does not follow “good” business writing practices. You need to read between the lines, think about what is not being said and ask yourself why, and follow every link on the sites of vendors under serious consideration. Look at vendor news notes and press releases to see how much activity there is around product advances or new installations, and read descriptions of customer implementations to see how closely those deployments match your business need. Finally, search those customer names on the Internet in conjunction with the product name; this may retrieve public content that sheds more light on user experiences.
  • Professional groups – Professional organizations in which you participate are fertile ground for asking what others in situations similar to yours are using. As you get closer to a final choice, go back to people you know personally or professionally and ask the direct questions, “Have you had any problems with this product or vendor?” and “What is the benefit of this product for you?”
  • Societies and academic institutions – These organizations publish content that may carry a cost. When you consider the thousands your organization spends on a selection process (in people’s time), contracting, licensing, implementation, and deployment, it is wise to budget several hundred dollars for reports that give detailed product evaluations. Get recommendations from librarians and peers on the publications’ authoritativeness.
  • User and analyst blogs and industry publications – The same guidance holds for industry publications as for societies and academic publishers, but you will also want to pursue the blogs of users and analysts. Users are a great source of tidbits about products and vendors, but dig beyond what you first discover to see whether the comments are isolated or follow a pattern.

This is a longer commentary than I intended but the core of my intent needed flesh, so there it is.

Beyond Intent

Intent, hidden within a search click, lies at the intersection of Search and Business, as in “let’s do some business”. That search click has extraordinary value because of the intent to buy. That’s why we’re searching, right?

Perhaps, or maybe we’re just browsing, or surfing, and we’re not in the mood for advertisements. It could be more militant than that; perhaps we’re still trying to research our choices and would see a sales pitch as tainting the honesty of the information. At least that’s what the founders of Google originally believed.

Although the model of the web was a set of stateless pages, and a Google search box certainly fits that appearance, people’s intent is not stateless. It ebbs and flows: from looking around for entertainment, to researching choices and comparing possibilities, through sourcing a chosen product (now we’re talking about a qualified buyer), to selecting fulfillment options, and possibly all the way to figuring out how to return a product we’re dissatisfied with. That last one is probably not the best time to present an ad claiming how wonderful the product is.

This is a “long running transaction,” a series of steps that fit together and flow towards (and past) a purchasing decision, but with back-currents and eddies. And it really is a transaction in the database sense, where a failure during one step can cause the entire sequence to be discarded as if it never happened. Though if you believe Sergey and Larry, it will be worse than never happening: you may lose trust in your guide through that transaction.
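A minimal way to picture that “long running transaction” in code (the modes and session structure here are my own illustration, not a description of any search engine’s internals): record each step of the journey, and let a failed step void the whole sequence the way a database rollback would, so later responses are not built on a purchase the customer has effectively undone.

    # Illustration of intent as a long-running transaction; the modes and the
    # rollback rule are invented for the example, not any engine's actual model.
    class IntentSession:
        MODES = ("browse", "research", "compare", "purchase", "fulfillment", "return")

        def __init__(self):
            self.steps = []          # the running history of the transaction
            self.voided = False

        def record(self, mode, query):
            if mode not in self.MODES:
                raise ValueError(f"unknown mode: {mode}")
            self.steps.append((mode, query))
            if mode == "return":
                # A return voids the purchase sequence, as a failed database
                # transaction discards everything that came before it.
                self.voided = True

        def current_context(self):
            """What earlier steps should still inform the next response."""
            return [] if self.voided else list(self.steps)

    session = IntentSession()
    session.record("research", "best trail running shoes")
    session.record("purchase", "acme trail shoe size 10")
    session.record("return", "how to return acme trail shoe")
    print(session.current_context())   # -> []  (stop recommending the shoe)

As the discussion below argues, simply discarding the history is not quite right either; the return step needs to stay connected to what came before it, which is the harder design problem.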

Has the intent changed? Depends on what that means. On one hand, what has changed across those steps is the mode of the intent. If the intent was to purchase a product, then the research, comparison, purchase, and fulfillment were clearly pieces of that intent, though they call for different approaches: organic search for the research, product focused responses for the purchase, perhaps service-oriented for the fulfillment, and some combination for the comparison.

But what about that “I need to return this product because I hate it” step? The intent has clearly changed, but it is more necessary than ever to connect this new intent to the previous steps. If it isn’t connected, the search engine may continue to suggest that product to a disgruntled customer, with very counterproductive results.

So, what is the unifying concept? Is it intent, organized by modes? Not if what is being unified is a user’s complete story about their purchasing experience.

