Category: Enterprise search & search technology (Page 40 of 62)

Research, analysis, and news about enterprise search and search markets, technologies, practices, and strategies, such as semantic search, intranet collaboration and workplace, ecommerce and other applications.

Before we consolidated our blogs, industry veteran Lynda Moulton authored our popular enterprise search blog. This category includes all her posts and other enterprise search news and analysis. Lynda’s loyal readers can find all of Lynda’s posts collected here.

For older, long form reports, papers, and research on these topics see our Resources page.

Sharepoint and Search

August 16, 2007 / Lynda Moulton / 0 Comments

Sharepoint repositories are a prime content target for most search engines in the enterprise search arena, judging from the number of announcements I’ve previewed from search vendors in the last six months. The list of vendors enabling Sharepoint search is long and growing:

Autonomy
Coveo
dtSearch
FAST
ISYS
Longitude from BA-Insight
Ontolica from Mondosoft
OpenText
Oracle
Recommind
Schemalogic
Vivisimo
X1
… and surely more I’ve missed

Almost a year ago I began using a pre-MOSS version of Sharepoint to collect documents for a team activity. Ironically, the project was the selection, acquisition, and implementation of a (non-Sharepoint) content management system to manage a corporate intranet, extranet, and hosted public Web site. The version of Sharepoint that was “set up” for me was strictly out of the box. Not being a developer, I was still able to muddle my way through setting up the site, establishing users, and posting announcements and categories of content, to which I uploaded about fifty or sixty documents.

The most annoying discovery was the lack of a default search option. A later update to MOSS solved the problem, but at the time it was a huge aggravation. Because I could not guarantee a search option would appear soon enough, I had to painstakingly create titles with dates to give team members a contextual description as they browsed the site. Some of the documents I wanted to share were published papers and reviews of products. Dates were not very relevant for those, so I “enhanced” the titles with my own notations to help the finders select what they needed.

These silly “homemade” solutions are not uncommon when a tool does not anticipate how we would want to be able to use it. They persist as ways to handle our information storage and retrieval challenges. Since the beginning of time, humans have devised ways to store things that they might want to re-use at some point in the future. Organizing for findability is an art as much as it is a science. Information science only takes one so far in establishing the organizing criteria and assigning those criteria to content. Search engines that rely strictly on the author’s language will leave a lot of relevant content on the shelf, for the same reasons as using Dewey Decimal classification without the complementary card catalog of subject topics. The better search engines exploit every structured piece of data or tagged content associated with a document, and that includes all the surrounding metadata assigned by “categorizers.” Categorizers might be artful human indexers or automated processes. Search engines with highly refined, intelligent categorizers that enable semantically rich finding experiences bring even more sophistication to search.
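To make the card-catalog point concrete, here is a minimal Python sketch, assuming a toy in-memory corpus in which each document carries its author’s text plus subject terms assigned by a categorizer (human or automated). The document ids, the tokenizer, and the weights are hypothetical and purely illustrative; no particular search product works this way by implication. It shows a query term that never appears in the author’s text still finding a document through its assigned metadata, and that match vanishing when the metadata is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy document: the author's own text plus categorizer-assigned subject terms."""
    doc_id: str
    text: str
    categories: set = field(default_factory=set)


def tokenize(s):
    """Lowercase, whitespace-split tokens with surrounding punctuation stripped (simplistic)."""
    return {t.strip(".,;:!?\"'()").lower() for t in s.split()} - {""}


def search(docs, query, text_weight=1.0, category_weight=2.0):
    """Rank documents by query-term overlap with body text and with assigned categories.

    Category matches get a higher (arbitrary, illustrative) weight because they
    record a deliberate subject judgment rather than incidental wording.
    Setting category_weight=0.0 simulates an engine that reads only the author's text.
    """
    terms = tokenize(query)
    scored = []
    for doc in docs:
        text_hits = len(terms & tokenize(doc.text))
        category_hits = len(terms & {c.lower() for c in doc.categories})
        score = text_weight * text_hits + category_weight * category_hits
        if score > 0:
            scored.append((score, doc.doc_id))
    # Highest score first; ties broken by id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical example corpus.
EXAMPLE_DOCS = [
    Document("memo-1", "Notes on controlled vocabularies for the intranet", {"taxonomy"}),
    Document("memo-2", "Quarterly budget review for the library"),
]

# "taxonomy" never appears in memo-1's text; only its assigned category matches.
print(search(EXAMPLE_DOCS, "taxonomy"))                       # ['memo-1']
print(search(EXAMPLE_DOCS, "taxonomy", category_weight=0.0))  # []
```

The weighting is a design choice, not a rule: the sketch simply shows that an engine which reads the categorizer’s metadata can surface content an author-language-only engine leaves on the shelf.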

Back to Sharepoint, which now does have an embedded search option: I’ve heard more than one expert comment that it is unlikely to be the “search” of choice for Sharepoint. That is why so many search vendors are scrambling to promote their own Sharepoint search. The likely reason is that the organizing framework for contributing content to Sharepoint is so loosey-goosey that an aggregation of many Sharepoint sites across the organization will be just what we’ve experienced with all these other homegrown systems: a dump full of idiosyncratic organizing tricks.

What you want to do, thoughtfully, is assess whether the search engine you need will search only Sharepoint repositories OR both structured and unstructured repositories across a much larger domain of content types and applications. It will be interesting to evaluate the options out there for searching Sharepoint gold mines. Key questions: Is a product targeting only Sharepoint sites or diverse content? How will content across many types of repositories be aggregated and reflected in organized results displays? How will the security models of the various repositories interact with the search engine? Answering these three questions first will quickly narrow your list of candidates for enterprise search.

The Marginal Influence of E-commerce Search and Taxonomies on Enterprise Search Technologies

July 31, 2007 / Lynda Moulton / 0 Comments

As we gear up for Gilbane Boston 2007, the number of possible topics to include in the tracks related to search seems boundless. The search business is in a transitional state, but in spite of the disarray it remains pivotal in its impact on business and current culture. The sessions will reflect the diversity in the market.

One trend is quite clear: the money and effort expended on Web search, or site search on commercial Web sites, is winning the “search technology” revenue war, with annual revenues well into the billions of dollars. By contrast, a recent Gartner study put 2006 revenues for enterprise search below $400M. I found this figure in an excellent article, Enterprise Search: Seek and Maybe You’ll Find, by Ben DuPont in Intelligent Enterprise. Check it out.

The distinctions between search on the Web and search within the enterprise are numerous, but here are two. First, Internet Web search revenue is all about marketing. Yes, we use it to discover, learn, find facts, and become more informed. But when companies supply search technology to expose you to their content on the Internet, they do so to facilitate commerce. If the technology falls into the hands of organizations with other intents, such as libraries or government agencies, so be it.

As we all know, when we are at work, seeking to discover, learn or find facts to do our jobs better, we need a different kind of search. Thus, we seek a clear search winner built just for our enterprise with all of its idiosyncrasies. The problem is that what is inside does not look like the rest of the world’s content as it is aggregated for commercial views. Enterprises are unique and operate sometimes chaotically, or, at best, with nuanced views of what information is most important.

The second distinction relates to taxonomies, and the increase in their development and use. I’ve seen a dramatic increase in job postings for “taxonomists” and have managed several projects for enterprises over the years to build these controlled lists of terms for categorizing content. What is noteworthy about recent job opportunities is that most seem to be for customer-facing Web sites. Historically, organizations with substantial internal content (e.g. research reports, patents, laboratory findings, business documents) hired professionals to categorize materials for a narrowly defined audience of specialists. The terminology was often unique to the organization and could number in the hundreds or thousands of terms, even for a relatively small enterprise. This is no longer a common practice.

Slow financial growth in enterprise search markets is no surprise. Like many tools designed and marketed for departments not directly tied to revenue generation, search goes begging for solid vertical markets. Search’s companion technologies are also struggling to find a lucrative toehold for use within the organization. Content management systems integrated with rich and efficient taxonomy building and maintenance functions are hard to find.

I am confident that tools in CMS products for building and maintaining complex taxonomies will not improve until enterprises find a solid business reason to put professional human resources into doing content management, taxonomy development, search, and text analytics on their most important knowledge assets. This is a tough business proposition compared to the revenues being driven on the Internet. What businesses need to keep in mind is that without the ability to leverage their internal knowledge content assets better, smarter and faster, there won’t be innovative products in the pipeline to generate commerce. Losing track of your valuable intellectual resources is not a good long-term strategy. Once you begin committing to solid content resource management strategies, enterprise technology products will improve to meet your needs.

Search and Need

July 20, 2007 / Lynda Moulton / 0 Comments

Since my attempt in January to parse the “enterprise search” market in the simplest terms, I have been exposed to no fewer than 77 products and vendors whose offerings have been brought to my attention. Add to that another 20 or 30 peripheral offerings in the text mining and text analytics sphere, and you’ll understand the need for a focused view when considering products.

Selling and marketing at its best sells to a need. Need expresses something about users, user behaviors, user requirements, and problems to be solved. Need also implies emotions and that may present a problem when it comes to making business decisions.

Nothing plays into emotional business decisions like money, as illustrated by one IT manager’s reaction to this week’s Yahoo News story about Google offering its search appliance for small Web sites at $100 a year for up to 5,000 pages. Noting that $500/year would support up to 50,000 Web pages, he thought it could be a solution for the company’s intranet. In a tough budget situation it seemed to make sense, because the maintenance fee for the current search software far exceeds $500.

Let’s be clear: Google is offering site search for a Web site on the World Wide Web, not for internal enterprise sites. The two situations differ in a huge number of variables to be considered, not the least of which are:

Who is authoring and maintaining the target content, and what do they expect to have the search engine do with the tags and content?
Who are the users, what are they looking for, and how do they expect it to be displayed?
What is the software providing in the way of managing and supporting metadata?
Where is the software going to run and be maintained?
What are the security and authorization considerations?
What about all the internal content that is not “Web pages” (e.g. PDFs, spreadsheets, slide shows, images), with its associated metadata, that may not be supported under this license but is fundamental to an enterprise search solution?
What do page ranking and ad management have to do with internal search requirements?

Just to be clear, there are other solutions that may offer levels of Web site search support better suited to many small organizations, internal and external. This week I learned more about one such offering, PicoSearch, which has options ranging from free to very reasonable monthly charges, bundled with a service that hosts search for an organization’s content. It can also provide some levels of password protection and security controls. This may not be an optimal choice for organizations with complex, multi-faceted search interfaces, but it could be perfect for associations, educational institutions, and small businesses with straightforward product lines.

Keep in mind that inexpensive does not mean “cheap,” nor is cost the first qualifying criterion for what is “appropriate.”

Random Notes from the World on Search

July 13, 2007 / Lynda Moulton / 0 Comments

A week late, I am wrapping up my first six months blogging for The Gilbane Group on enterprise search. I am attempting a retrospective of discoveries, thoughts, and issues that surfaced in the second quarter. June was especially busy, and now that I have had time to sort the sortable, here are a few noteworthy highlights and reflections on them. In short, the search market is complex and becoming more so every month.

Google the company and Google the product suite are so dominant that any article about search in the mainstream or technical press evokes the “G-word,” even when Google is not the main topic.

Consider, for example, Walter Mossberg’s June 28 article in the Wall Street Journal, “Ask.com Takes Lead in Designing Display of Search Results.” Its first paragraph never mentioned Ask.com but began “Google and other search companies …” On the same page was an article, “Start-ups Make Inroads with Google’s Work Force.” Earlier that week the New York Times ran a story, “The Human Touch that May Loosen Google’s Grip”; MassHighTech referred to Google throughout an article, “Why the Best Search Marketers are Right-Brained”; and Intelligent Enterprise did as well in “Enterprise Search: Seek and Maybe You’ll Find.” [More about the latter further on.] A search on www.Clusty.com under News>Top News today gets 89 hits for “Google” and 49 for “Toyota.”

Korea offers a take on Internet search that I think is highly relevant to the enterprise search market, as described in the New York Times on July 5 in “South Koreans Connect Through Search Engine.” It turns out that the amount of Korean-language content on the Web is so scanty that Google is irrelevant. Instead, a five-year-old company, Naver.com, is giving Koreans what they really need: answers to the questions native Koreans are asking, built up collaboratively through their cultural “helpfulness.” Naver.com handles 77% of all Internet “searches originating in South Korea.” Just as Google can’t deliver to a Korean population what it wants to know, Google can’t really “understand” all the nuances of information needs in culturally diverse enterprises. Naver maintains “questions and answers in proprietary databases not shared with other portals or search engines,” just as an enterprise might want to do.

At the Red Herring conference in Boston on June 28, the moderator of a panel of industry leaders, in a session entitled “The New Frontier in Search,” asked whether there will “be any major breakthroughs in semantic search in the next ten years.” The answer from all four panelists, including Jeff Cutler of Answers.com and Doug Leeds of Ask.com, was an unequivocal “NO!” I have a list of over 30 companies working on or publicly “sniffing around” the semantic search marketplace. Others are sure to be engaged in stealth work, so “not in ten years” is hard to digest, but who really knows?

Also at Red Herring, in an interview, EMC’s Mark Lewis emphasized a compelling issue for enterprise search: “security,” namely authentication for permission to view search results. In another panel session, on SOA, moderated by Judy Hurwitz, the security issue was even more dominant as speakers discussed the complexities of integrating heterogeneous applications in a SOA environment while maintaining security integrity. As the number of variables in the architecture rises, so do the technical difficulties of keeping secure content truly secure in search.

The Enterprise Search landscape is pretty crowded with companies that are more focused on helping us find what is in the organization than what is on an enterprise’s Web site. Summarizing the challenges these vendors face is the aforementioned article, “Enterprise Search: Seek and Maybe You’ll Find.” Their market is my beat but grappling with the realities of serving such diverse audiences is a serious necessity.

OK, this blog entry is already too long, but you get the idea. The fact that the New York Times has recently had at least one article a week relating to search technologies is really a business marker. While search was introduced to professional searchers 35 years ago, it has been a real sleeper for most of the decades since. Web technology is truly the enabler of so much that makes search work for the masses in so many environments. It’s pretty clear that although search is ubiquitous in the workplace, its commodity status and the normalizing of enterprise search protocols are still a few years off. It is going to be interesting to see who stumbles and who prevails among the current bumper crop of offerings. Or will another disruption take us into more innovative forms of search?

Stay tuned for the next six months – I’m predicting more shakeout in the industry and more adoption of different flavors of search in more organizations. Trying to keep up will be the primary challenge.

Respect for Complexity and Security are Winners

June 22, 2007 / Lynda Moulton / 0 Comments

I participated in one search vendor’s user conference this week, and a webinar sponsored by another. Both impressed me because they expressed values that I respect in the content software industry and they provided solid evidence that they have the technology and business delivery infrastructure to back up the rhetoric.

You have probably noted that my blog is slim on comments about specific products and this trend will continue to be the norm. However, in addition to the general feeling of good will from Endeca customers that I experienced at Endeca Discover 2007, I heard clear messages from sessions I attended that reinforced the company’s focus on helping clients solve complex content retrieval problems. Complexity is inherent in enterprises because of diversity among employees, methods of operating, technologies deployed and varied approaches to meeting business demands at every level.

In their presentations, Paul Sonderegger and Jason Purcell took care to explain Endeca’s approach to building its search technology solutions, and why. At the core is a fundamental truth about how organizations and people function: you never know how a huge amount of unpredictably interconnected stuff will be approached. Endeca wants its users to be able to use information as levers to discover, through personalized methods, relationships among content pieces that will pry open new possibilities for understanding the content.

Years ago I was actively involved with a database model called an associative structural model. It was developed explicitly to store and manipulate huge amounts of database text and embodied features of hierarchical, networked and relational data structures.

It worked well for complex, integrated databases because it allowed users to manipulate and mingle data groups in unlimited ways. Unfortunately, it required a user to be able to visualize the possibilities for combining and expressing data from hundreds of fields in numerous tables by using keys. This structural complexity could not easily be taught or learned, and tools for simple visualization were not available in the early 1980s. As I listened to Jason Purcell describe Endeca’s optimized record store and its concept of “intra-query” to solve the problems posed by double uncertainty, I thought, “They get it.” They have acknowledged the challenge of making it simple to update, use and exploit vast knowledge stores; they are working hard to meet the challenge. Good for them! We all want flexibility to work the way we want, but if it is not easy we will not adopt it.

In a KMWorld webinar, Vivisimo’s Jerome Pesenti and customer Arnold Verstraten of NV Organon co-presented with Matt Brown of Forrester Research. The theme was search security models. Besides the reasons for considering security up front when accessing and retrieving from enterprise repositories, the speakers described three basic control models. All three were based on access control lists (ACLs), and the speakers explained how and why Vivisimo uses them.

Having worked with defense agencies, defense contractors, and corporations with very serious security requirements on who can access what, I am very familiar with the types of data structures and indexing methods that can be used. I was pleased to hear the speakers address trade-offs that include performance and deployment issues. It served to remind me that organizations do need to be thinking about this early in the selection process; inability to handle the most sensitive content appropriately should eliminate any enterprise search vendor that tries to equivocate on security. Also, nothing demonstrates the quality of a solution like a “bake-off,” as Organon ran, against a corpus of content large enough to show whether all documents and metadata that some audiences must not view are, in fact, always excluded from search results for everyone in those restricted audiences. Test it. Really test it!
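One way to make “test it” concrete: the minimal Python sketch below assumes each indexed document carries an ACL (the users and groups allowed to see it) copied from its source repository, and trims results at query time. It is a generic illustration, not Vivisimo’s implementation or any of the three models presented in the webinar; the group table, document ids, and function names are hypothetical. The leaks check is the bake-off part: it compares results against a list of forbidden documents that the testers prepare independently, rather than trusting the engine’s own ACL logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    terms: frozenset  # indexed terms
    acl: frozenset    # users or groups allowed to view, copied from the source repository


# Hypothetical group memberships; a real deployment would resolve these from a directory.
GROUPS = {
    "alice": {"alice", "staff", "research"},
    "bob": {"bob", "staff"},
}

# Hypothetical index.
INDEX = [
    IndexedDoc("policy", frozenset({"travel", "expense"}), frozenset({"staff"})),
    IndexedDoc("lab-notes", frozenset({"compound", "expense"}), frozenset({"research"})),
]


def principals(user):
    """The user plus every group the user belongs to."""
    return GROUPS.get(user, {user})


def search(index, user, query):
    """Return matching doc ids, dropping any document whose ACL excludes the user.

    Trimming happens before anything is returned, so restricted titles,
    snippets, and metadata never reach an unauthorized user.
    """
    terms = set(query.lower().split())
    allowed = principals(user)
    return [d.doc_id for d in index if terms & d.terms and allowed & d.acl]


def untrimmed(user, query):
    """A deliberately broken engine that ignores ACLs, to show the check catching it."""
    return [d.doc_id for d in INDEX if set(query.lower().split()) & d.terms]


def leaks(search_fn, users, queries, forbidden):
    """Bake-off check against an independently prepared map of user -> forbidden doc ids.

    Returns every (user, query, doc_id) where a forbidden document appeared.
    An empty list means no leak was observed for these users and queries.
    """
    return [
        (user, query, doc_id)
        for user in users
        for query in queries
        for doc_id in search_fn(user, query)
        if doc_id in forbidden.get(user, set())
    ]


FORBIDDEN = {"bob": {"lab-notes"}}  # prepared by the testers, not derived from the ACLs
USERS, QUERIES = ["alice", "bob"], ["expense", "compound"]
print(leaks(lambda u, q: search(INDEX, u, q), USERS, QUERIES, FORBIDDEN))  # []
print(leaks(untrimmed, USERS, QUERIES, FORBIDDEN))
# [('bob', 'expense', 'lab-notes'), ('bob', 'compound', 'lab-notes')]
```

An empty result only means no leak was observed for the users and queries tried; a convincing bake-off needs a corpus and query set large enough to exercise every restricted audience.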

Turbo Search Engines in Cars; it is not the whole solution

June 15, 2007 / Lynda Moulton / 0 Comments

In my quest to analyze the search tools that are available to the enterprise, I spend a lot of time searching. These searches use conventional on-line search tools, and my own database of citations that link to articles, long forgotten. But true insights about products and markets usually come through the old-fashioned route, the serendipity of routine life. For me search also includes the ordinary things I do every day:

Looking up a fact (e.g. phone number, someone’s birthday, woodchuck deterrents), which I may find in an electronic file or hardcopy
Retrieving a specific document (e.g. an expense form, policy statement, or ISO standard), which may be on-line or in my file cabinet
Finding evidence (e.g. examining search logs to understand how people are using a search engine, looking for a woodchuck hole near my garden, examining my tires for uneven tread wear), which requires viewing electronic files or my physical environment
Discovering who the experts are on a topic or what expertise my associates have (e.g. looking up topics to see who has written or spoken, reading resumes or biographies to uncover experience), which is more often done on-line but may be buried in a 20-year old professional directory on the shelf
Learning about a subject I want or need to understand (e.g. How are search and text analytics being used together in business enterprises? What is the meaning of the tag line “Turbo Search Engine” on an Acura ad?), which I answered partially with online search but also by attending conferences like the Text Analytics Summit 2007 this week

This list illustrates several things. First, search is about finding facts, evidence, and aggregated information (documents). It is also about discovering, learning, and uncovering information that we can then analyze for any number of decisions or potential actions.

Second, search enables us to function more efficiently in all of our worldly activities, execute our jobs, increase our own expertise and generally feed our brains.

Third, search does not require the use of electronic technology, nor sophisticated tools, just our amazing senses: sight, hearing, touch, smell and taste.

Fourth, what Google now defines as “cloud computing” and what MIT geeks began touting as “wearable” technology a few years ago have converged to bring us cars embedded with what Acura defines as “turbo search engines.” On this fourth point, I needed to discover the point. In small print on the full-page ad in Newsweek were phrases like “linked to over 7,000,000 destinations” and “knows where traffic is.” In even tinier print was the statement, “real-time traffic monitoring available in select markets…” I thought I understood that they were promoting the pervasiveness of search potential through the car’s extensive technological features. Then I searched the Internet for the phrase “turbo search engine” coupled with “Acura” only to learn that there was more to it. Notably, there is the “…image-tagging campaign that enables the targeted audience to use their fully-integrated mobile devices to be part of the promotion.” You can read the context yourself.

Well, I am still trying to get my head around this fourth point to understand how important it is to helping companies find solid, practical search solutions to problems they face in business enterprises. I don’t believe that a parking lot full of Acuras is something I will recommend.

Fifth, I had some additional thoughts about the place for search technology this week. Technology experts like Sue Feldman of IDC and Fern Halper of Hurwitz & Associates appeared on a panel at the Text Analytics Summit. While drawing clear distinctions between search and text analytics, and between text analytics and text mining, Sue also made clear that the algorithmic techniques employed by the various tools being demonstrated are distinct, each solving different problems in different business situations. She and others acknowledged that, having finally embraced search, enterprises are now adopting significant applications that use text analytic techniques to make better sense of all the found content.

Integration was a recurring theme at the conference, even as it was also obvious that no one product embodies the full range of text search, mining and analytics that any one enterprise might need. When tools and technologies are procured in silos, good integration is a tough proposition, and a costly one. Tacking on one product after another and trying to retrofit them into a seamless continuum, from capturing, storing, and organizing content to retrieving and analyzing the text in it, takes forethought and intelligent human design. Even if you can’t procure the whole solution to all your problems at once (and who can?), you do need a vision of where you are going to end up so that each deployment is a building block of the whole architecture.

There is a lot to discover at conferences that can’t be learned through search, like what you absorb in a random mix of presentations, discussions and demos that can lead to new insights or just a confirmation of the optimal path to a cohesive plan.

A Story About Structured Content and Search

June 7, 2007 / Lynda Moulton / 0 Comments

Today was spent trying to sift through four distinct piles of paper, a backlog of email messages, and managing my calendar. My goal was first to get rid of documents and articles that are too old or irrelevant to “deal with.” The remainder I intended to catalog in my database, file in the appropriate electronic folder, or place in a new pile as a “to-do list.” This final pile plus my email “In Box” would then be systematically assigned to a spot in my calendar for the next six weeks. I did have deadlines to meet, but they depended on other people sending me content, which never came. So I kept sifting and organizing. As you can guess, the day that began with lofty intentions of getting to the bottom of the piles so that I could prioritize my real work is ending, instead, with this blog entry. It is not the one I began for this week four days ago.

First, the most ironic moment of the day came from the last pile, in which I turned over a 1997 Newsweek article entitled “Drowning in Data” that must have made an impression at the time. I knew I shouldn’t digress, again, but reading it confirmed what I already knew. We have all been “drowning in data” since at least 1997, and for the same reasons we were back then: Internet publishing, email, voice mail, and faxes (well, not so much anymore). It has the same effect on “info-stressed” professionals as it did ten years ago; it makes us all want to go slower so we can think about what is being thrown at us. Yes, that is why I was isolated, trying to bring order to the info-glut on my desks. The article mentioned that “the average worker in a large corporation sent and received an astounding 177 messages a day…”

That is the perfect segue to my next observation. In the course of the day, while looking for emails needed to meet deadlines, I emptied over 300 messages from my Junk Mailbox and over 400 from my Deleted Mailbox, which left me with just 76 in my In Mailbox, which I will begin acting on when I finish this blog entry. (Well, maybe after dinner.) What happened today that caused six different search vendors to send invitations to Webinars or analyst briefings? Oh well, when I finally get around to filling out my calendar for the next six weeks I will probably find out that some, if not all, conflict with appointments I already have. So maybe I should finish the calendar before responding to the emails.

In the opening of this story I mentioned four distinct piles; I lied. As one document was replaced by another, I discovered that there was no unifying theme for any one pile. So much for categorization, but I did find some important papers that required immediate action, which I took.

Finally, I uncovered a 1996 article from Information Week (http://techweb.cmp.com/iw). The Information Week archives don’t go back that far, but the title was “Library on an Intranet.” It described a Web-based system for organizing corporate information by Sequent Computer Systems. I know why I saved it: back in 1980 I had developed, and was marketing, corporate library systems to run over company networks. I did find a reference to the Sequent structured system for organizing and navigating corporate content. You will find it at: http://www.infoloom.com/gcaconfs/WEB/seattle96/lmd.HTM#N136. It is a very interesting read.

What a ride we have had trying to corral this info-glut electronically for over 30 years. From citation searching using command languages in the 1970s, to navigation and structured searching in library systems in the 1980s and 90s, to Web-based navigation coupled with full-text searching in the mid-90s; it never ends. And I am still trying to structure my paper piles into a searchable collection of content.

Maybe browsing the piles isn’t such a bad idea after all. I never would have found those articles using “search” search.

Postscript: This really happened. When I finished this blog entry and went to place the “Drowning…” article on a pile I never got to, there on the top was an article from Information Week, April 9, 2007, entitled “Too Much Information.” I really didn’t need to read the lecturing subtitle: Feeling overwhelmed? You need a comprehensive strategy, not a butterfly net, to deal with data overload. I can assure you, I wasn’t waving butterfly nets all day.

The Google Effect on Cross-Language Search

May 31, 2007 / Leonor Ciarlone / 0 Comments

As the Internet continues to redefine “ubiquitous,” the issue of cross-language search becomes more critical. It’s a pervasive challenge with extreme scalability requirements. Hard to imagine, but the Internet will be “full” (out of IPv4 addresses) by about 2010, according to the American Registry for Internet Numbers. ARIN’s recommendation to move to IPv6 demonstrates the potential breadth of information overload.

Organizations such as the European-based Cross-Language Evaluation Forum (CLEF) have been conducting in-depth testing of cross-language search, not just discussing it, for many years. With its “Leaping over Language Barriers” announcement, Google has moved beyond experimentation and toward productization of its cross-language search feature.

The Wall Street Journal’s Jessica Vascellaro weighs in here, and includes commentary on rival strategies from Yahoo and Microsoft.
Google Blogoscoped weighs in here.
Clay Tablet’s Ryan Coleman weighs in here.
Global by Design’s John Yunker has a review here.
And from Google themselves, here’s the beta UI, the FAQ, and the “unveiling” at the company’s Searchology event held earlier this month.

IMO, any discussion of what the interconnected world “looks like” in the future, whether focused on [fill in your label here] 2.0, social networking, customer experience, global elearning, etc., (should) eventually drill down to translation and localization issues. Once we’re at that level of conversation, there are more challenges to discuss: the ongoing evolution of automated translation, the balance between human and machine translation, the conundrum of rich media and image translation, and, as Kaija will always remind us, the quality and context of search results as opposed to merely the quantity.

As a researcher, I’ve used Google’s “translate this” functionality and Yahoo’s Babel Fish (originally AltaVista’s) numerous times to “get the gist” of a non-English article. But my reliance on the results has been more for sanity-checking trends than for factual data gathering. Inconsistencies skew the truth. I just can’t trust it. Can we trust this? Time will tell. Is it a step in the right direction for the masses? No doubt.
