Curated for content, computing, and digital experience professionals

Category: Semantic technologies

Our coverage of semantic technologies goes back to the early 90s when search engines focused on searching structured data in databases were looking to provide support for searching unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyses the challenge back then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


Social Networking and Socializing: Different Ways to Different Kinds of Knowledge

Having been mired for several weeks in a technological misalignment of the stars, I have to question how social tools (the technological kind) might have saved me boatloads of aggravation and time. Consider having all of these happen in one month:

  • Wireless router that couldn’t support wireless (waiting for second replacement)
  • IBM ThinkPad power adapter not usable with Lenovo ThinkPad
  • Cable service not able to get a signal from street down my 1,000 ft. driveway
  • Two cable modem failures and replacements
  • ISP spam blocker that blocks good stuff but does not retain it as suspect mail for review
  • 10 hours of downtime from my web hosting/e-mail service provider

As one who guides and advises companies on enterprise search selection, implementation and deployment, and various aspects of knowledge asset management, it is a little ironic that I have my own challenges finding quality answers and knowledge to support my home office. I have used these tools in my search for answers:

  • Phone – vendor customer service
  • Chat – vendor website customer service
  • Email – vendor customer service, and to some colleagues for advice
  • Web searching – vendor site search, Internet general search engines
  • Twitter – comments about troubles; search for similar comments by others

So far, phone discussions have been the only pathway to resolutions, and in one case a technician’s house call was required. Most of the issues are still open; however, emails and automated phone calls solicit feedback about my satisfaction with support services daily.

What does this have to do with search? I am searching to solve very specific problems, not an uncommon reason to search within the enterprise. As an independent consultant, my “enterprise” is my professional network, the support services I pay for and the WWW. When I fail to garner information I need from electronic sources, I reach out directly to experts in my personal network for answers. Even then, I find electronic dialog mechanisms that require typing a back-and-forth Q & A session to be pretty painful. Usually, one of us resorts to the phone or an in-person session to “see” what is really going on.

What have I learned?

  1. When a resolution is needed quickly and efficiently, talking to someone who is really an expert is the best path.
  2. When I can’t find the answer on-line, I need to find an expert.
  3. When I can’t find an answer or an expert, I flounder and waste huge amounts of time.

Conclusion:

Social tools (public platforms, social search, email, and even phone) require substantive work or communication skill by participants to establish a benefit from communication interchanges. Contextual hooks are needed to improve the results of information exchanges. Socializing is critical to expanding our networks of experts in a way that builds relationships in which we can freely reach out and expect a productive dialogue when we have a need to know. This is something to work at and consider when we embrace social technologies. It isn’t the technology tool that makes us social; it is the surrounding sharing and communicating (aka socializing) that breeds the trusting and trusted relationships that will improve our search for answers. Social networks and platforms may give us the tools to search for and share content, but it is the socializing that adds rich context to make it more likely that the expert we want and the answers we seek are the most beneficial.

W3C Publishes New Working Drafts for OWL 2 – Last Call

The W3C OWL Working Group has published new Working Drafts for OWL 2, a language for building Semantic Web ontologies. An ontology is a set of terms that a particular community finds useful for organizing data (e.g., for data about a book, useful terms include “title” and “author”). OWL 2 (a compatible extension of “OWL 1”) consists of 13 documents (7 technical, 4 instructional, and 2 group Notes). For descriptions and links to all the documents, see the “OWL 2 Documentation Roadmap.” This is a “Last Call” for the technical materials and is an opportunity for the community to confirm that these documents satisfy requirements for an ontology language. This is a second Last Call for six of the documents, but because the changes since the first Last Call are limited in scope, the review period lasts only 21 days. For an introduction to OWL 2, see the four instructional documents: an “overview,” “primer,” “list of new features,” and “quick reference.” http://www.w3.org/2007/OWL, http://www.w3.org/TR/2009/WD-owl2-new-features-20090421/

Attensity, Empolis, and Living-e Merge

Three business intelligence and information management companies, Attensity Corporation, Empolis GmbH, and Living-e AG, announced they have united to form the Attensity Group to deliver business user applications that generate value from unstructured data. The company offers a comprehensive family of applications built upon deep expertise in semantic language technologies. Empolis GmbH provides information management applications, while Living-e AG offers intelligent multi-channel communication and information management solutions. Attensity Corporation is known for its deep text analytics software for First Person Intelligence. The new Attensity Group will unify these complementary technologies that analyze, interpret and manage an enterprise’s mass of unstructured data to generate value and deliver easy-to-use business applications. These applications enable knowledge management professionals, business leaders, customer support personnel and customers to get relevant and actionable answers. Attensity Group’s go-to-market entities will be named (1) Attensity Americas, and (2) Empolis in Europe, the Middle East and Africa (EMEA). In the markets in EMEA, the Living-e and Empolis teams will join forces. The companies have united their sales efforts to offer combined solutions, support and implementation services to the market. Attensity Group, with global headquarters in Palo Alto, Calif., will employ approximately 300 people worldwide and expects to generate revenues in 2009 of approximately $50M (USD). Attensity Corporation, Empolis GmbH and Living-e AG have all supported the global open source initiative, SMILA (SeMantic Information Logistics Architecture), as semantic technology partners. Attensity Group will continue to partner with SMILA. http://www.attensitygroup.com

March Madness in the Search Industry

In keeping with conventional wisdom, it looks like a number of entrepreneurs are using the economic downturn as opportunity time, judging from the larger than normal number of announcements in the enterprise search sector. The Microsoft acquisition of FAST, Autonomy’s foray into the document/content management market, and Google’s Search Appliance ramping its customer base are old news BUT we have a sweep of changes. Newcomers to the enterprise search marketplace and news of innovative releases of mature products really perked up in March. Here are my favorite announcements and events in chronological order and the reasons why I find them interesting:

Travis, Paul. Digital Reef Comes Out of Stealth Mode. 03/02/2009. Byteandswitch.com.

Startup offers content management platform to index unstructured data for use in e-discovery, risk mitigation, and storage optimization. Here is the first evidence that entrepreneurs see opportunity for filling a niche vacuum. In the legal market the options have been limited and pretty costly, especially for small firms. This will be an interesting one to watch. http://www.digitalreefinc.com/

Banking, Finance, and Investment Taxonomy Now Available from the Taxonomy Experts at WAND. 03/02/2009, PR Web (press release), Ferndale, WA, USA

The taxonomy experts at WAND have made this financial taxonomy available now for integration into any enterprise search software. I have been talking with Ross Lehr, CEO at Wand, for over a year about his suite of vertical market taxonomies and how best to leverage them. I am delighted that Wand is now actively engaged with a number of enterprise search and content management firms, enabling them to better support their customers’ need for navigation. The Wand taxonomies offer a launching point from which organizations can customize and enhance the vocabulary to match their internal or customer interests. http://www.wandinc.com/main/default.aspx

Miller, Mark. Lucid Imagination » Add our Lucene Ecosystem Search Engine to Firefox. 03/02/2009

I predicted back in January that open source search and search appliances were going to spawn a whole new industry of services providers and expert integrators because there are just not enough search experts to staff in-house teams in all the companies that are adopting these two types of search products. Well, it is happening and these guys at Lucid are some of the smartest search technologists around. Here is an announcement that introduces you to a taste of what they can do. Check it out and check them out at http://www.lucidimagination.com/

Microsharing has benefits for NASA. 03/04/2009.

It has been about 18 months since I wrote on social search and this report reveals a program that takes the concept to a new level, integrating content management, expertise locators and search in a nifty model. To learn more about NASAsphere, read this report written by Celeste Merryman. Findings from the NASAsphere Pilot. Jet Propulsion Laboratory, California Institute of Technology Knowledge Arciteture (sic) and Technology Task [Force]. 08/20/2008. The success of the pilot project is underscored in this report recommendation: the NASAsphere pilot team recommends that NASAsphere be implemented as an “official” employee social networking and communication tool. This project is not about enterprise search per se; it just reflects how leveraging content and human expertise using social networks requires a “findability” component to have a successful outcome. Conversely, social tools play a huge role in improving findability.

March 16, 2009. QueSearch: Unlocking the Value of Structured Data with Universal Search really caught my eye with their claim to “universal search” (yes, another) for large and mid-size organizations.

This offering, with a starting price of $19,500, is available immediately, with software and appliance deployment options. I tried to find out more about their founders and origins on their Web site without luck but did track down a Wikipedia article and a neat YouTube interview with the two founders, Steven Yaskin and Paul Tenberg. It explains how they are leveraging Google tools and open source to deliver solutions.

Stronger, Better, Faster — MaxxCat’s New Search Appliance Aspires to Be Google Search Appliance Killer, by Marketwire. 03/11/2009.

This statement explains why the announcement caught my attention: MaxxCat product developers cite “poor performance and intrinsic limitations of Google Mini and Google Search Appliance” as the impetus to develop the device. The enterprise search appliance, EX-5000, is over seven times faster than Google Search Appliance (GSA) and the small business search appliance, the XB-250, is 16 times faster than Google Mini. There is nothing like challenging the leading search appliance company with a statement like that to throw down the gauntlet. OK, I’m watching and will be delighted to read or hear from early users.

Just one more take on “social search” as we learn about Aardvark: Answering the Tough Questions, David Hornik on VentureBlog. 03/12/2009

This week the Aardvark team is launching the fruits of that labor at South By Southwest (SXSW). They have built a “social search engine” that lives inside your IM and email. It allows you to ask questions of Aardvark, which then goes about determining who among your friends and friends of friends is most qualified to answer those questions. As the Aardvark team point out in their blog, Social Search is particularly well suited to answer subjective questions where “context” is important. I am not going to quibble now but I think I would have put this under my category of “semantic search” and natural language processing. Until we see it in action, who knows?

A new position at Attivio was announced on March 16th, Attivio Promotes John O’Neil to Chief Scientist, which tells me that they are still expanding at the end of their first official year in business.

Getting to the point, 03/18/2009, KMWorld. http://www.kmworld.com/Articles/ReadArticle.aspx?ArticleID=53070

Several announcements about Concept Searching’s release v. 4 of its flagship product, conceptClassifier for SharePoint highlight the fact that Microsoft’s acquisition of FAST has not slowed the number of enterprise search solution companies that continue to partner with or offer independent solutions for SharePoint. In this case the company offers its own standalone concept search solution applications for other content domains but is continuing to bank on lots of business from the SharePoint user community. This relationship is reflected in these statements: The company says features include a new installer that enables installation in a SharePoint environment in less than 20 minutes, requires no programmatic support and all functionality can be turned on or off using standard Microsoft SharePoint controls. Full integration with Microsoft Content Types and greater support for multiple taxonomies are also included in this release. Once the FAST search server becomes a staple for Microsoft SharePoint shops, there will undoubtedly be fallout for some of these partners.

Being invited to the Google Enterprise Search Summit in Cambridge, MA on March 19, 2009 was an opportunity for me to visit Google’s local offices and meet a bunch of customers.

They were a pretty enthusiastic crowd and are enjoying a lot of attention as this division of Google works to join the ranks of other enterprise application software companies. I suspect that it is a whole new venture for them to be entertaining customers in their offices in a “user-group like” forum but the Google speakers were energetic and clearly love the entrepreneurial aspects of being a newish run-away success within a run-away successful company. New customer announcements continue to flow from Google with SITA (The State Information Technology Agency in South Africa) acquiring GSA to drive an enterprise-wide research project. The solution will also be deployed and implemented by JSE-listed IT solutions and services company Faritec, and RR Donnelly. Several EMC users were represented at the meeting, which made me ask why they aren’t using the search tools being rolled out by the Documentum division…well, don’t ask.

Evans, Steve. Simplexo boosts public sector search options. Computer Business Review – UK. 03/18/2009.

This is interesting as an alternative to the Lucene/Solr scene: UK-based open source enterprise search vendor Simplexo has launched a new search platform aimed at the public sector, which aims to enable central and local government departments to simultaneously search multiple disparate data sources across the organisation on demand. I have wondered when we would see some other open source offerings.

And all of the preceding is about just the startups (plus EMC at Google) and lesser known company activity. This was not a slow month. I don’t want all my contacts in the “established” search market to think that I am not paying attention because I am. I’ve exchanged communications with or been briefed by these known companies with news about new releases, advancing market share, or new executive teams. In no particular order these were the highlights of the month:

Endeca announced three new platforms on Mar 23, 2009: Endeca Announces the Endeca Publishing Suite, Giving Editors Unprecedented Control Over the Online Experience; Endeca Announces the Endeca Commerce Suite, Giving Retailers Continuous Targeted Merchandizing; and Endeca Unveils McKinley Release of the Information Access Platform, Allowing for Faster and Easier Deployment of Search Applications

Linguamatics Agile Text Mining Platform to Be Used by Novo Nordisk. 03/26/2009

I had a fine briefing by Coveo’s CEO, Laurent Simoneau, and Michel Besmer, new VP of Global Marketing, and see them making great strides capturing market share across numerous verticals where rapid deployment and implementation are a big selling point. They also just announced: Bell Mobility and Coveo Partner to Create Enterprise Search from Bell, an Exclusive Enterprise-Grade Mobile Search Solution.

A new Version 7.6 of a mainstay, plug-and-play search solution for SMBs since 1991, dtSearch, was just released. 03/24/2009

And finally, ISYS is having a great growth path with a new technology release, ISYS File Readers, new executives and a new project … completed in conjunction with ArnoldIT.com. Steve Arnold, industry expert and author of the Beyond Search blog, compiled more than a decade of Google patent documents. To offer a more powerful method for analyzing and mining this content, we produced the Google Patent Search Demonstration Site, powered by our ISYS: web application.

Weatherwise, March 2009 is going out like a lamb, but it has been hot, hot, hot when it comes to search.

Why Copy Your Competitors’ Bad Choices? Search Can Work for You

I’ve often been curious about why companies frequently procure enterprise applications used by their competitors, destined to be followers instead of leaders. It seems to reflect a lack of imagination but, more importantly, a lack of confidence that one could select another solution with more possibilities for enhancing the organization’s competitiveness.

Look at three popular concepts about search:

  • The search box for keyword search is dead or only marginally useful
  • Professionals spend 10 – 20% of their workday searching (and often unsuccessfully)
  • Vast amounts of critical unstructured content are undiscoverable in most enterprises, leaving organizations at risk in litigation, weak in leveraging fundamental knowledge and research for innovation, poor at customer support because known solutions can’t be found, and short on competitive intelligence because so much of it lies hidden in desktop email in-boxes.

If we accept these propositions, doesn’t it say something about the “leaders” in the search industry that we believe and accept so little from search?

Why do most organizations not try to solve at least one of these problems by seeking solutions that will save hundreds of thousands of dollars in wasted labor, litigation costs, R&D expense, or lost customers due to poor service? Why do companies seek to procure search applications from companies that have been around for a decade or more, licensing evolutionary products, not revolutionary ones? Why would a company ignore innovative new products in favor of products that have given “search” a bad reputation? Why do organizations make hundred thousand dollar, or more, procurements without expending a few hundred dollars on documented product comparisons, and instead rely on a few widely published charts with less than a page or two on each product?

Most important, why are organizations not seeking search applications that will give them an edge by uncovering a nugget that will get a product to market faster, help marketing groups position a product better against the competition, or give support services representatives superior tools for getting information back to customers instantly with a proven solution to a query? Where is the will to apply search technology more astutely than your competitors in every area of your business? Why is search not expected to perform flawlessly and be as ubiquitous as any other software tool in your workflow? It does not have to be a poor performing stepchild but it does require its own experts to be well executed. Come to think of it, I have never seen a help wanted posting requiring expertise in search technology implementation. Hmmm…

There are well over a hundred viable search applications and hundreds of other applications that have search embedded for point solutions. You may need to acquire, implement and maintain a number of products across the enterprise to realize all the benefits search can bring but these products can work together, just as other components of a well-run enterprise do. At a time when organizations are cutting employees, appropriate search solutions may just offset the loss of expertise by uncovering at least some of the lost assets left behind.

Federated Search: Setting User Expectations

In the past few months, it is rare that I am briefed on an enterprise search product without a claim to provide “federated search.” Having worked with the original ANSI standard, Z39.50, and on one of the many review committees for it back in the early 1990s, it is a topic that always catches my attention.

Some of the history of search federation is described in this rather sketchy article at Wikipedia. However, I want to clarify the original call for such a standard. It comes from the days when public access to search technologies was available primarily through library on-line catalogs in public and academic institutional libraries. There was a demand for the ability to search not only one’s local library system and network (e.g. a university often standardized on one library system to include all the holdings of a number of its own libraries), but also the holdings of other universities or major public libraries. The problem was that the data structures and protocols varied from one library system product to the next in ways that made it difficult for the search engine of the first system to penetrate the database of records in another system. Records might have been meta-tagged similarly, but because of the way the metadata were indexed and exposed to retrieval algorithms, cross-system retrieval was not possible without a translating layer between systems. Thus, the Z39.50 standard was established, originally to let one library system’s user search from that library system into the contents of other libraries with different systems.

Ideally, results were presented to the searcher in a uniform citation format, organized to help the user easily recognize duplicated records, each marked with location and availability. In practice, there was usually a very esoteric results presentation that could only be readily interpreted by librarians and research scholars.

Now we live in a digitized content environment in which the dissimilarities across content management systems, content repositories, publishers’ databases, and library catalogs have increased a hundredfold. The need for federating or translation layers to bring order to this metadata or metadata-less chaos has only become stronger. The ANSI standard is largely ignored by content platform vendors, thus leaving the federating solution to non-embedded search products. As a buyer of search, you must do deep testing to determine whether the enterprise search engine you have acquired actually stands up well under a load of retrieving across numerous disparate repositories. And you need a very astute and experienced searcher with expert familiarity with the content in all the repositories to evaluate its suitability for the circumstances in which the engine will be used.

So, let’s just recap what you need to know before you select and license a product claiming to support what you expect from search federation:

  • Federated search is a process for retrieving content either serially or concurrently from multiple targeted sources that are indexed separately, and then presenting results in a unified display. You can imagine that there will be a huge variation in how well those claims might be satisfied.
  • Federation is an expansion of the concept of content aggregation. It has play in a multi-domain environment of only internal sites OR a mix of internal and external sites that might include the deep (hidden) web. Across multiple domains complete federation supports at least four distinct functions:
    • Integration of the results from a number of targeted searchable domains, each with its own search engine
    • Disambiguation of content results when similar but non-identical pieces of content might be included
    • Normalization of search results so that content from different domains is presented similarly
    • Consolidation of the search operation (standardizing a query to each of the target search engines) and standardizing the results so they appear to be coming from a single search operation

In order to do this effectively and cleanly, the federating layer of software, which probably comes from a third party like MuseGlobal, must have “connectors” that recognize the structures of all the repositories that will be targeted from the “home” search engine.
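As a rough illustration of the four functions listed above, here is a minimal Python sketch. It is not MuseGlobal's or any other vendor's implementation. The two in-memory repositories, the connector functions (`search_catalog`, `search_cms`), the field names, and the naive title-based duplicate key are all hypothetical, chosen only to show query consolidation, normalization, disambiguation, and integration of results in one small place.

```python
from dataclasses import dataclass

# Hypothetical sample data: two repositories with different record structures.
LIBRARY_CATALOG = [
    {"ti": "Enterprise Search Basics", "loc": "Main Library"},
    {"ti": "Taxonomy Design", "loc": "Science Library"},
]
CMS_DOCUMENTS = [
    {"title": "enterprise search basics", "repository": "Intranet CMS"},
    {"title": "Search Appliance Setup Notes", "repository": "Intranet CMS"},
]


@dataclass(frozen=True)
class Result:
    """Uniform record shown to the searcher, whatever the source."""
    title: str
    sources: tuple


def search_catalog(query):
    """Hypothetical connector that knows the catalog's 'ti'/'loc' fields."""
    q = query.lower()
    return [Result(r["ti"], (r["loc"],))
            for r in LIBRARY_CATALOG if q in r["ti"].lower()]


def search_cms(query):
    """Hypothetical connector that knows the CMS's 'title'/'repository' fields."""
    q = query.lower()
    return [Result(r["title"], (r["repository"],))
            for r in CMS_DOCUMENTS if q in r["title"].lower()]


CONNECTORS = [search_catalog, search_cms]


def federated_search(query):
    """Send one query to every connector and return a single merged list."""
    merged = {}
    for connector in CONNECTORS:        # consolidation: same query to each target
        for hit in connector(query):    # normalization happens inside each connector
            # Disambiguation (naive): treat titles that match after lowercasing
            # and whitespace cleanup as the same item.
            key = " ".join(hit.title.lower().split())
            if key in merged:
                first = merged[key]
                merged[key] = Result(first.title, first.sources + hit.sources)
            else:
                merged[key] = hit
    # Integration: one list, presented as if it came from a single search.
    return sorted(merged.values(), key=lambda r: r.title.lower())


if __name__ == "__main__":
    for r in federated_search("search"):
        print(f"{r.title}  [{', '.join(r.sources)}]")
```

A real federating layer would have to cope with ranking, paging, timeouts, and far messier duplicate detection than a title match, which is exactly where the variation among products claiming "federated search" tends to show up.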

Why is this relevant? In short, because users expect that when they search, the results they are looking at represent all the content from all the repositories they believed they were searching, in a format that makes sense to them. It is a very tall order for any search system to do this but when enterprise information managers are trying to meet a business manager’s or executive’s lofty expectations, anything less is viewed as the failure of enterprise search. Or else, they had better set expectations lower.

From the FastForward Blogger: A Microsoft User Group Meeting

I was at FastForward last week, invited to be a participant in a panel of bloggers on the last day, tasked to react to three days of executive, partner and customer presentations to the FAST search user community. Four of us had more ideas than we could share in a 30 minute panel session. The other three fellows on the panel are regular bloggers on FastForward. Along with them, I had the pleasure of listening to and speaking with numerous other industry analysts and commentators over the three-day period in the “blogger/analyst lounge” where we gathered between sessions.

Before making some observations of my own, I will introduce you to a few of the folks who have had, and will continue to have, a presence in the content and search arena, particularly as it relates to social tools and knowledge management, two tightly connected areas of interest.

Each of us was interviewed for a kind of video blog session during the meeting. Although you can’t view the panel from the final keynote session, I can share these links that will give you an idea of what my cohorts were thinking about the meeting and the state of FastForward in 2009. They are:

  • Jon Husband, social computing thought leader and architect. He has coined the term “wirearchy,” which aptly describes a flow of connectedness over the wires (and wireless air waves). I really liked his observations about how social technologies encourage self-organizing around issues and make group action so much easier. His interview is a good listen and his blog is fun, too.
  • Jevon MacDonald, founder of Firestoker and FASTforward blog contributor, had some helpful comments in his interview about the usefulness of social media in aiding companies to be more responsive to their customers.
  • Euan Semple, independent advisor on social computing, elevated the discussion in favor of social tools improving the flow of knowledge, which is really the point of all this content and search related technology, as far as I am concerned. You’ll enjoy the interview with Euan in which he also comments on the ratio of men to women and the IT-centric audience at the meeting, something I observed, as well.
  • I was also interviewed by Josh-Michéle Ross and my thoughts dovetailed with the others in keeping with the social theme of “engage your user,” the conference tag line. My mantra throughout the conference and after is that there was just not enough emphasis on how teams work together to build highly functional and easy-flowing search experiences for users. The process of creating a social platform in which search is present in subtle ways that assist connectedness among experts and their content requires human design; this is an art that can’t be left to “out-of-the-box” installed technologies. It is a task for those with an aptitude for what users really want, need and will use without being force-fed or artificially manipulated. Here are my comments in the interview.

Other interviews of interest can be found at the FastForward Bloggers page where a lot of thought leaders including Rob Paterson, Bill Ives, Clay Shirky, Charlene Li and Jim McGee among many others put forth some thoughtful comments about the state of technology.

Our panel moderator was Perry Solomon, VP Business Development and General Manager, Worldwide Media Solutions – FAST. While on the big stage we did not get to speak on all the ideas he asked about in our preparatory session, I can bring them to light here. Solomon’s questions follow, each with my thoughts after a few days to digest the meeting:

Q: How was the meeting balance in terms of search technology versus use?
LWM: The use cases were compelling and well presented. They were highly evocative of the best applications we can achieve with technology using all the social tools and content management options now available. This is appropriate in keynote/big theatre presentations but what I did not find in the few breakout sessions was more about the “nuts and bolts” of the human design and understanding needed to integrate components.

Among the attendees that I met during meals (system integrator partners from small firms and IT people who were struggling to build applications their internal customers wanted), there was a sense that not enough substantive information was being shared. They had hoped for more “how to” and concrete case studies that described the process of getting from purchasing licenses to deploying solutions. When I suggested to some of these Microsoft customers that it might be helpful to have more of their content managers and search administrators in the audience, they all agreed. None carried an attitude that they were going to design and implement these highly sophisticated content/search solutions with just members of the IT department. Business users were also notably absent from the meeting.

Q: What was the impact of the announcement about product news, FastSearch for Internet Business and FastSearch for SharePoint?
LWM: My own reaction was that it was a logical way to begin to roll out the FAST product with existing and evolving Microsoft products. It was not surprising, revolutionary or exciting. MS is clearly committed to making something of its huge investment in FAST; to align it with the rapidly evolving and highly popular SharePoint is smart business. The sentiment of others I spoke with was pretty much the same, sprinkled with a fair amount of skepticism about schedules for delivery and how well the products will be supported with services and documentation. Cost of ownership is always a big worry; what it will take to get the sizzle and super search results from this technology without a huge amount of human investment and skill on the part of customers or third-party integrators stimulates a deep “wait-and-see” attitude among most.

Q: What was missing or not addressed in the sessions?
LWM: The lack of presentations by, and involvement of, non-IT people. While MS is highly responsive to the IT person’s desire to standardize on a full-function platform and set of tools from a single supplier, this is not the reality in the marketplace. Content is created, manipulated and re-purposed with hundreds of applications that are used by business owners and content managers who bring a deep understanding of what needs to be applied to get the “social” workflow operational and productive in any given culture. My own bias is that the subtleties of organizational culture are often lost on many in IT but are better understood by those deeply immersed in engagement with both experts and their content. A “user-group” meeting must include these “others” and have sessions that support their professional interests, so that they come away having learned something substantive from peers in similar situations.

Although “search” was the nominal reason for the meeting, there was no discussion about what it takes to get to the ultimate “user-engagement.” Search remains “smoke and mirrors.” Search behind the firewall was still pretty thin as a concept, and the emphasis was on e-commerce and monetization. There was a lot of talk about business and customer experiences engaging with search but not much substance as to how to actually create rich search experiences.

Q: What are we going to be talking about a year from now?
LWM: I hope the engagement will provide less visionary “hype,” which is of little value to the audience in heavy doses. If the meeting becomes more about getting customers to a successful outcome, with teams of IT staff, developers, content owners and business owners coming to a problem using a thoughtful design approach, attendees will leave with a higher commitment to embrace the technology.

Finally, I believe that, as FastSearch solutions are implemented and tested, customers will come to these meetings with higher expectations for helpful case studies that talk about “how the sausage is made,” the role of connectors and the actual tuning for higher relevancy. Frequent passing references to search federation will give way to explanations of what federation really is and its many tiers of sophistication. Presentation of search results in ways that are compelling and trustworthy for users will need to be explained in more substantive sessions. I hope that we will be talking about social team interaction for implementing compelling search technology experiences for users.

Native Database Search vs. Commercial Search Engines

This topic is a bit random; it is a short response to a question that popped up recently from a reader seeking technical research on the subject. Since none was available in the Gilbane library of studies, I decided to think about how to address the subject with some practical suggestions.

The focus is on an enterprise with a substantive amount of content aggregated from a diverse universe of industry-specific information, and what to do about searching it. If the information has been parsed and stored in an RDBMS, is it not better to leverage the SQL query engine native to that RDBMS? Typical database engines might be: DB2, MS Access, MS SQL, MySQL, Oracle or Progress Software.

To be clear, I am not a developer, but I worked closely with software engineers for 20 years when I owned a software company. We worked with several DBMS products, three of which supported SQL queries. The application we invented and supported was a forerunner of today’s content management systems, with a variety of retrieval (search) interfaces. The retrievable content our product supported was limited to metadata plus abstracts up to two or three pages in length; the typical database sizes of our customers ranged from 250,000 to a couple of million records.

This is small potatoes compared to what search engines typically traverse and index today but scale was always an issue and we were well aware of the limitations of the SQL engines to support contextual searching, phrase searching and complex Boolean queries. It was essential that indexes be built in real time, when records were added whether manually through screen forms, or through batch loads. The engine needed to support explicit adjacency (phrase) searching as well as key words anywhere in a field, in a record, or in a set. Saving and re-purposing results, storing search strategies, narrowing large sets incrementally, and browsing indexes of terminology (taxonomy navigation) to select unique terms that would enable a Boolean “and” or “or” query were part of the application. When our original text-based DBMS vendor went belly-up, we spent a couple of years test driving numerous RDBMS products to find one that would support the types of searches our customers expected. We settled on Progress Software primarily because of its support for search and experience as an OEM to application software vendors, like us. Development time was minimized because of good application building tools and index building utilities.
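To make the distinction between “key words anywhere” and explicit adjacency (phrase) searching concrete, here is a minimal sketch of a positional index in Python. It is an illustration only, not the product described above and not how any particular RDBMS or search engine is built; the class and function names are hypothetical. It updates the index as each record is added, and records word positions so that a phrase query can check adjacency while Boolean “and”/“or” queries become simple set operations.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Toy in-memory index (hypothetical): term -> {record_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, record_id, text):
        # Index at insert time, so the record is searchable immediately.
        for position, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(record_id, []).append(position)

    def records_with(self, term):
        """Return the set of record ids containing the term anywhere."""
        return set(self.postings.get(term.lower(), {}))

    def phrase(self, phrase_text):
        """Return record ids where the words appear adjacent and in order."""
        terms = tokenize(phrase_text)
        if not terms:
            return set()
        # Only records containing every word can contain the phrase.
        candidates = set.intersection(*(self.records_with(t) for t in terms))
        hits = set()
        for record_id in candidates:
            for start in self.postings[terms[0]][record_id]:
                if all(
                    start + offset in self.postings[term][record_id]
                    for offset, term in enumerate(terms)
                ):
                    hits.add(record_id)
                    break
        return hits


index = PositionalIndex()
index.add(1, "Enterprise search engines index unstructured content")
index.add(2, "The search for better engines continues")
index.add(3, "Database engines support SQL queries")

phrase_hits = index.phrase("search engines")                                # {1}
and_hits = index.records_with("search") & index.records_with("engines")     # {1, 2}
or_hits = index.records_with("sql") | index.records_with("unstructured")    # {1, 3}
```

Record 2 contains both words but not next to each other, so it satisfies the Boolean “and” but not the phrase query. Saved result sets and incremental narrowing follow naturally from keeping results as sets.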

So, what does that have to do with the original question, native RDBMS search vs. standalone enterprise search? Based on discussions and observations with developers trying to optimize search for special applications, using generic search tools for database retrieval, I would make the following observations. Search is very hard and advanced search, including concept searching, Boolean operations, and text analytics, is harder still. Developers of enterprise search solutions have grappled with and solved search problems that need to be supported in environments where content is dynamically changing and growing, different user interfaces for diverse audiences and types of queries are needed, and query results require varieties of display formats. Also, in e-commerce applications, interfaces require routine screen face lifts that are best supported by specialized tools for that purpose.

Then you need to consider all these development requirements; they do not come out-of-the-box with SQL search:

  • Full text indexes and database field or metadata indexes require independent development efforts for each database application that needs to be queried.
  • Security databases must be developed to match each application where individual access to specific database elements (records or rows) is required.
  • Natural language queries require integration with taxonomies, thesauri, or ontologies; this means software development independent of the native search tools.
  • Interfaces must be developed for search engine administrators to make routine updates to taxonomies and thesauri, retrieval and results ranking algorithms, adjustments to include/exclude target content in the databases. These content management tasks require substantive content knowledge but should not require programming expertise and must be very efficient to execute.
  • Social features that support interaction among users and personalization options must be built.
  • Connectors need to be built to federate search across other content repositories that are non-native and may even be outside the enterprise.
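Two of the items above, thesaurus integration and record-level security, can be sketched in a few lines of Python to show why they amount to separate development work layered on top of the query engine. This is a simplified illustration under stated assumptions: the thesaurus, the access-control table, the sample records and all function names are hypothetical, and a production system would keep these in administrator-maintained stores rather than in code.

```python
# Hypothetical thesaurus: preferred term -> synonyms. In practice this would
# be maintained by content managers through an administrative interface.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "physician": {"doctor"},
}

# Hypothetical access-control list: record id -> groups allowed to see it.
RECORD_ACL = {
    1: {"staff", "public"},
    2: {"staff"},
    3: {"legal"},
}

# Hypothetical sample records standing in for database rows.
RECORDS = {
    1: "automobile safety recall notice",
    2: "vehicle fleet maintenance budget",
    3: "car accident litigation memo",
}


def expand(term):
    """Return the term plus any thesaurus equivalents, in either direction."""
    term = term.lower()
    expanded = {term}
    for preferred, synonyms in THESAURUS.items():
        if term == preferred or term in synonyms:
            expanded |= {preferred} | synonyms
    return expanded


def search(term, user_groups):
    """Match any expanded term, then drop records the user may not see."""
    wanted = expand(term)
    hits = []
    for record_id, text in sorted(RECORDS.items()):
        if wanted & set(text.lower().split()):
            # Security trimming: keep the record only if the user shares
            # at least one group with the record's access list.
            if RECORD_ACL.get(record_id, set()) & set(user_groups):
                hits.append(record_id)
    return hits


staff_hits = search("car", {"staff"})      # [1, 2]
public_hits = search("car", {"public"})    # [1]
```

A query for “car” finds the “automobile” and “vehicle” records only because of the thesaurus, and the litigation memo is withheld from anyone outside the “legal” group. Neither behavior comes from the query engine itself.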

Any one of these efforts is a multi-person and perpetual activity. The sheer scale of the development tasks militates against trying to sustain state-of-the-art search in-house with the relatively minimalist tools provided in most RDBMS suites. The job is never done and in-depth search expertise is hard to come by. Software companies that specialize in search for enterprises are also diverse in what they offer and the vertical markets they support well. Bottom line: identify your business needs and find the search vendor that matches your problem with a solution they will continue to support with regular updates and services. Finally, search performance and speed of processing are another huge factor to consider. For this you need some serious technical assessment. If the target application is going to be a big revenue generator with heavy loads and huge processing demands, do not overlook this step. Run benchmarks to prove the performance and scalability.
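As a starting point for that kind of assessment, the sketch below shows one simple way to time a search function over a set of queries, using Python and its built-in sqlite3 module. Everything here is an assumption for illustration: the table, the synthetic records, the function names and the choice of a SQL LIKE scan as the thing being measured. Real benchmarks need representative content, realistic query logs and concurrent load, and the numbers will vary by hardware.

```python
import math
import sqlite3
import statistics
import time


def build_sample_db(rows=20000):
    """Create an in-memory table of synthetic records (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO docs (body) VALUES (?)",
        ((f"record {i} about topic {i % 97}",) for i in range(rows)),
    )
    return conn


def like_search(conn, term):
    """Count rows whose body contains the term, using a SQL LIKE scan."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM docs WHERE body LIKE ?", (f"%{term}%",)
    )
    return cur.fetchone()[0]


def benchmark(search_fn, queries, repeats=5):
    """Time every query `repeats` times; report median and 95th percentile."""
    timings = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search_fn(query)
            timings.append(time.perf_counter() - start)
    timings.sort()
    # Nearest-rank 95th percentile over the sorted timings.
    p95_index = math.ceil(len(timings) * 0.95) - 1
    return {
        "runs": len(timings),
        "median_s": statistics.median(timings),
        "p95_s": timings[p95_index],
    }


conn = build_sample_db(rows=2000)
report = benchmark(lambda q: like_search(conn, q), ["topic 5", "record 19", "about"])
print(report)
```

The same `benchmark` function can be pointed at any other search call, such as a vendor's query API, so that candidates are compared on identical queries. Repeating the run at several data sizes gives a rough picture of how response time scales.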


© 2024 The Gilbane Advisor
