Curated for content, computing, and digital experience professionals

Tag: semantic search

Speaker Spotlight: John Felahi – Making content findable

In another installment of Speaker Spotlight, we posed a couple of our frequently asked questions to speaker John Felahi, Chief Strategy Officer at Content Analyst Company, LLC. We’ve included his answers here. Be sure to see additional Speaker Spotlights from our upcoming conference.


Speaker Spotlight: John Felahi

Chief Strategy Officer

Content Analyst Company, LLC


What is the best overall strategy for delivering content to web, multiple mobile, and upcoming digital channels? What is the biggest challenge? Development and maintenance cost? Content control? Brand management? Technology expertise?

One of the biggest challenges to delivering content to the web is making it as findable as possible to potentially interested viewers. While traditional, manual tagging and keyword search methods may have gotten us this far, and may be good enough for some use cases, they are not without limitations. The good news is that there are far more advanced, sophisticated, and automated technologies available to remedy the numerous limitations of manually tagging content and keyword-based search. These limitations include:

  • Term creep – New terms constantly emerge, requiring taxonomies to be constantly updated.
  • Polysemy – Take Apple, for example. Is your user searching for the company, the Beatles’ record label, or the fruit?
  • Acronyms – Texting has introduced an entirely new language of acronyms (LOL, TTYL, WDYT).  Manually tagging content requires the editor to consider possible acronyms the users will be searching for.
  • Abbreviations – Tagging content with long scientific terms, geographic names, and the like requires editors to factor in abbreviations along with the long terms they represent.
  • Misspellings – Thanks to spellcheck and autocorrect, technology has become much more forgiving for those who never made it past the first-round eliminations in their sixth-grade spelling bee. Content search, unfortunately, needs to be equally accommodating if you want your users to find your content, which means tagging it with common misspellings.
  • Language – The web has certainly made the world a much smaller place, but that doesn’t mean everyone speaks English.  Making content findable in any language means it has to also be tagged in multiple languages.
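To show how much upkeep the manual approach needs, here is a minimal Python sketch of it. It uses a hand-maintained expansion table for acronyms and abbreviations, plus fuzzy matching for misspellings via the standard library's `difflib.get_close_matches`. The tables, the vocabulary, and the 0.8 similarity cutoff are hypothetical. Every entry is something an editor must keep current, which is the term-creep problem in miniature.

```python
import difflib

# Hypothetical, hand-maintained expansion table for acronyms and abbreviations.
EXPANSIONS = {
    "lol": "laughing out loud",
    "ttyl": "talk to you later",
    "dna": "deoxyribonucleic acid",
}

# Hypothetical controlled vocabulary that content has been tagged with.
VOCABULARY = ["apple", "taxonomy", "metadata", "litigation"]


def normalize_query(query: str) -> list[str]:
    """Expand known acronyms and map likely misspellings onto vocabulary terms."""
    normalized = []
    for token in query.lower().split():
        if token in EXPANSIONS:
            normalized.append(EXPANSIONS[token])
            continue
        # Closest vocabulary term with similarity >= 0.8 (an arbitrary cutoff), if any.
        matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
        normalized.append(matches[0] if matches else token)
    return normalized


print(normalize_query("taxonmy DNA metdata"))
```

Nothing here disambiguates "apple" or handles other languages. Those problems need context, not lookup tables.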

On to the good news – there’s technology that’s been used for years in eDiscovery and the US Intelligence Community to overcome these very challenges, but for different reasons. Because the bad guys aren’t tagging their content to make it more findable, the intel community needs a better way to find what they’re looking for. And in eDiscovery, finding relevant content can make a multimillion-dollar difference to the outcome of a particular litigation or other regulatory matter. That’s why tens of thousands of legal reviewers and countless analysts in the intel community use a technology known as concept-aware advanced analytics.

How concept-aware advanced analytics differs from manual tagging and keyword search

As its name implies, concept-aware analytics understands the underlying concepts within the content. As such, it can tag content automatically. Categories are defined by taking examples that represent the concepts of a category. The system “learns” what that category is all about, and can then identify conceptually similar content and apply the same category. The process is the same on the search side: the user points to a piece of content and says, “find more like this.” Or, as the content publisher, you present the viewer with conceptually similar content, e.g., “you may also be interested in these articles.”
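The two workflows described here are categorizing by example and "find more like this." Both can be illustrated with a small Python sketch that compares documents as term vectors using cosine similarity. This is an assumption-laden stand-in, not the vendor's technology. Plain word overlap would not match "automobile" to "car", so concept-level systems typically add a step such as latent semantic indexing. Treat the code as an illustration of the workflow only. The corpus and category examples are invented.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a real concept representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(example: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Rank corpus documents by similarity to the document the user points at."""
    target = vectorize(example)
    ranked = sorted(corpus, key=lambda doc_id: cosine(target, vectorize(corpus[doc_id])), reverse=True)
    return ranked[:k]


def categorize(text: str, examples: dict[str, list[str]]) -> str:
    """Pick the category whose combined example documents are most similar to the text."""
    vec = vectorize(text)

    def profile(docs: list[str]) -> Counter:
        total = Counter()
        for doc in docs:
            total.update(vectorize(doc))
        return total

    return max(examples, key=lambda category: cosine(vec, profile(examples[category])))


corpus = {
    "a1": "apple releases new iphone and mac hardware",
    "a2": "orchard harvest of apple and pear fruit",
    "a3": "mac laptop hardware review",
}
examples = {
    "technology": ["mac laptop hardware", "iphone software update"],
    "food": ["apple pie recipe", "fresh fruit and pear harvest"],
}
print(more_like_this("new mac hardware from apple", corpus))
print(categorize("fruit harvest season", examples))
```

Summing the example vectors is enough to build a category profile, because cosine similarity ignores vector length.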

While concept-aware advanced analytics doesn’t necessarily replace manual tagging and keyword search – which work very well in certain situations – the technology clearly overcomes many of the limitations of traditional tagging and search methods.

Catch Up with John at Gilbane

Track E: Content, Collaboration, and the Employee Experience

E7: Strategic Imperatives for Enterprise Search to Succeed
Wednesday, December 4, 2:00 p.m. – 3:20 p.m.


ETL and Building Intelligence Behind Semantic Search

A recent inquiry about a position requiring ETL (extraction, transformation, and loading) experience prompted me to survey the job market in this area. I was surprised to see how many technical positions seek this expertise, along with experience in SQL databases and XML, mostly in healthcare, finance, or data warehousing. I am also observing an uptick in contract positions for metadata and taxonomy development.

My research on Semantic Software Technologies led reporters and bloggers to seek my thoughts on the Watson-Jeopardy story. Much has been written on the story, but I wanted to try a fresh take on the meaning of it all. There is a connection to be made between the ETL field and building a knowledgebase with the smarts of Watson. Inspiration for innovation can be drawn from the Watson technology, but there is a caveat: it involves the expenditure of serious mental and computing perspiration.

Besides baked-in intelligence for answering human questions with natural language processing (NLP), an answer platform like Watson requires tons of data. That data must also be assembled in conceptually and contextually relevant databases for good answers to emerge. When documents and other forms of electronic content are fed to a knowledgebase for semantic retrieval, finely crafted metadata (data describing the content) and excellent vocabulary control add enormous value. These two content enhancers, metadata and controlled vocabularies, can transform good search into excellent search.

The irony of current enterprise search is that information is in such abundance that it overwhelms rather than helps findability. Content and knowledge managers can’t possibly contribute the human resources needed to generate high quality metadata for everything in sight. But there are numerous techniques and technologies to supplement their work by explicitly exploiting the mountain of information.

Good content and knowledge managers know where to find top-quality content but may not know that, for all common content formats, there are tools to extract key metadata embedded (but hidden) in it. Some of these tools can also text mine and analyze the content for additional intelligent descriptive data. When content collections are very large but still too small (under a million documents) to justify the most sophisticated and complex semantic search engines, ETL tools can relieve pressure on metadata managers by automating much of the mining and extracting the entities and concepts needed for good categorization.
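The extract/transform/load pattern behind this claim can be sketched in standard-library Python. This is not how SSIS, DataStage, or the other products named in this post work. The sketch assumes HTML pages whose authors filled in `<meta>` tags, which is one kind of embedded but hidden metadata. It pulls those tags out, normalizes them, and loads them into an in-memory SQLite table whose schema is made up for the example.

```python
import sqlite3
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Extract: collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)  # HTMLParser lowercases tag and attribute names
            if fields.get("name") and fields.get("content") is not None:
                self.meta[fields["name"].lower()] = fields["content"]


def transform(doc_id: str, meta: dict[str, str]) -> list[tuple[str, str, str]]:
    """Transform: trim values and split comma-separated keywords into rows."""
    rows = []
    for field, value in meta.items():
        if field == "keywords":
            rows.extend((doc_id, "keyword", kw.strip().lower()) for kw in value.split(",") if kw.strip())
        else:
            rows.append((doc_id, field, value.strip()))
    return rows


def load(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    """Load: write rows into a (hypothetical) metadata table."""
    conn.execute("CREATE TABLE IF NOT EXISTS metadata (doc_id TEXT, field TEXT, value TEXT)")
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()


page = """<html><head>
<meta name="Author" content="J. Smith">
<meta name="keywords" content="Taxonomy, Metadata , search">
</head><body>Report text</body></html>"""

extractor = MetaExtractor()
extractor.feed(page)
conn = sqlite3.connect(":memory:")
load(conn, transform("doc-001", extractor.meta))
print(conn.execute("SELECT value FROM metadata WHERE field = 'keyword' ORDER BY value").fetchall())
```

Keeping the three steps as separate functions mirrors how ETL tools let each stage be swapped or tuned independently.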

The ETL tool array is large and varied. Platform tools from Microsoft (SSIS) and IBM (DataStage) may be employed to extract, transform, and load existing metadata. Other independent products, such as those from Pervasive and SEAL, may contribute value across a variety of platforms or functional areas, from which content can be dramatically enhanced for better tagging and indexing. The call for ETL experts is usually expressed in terms of engineers who would select, install, and implement these products. However, it must be stressed that subject and content experts are required to work with the engineers. The experts’ role is to help tune and validate the extraction and transformation outcomes, making sure terminology fits function.

Entity extraction is one major outcome of text mining to support business analytics, but tools can do a lot more to put intelligence into play for semantic applications. Tools that act as filters and statistical analyzers of text data warehouses will help reveal terminology for building the specialized controlled vocabularies used in auto-categorization. A few vendors currently on my radar to help enterprises understand and leverage their content landscape include EntropySoft Content ETL, Information Extraction Systems, Intelligenx, ISYS Document Filters, RAMP, and XBS: something here for everyone.
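One of the simplest statistical-analyzer ideas is to compare how often a word appears in a domain collection with how often it appears in general text. Words that stand out become candidates for a controlled vocabulary. The Python sketch below uses a smoothed log frequency ratio. The corpora are toy examples and the thresholds are arbitrary. None of the products listed above is implied to work this way, and a subject expert would still review the list.

```python
import math
import re
from collections import Counter


def term_counts(docs: list[str]) -> Counter:
    return Counter(w for doc in docs for w in re.findall(r"[a-z]+", doc.lower()))


def candidate_terms(
    domain_docs: list[str], background_docs: list[str], top_n: int = 3, min_count: int = 2
) -> list[str]:
    """Rank words that are unusually frequent in the domain collection.

    Score = log(relative frequency in domain / smoothed relative frequency in
    background). Add-one smoothing keeps words absent from the background finite.
    Only words relatively more frequent in the domain (score > 0) are returned.
    """
    dom, bg = term_counts(domain_docs), term_counts(background_docs)
    dom_total, bg_total = sum(dom.values()), sum(bg.values())
    vocab_size = len(dom.keys() | bg.keys())
    scores = {}
    for term, count in dom.items():
        if count < min_count:  # too little evidence to propose the term
            continue
        p_domain = count / dom_total
        p_background = (bg[term] + 1) / (bg_total + vocab_size)
        scores[term] = math.log(p_domain / p_background)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [term for term in ranked if scores[term] > 0][:top_n]


domain = [
    "the drilling rig hit the reservoir",
    "reservoir pressure and drilling mud",
    "the rig crew logged reservoir data",
]
background = [
    "the cat sat on the mat",
    "the weather and the news",
    "data about the weather",
]
print(candidate_terms(domain, background))
```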

The diversity of emerging applications is a leading indicator that there is a lot of innovation to come in all aspects of ETL. While RAMP is making headway with video, another firm with a local connection is Inforbix. I spoke with a co-founder, Oleg Shilovitsky, for my semantic technology research last year, before they launched. As he asserted then, it is critical to preserve, mine, and leverage the data associated with design and manufacturing operations. This area has huge growth potential, and Inforbix is now ready to address that market.

Readers who seek to leverage ETL and text mining will gain know-how from the cases presented at the 2011 Text Analytics Summit, May 18-19 in Boston. The exhibits will also feature products to consider for making piles of data a valuable knowledge asset. I’ll be interviewing experts who are speaking and exhibiting at that conference for a future piece. I hope readers will attend and seek me out to talk about their metadata management and text mining challenges; this will feed ideas for future posts.

Finally, I’m not the only one thinking along these lines. You will find other ideas and a nudge to action in these articles.

Boeri, Bob. “Improving Findability Behind the Firewall.” 28 slides. Enterprise Search Summit 2010, New York, May 2010.
Farrell, Vickie. “The Need for Active Metadata Integration: The Hard Boiled Truth.” DM Direct Newsletter, September 9, 2005, 3 pp.
McCreary, Dan. “Entity Extraction and the Semantic Web.” Semantic Universe, January 1, 2009.
White, David. “BI or Bust?” KMWorld, October 28, 2009, 3 pp.

How Far Does Semantic Software Really Go?

In November 2010, a discussion about semantic software technologies with a graduate scholar at George Washington University prompted him to follow up with some questions for clarification. With his permission, I am sharing three questions from Evan Faber and the gist of my comments to him. At the heart of the conversation we all need to keep having are two questions: how far does this technology go, and does it really bring us any gains in retrieving information?

1. Have AI or semantic software demonstrated any capability to ask new and interesting questions about the relationships among information that they process?

In several recent presentations and in the Gilbane Group study on Semantic Software Technologies, I share a simple diagram of the nominal setup relating content to search and to the semantic core, namely a set of terminology rules or terminology with relationships. Semantic search operates best when it focuses on a topical domain of knowledge. The language that defines that domain may range from simple to complex, broad to narrow, and deep to shallow. That language may be supplied to semantic search as a taxonomy (usually shallow and simple), as a set of language rules (numbering thousands to millions), or as anything from an ontology of concepts to a semantic net with millions of terms and relationships among concepts.
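The difference between a shallow taxonomy and a semantic net is easiest to see as a data structure. The Python sketch below stores concepts as (subject, relation, object) links. The relation names and the energy example are invented for illustration and follow no particular standard; real deployments often use vocabularies such as SKOS or OWL. A taxonomy is the special case with a single "broader" relation forming a tree.

```python
from collections import defaultdict


class SemanticNet:
    """Concepts linked by typed relationships (relation names are illustrative)."""

    def __init__(self) -> None:
        self.edges: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges[(subject, relation)].add(obj)

    def related(self, concept: str, relation: str) -> list[str]:
        # .get() avoids creating empty entries in the defaultdict
        return sorted(self.edges.get((concept, relation), set()))

    def reachable(self, concept: str, relation: str) -> list[str]:
        """Follow one relation transitively, e.g. every broader concept."""
        found, stack = [], [concept]
        while stack:
            for nxt in self.related(stack.pop(), relation):
                if nxt not in found:
                    found.append(nxt)
                    stack.append(nxt)
        return found


net = SemanticNet()
net.add("photovoltaic cell", "broader", "solar power")  # taxonomy-style link
net.add("solar power", "broader", "renewable energy")
net.add("photovoltaic cell", "made_of", "silicon")  # links a taxonomy cannot express
net.add("PV", "synonym_of", "photovoltaic cell")
print(net.reachable("photovoltaic cell", "broader"))
```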

The question Evan asks is a good one with a simple answer: “Not without configuration.” The configuration needs human work in two areas:

  • Management of the linguistic rules or ontology
  • Design of search engine indexing and retrieval mechanisms

When a semantic search engine indexes content for natural language retrieval, it looks to the rules or semantic nets to find concepts that match those in the content. When it finds concepts in the content with no equivalent language in the semantic net, it must find a way to understand where those concepts belong in the ontological framework. This discovery process (for clarification, disambiguation, contextual relevance, perspective, meaning, or tone) is best supported by an interface that makes it easy for a human curator or editor to update or expand the ontology. A subject matter expert is required for specialized topics. Through a process of automated indexing that both categorizes content and exposes problem areas, the semantic engine becomes both a search engine and a questioning engine.

The entire process is highly iterative. In a sense, the software is asking the questions: “What is this?”, “How does it relate to the things we already know about?”, “How is the language being used in this context?” and so on.
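That iterative loop can be sketched in Python. Indexing maps known terms to concepts, unknown terms go into a review queue (the software's "What is this?"), and a curator's answer feeds the next pass. The lexicon, the stopwords, the concept IDs, and the `teach` helper are all hypothetical. A real engine would also handle word forms (note that "cells" lands in the queue) and context.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping surface terms to concept IDs in the semantic net.
LEXICON = {
    "pv": "photovoltaic_cell",
    "photovoltaic": "photovoltaic_cell",
    "battery": "electrochemical_cell",
    "batteries": "electrochemical_cell",
}
STOPWORDS = {"the", "a", "and", "of", "in", "new", "for"}


def index_document(doc_id: str, text: str, index: dict[str, set[str]], review_queue: Counter) -> None:
    """Map known terms to concepts; queue unknown terms for a human curator."""
    for term in re.findall(r"[a-z]+", text.lower()):
        if term in STOPWORDS:
            continue
        concept = LEXICON.get(term)
        if concept:
            index.setdefault(concept, set()).add(doc_id)
        else:
            review_queue[term] += 1  # the software's "What is this?" question


def teach(term: str, concept: str) -> None:
    """Hypothetical curator action: record the answer so the next pass knows the term."""
    LEXICON[term] = concept


docs = {
    "d1": "New PV and batteries for the grid",
    "d2": "Perovskite photovoltaic cells in the lab",
}
index, review_queue = {}, Counter()
for doc_id, text in docs.items():
    index_document(doc_id, text, index, review_queue)
print(review_queue.most_common())

# Iterate: the curator classifies a queued term, and the collection is re-indexed.
teach("perovskite", "perovskite_material")
index, review_queue = {}, Counter()
for doc_id, text in docs.items():
    index_document(doc_id, text, index, review_queue)
```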

2. In other words, once they [the software] have established relationships among data, can they use that finding to proceed, without human intervention, to seek new relationships?

Yes, in the manner described for the previous question. It is important to recognize that the original rules, ontologies, or semantic nets being applied were crafted by human beings with subject matter expertise. It is unrealistic to think that any team of experts could know or anticipate every use of human language and codify it in advance for total accuracy. For this reason, the term AI is a misnomer: the algorithms are not thinking; they are only looking up “known-knowns” and applying them. The art of the software is in recognizing when something cannot be discerned or clearly understood; the concept (in context) is then presented for the expert to “teach” the software what to do with the information.

State-of-the-art software will have a back-end process that enables implementers and administrators to use the results of search (directly, through user commentary, or indirectly, by analyzing search logs) to discover where language has been misunderstood, as evidenced by invalid results. Over time, more passes to update linguistic definitions, grammar rules, and concept relationships will continue to refine and improve the accuracy and comprehensiveness of search results.
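The indirect route, mining search logs, can start as simply as counting queries that return nothing or whose results nobody clicks. In the Python sketch below, the log format is invented. "No click" is only a rough proxy for an invalid result, so the flagged queries are leads for an administrator to investigate, not verdicts.

```python
import csv
import io
from collections import Counter

# Hypothetical search log: the query, how many results it returned, and whether
# the user clicked any result.
LOG = """query,results,clicked
solar cell efficiency,42,yes
perovskite,0,no
pv degradation,17,no
perovskite,0,no
pv degradation,17,no
battery recycling,8,yes
"""


def flag_problem_queries(log_text: str, min_occurrences: int = 2) -> tuple[dict[str, int], dict[str, int]]:
    """Return (zero-result queries, result-but-no-click queries) seen repeatedly."""
    zero_hits, no_clicks = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(log_text)):
        if int(row["results"]) == 0:
            zero_hits[row["query"]] += 1  # vocabulary gap: nothing matched at all
        elif row["clicked"] == "no":
            no_clicks[row["query"]] += 1  # possible misunderstanding: results looked wrong

    def frequent(counts: Counter) -> dict[str, int]:
        return {query: n for query, n in counts.items() if n >= min_occurrences}

    return frequent(zero_hits), frequent(no_clicks)


print(flag_problem_queries(LOG))
```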

3. It occurs to me that the key value added of semantic technologies to decision-making is their capacity to link sources by context and meaning, which increases situational awareness and decision space. But can they probe further on their own?

Good point on the value, and in a sense, yes, they can. Through extensive algorithmic operations, instructions can be embedded (and probably are, for high-value situations like intelligence work) that tell the software what to do with newly discovered concepts. Those instructions might place the new discoveries into categories of relevance, importance, or association. It would not be unreasonable to then pass documents with confounding information to other semantic tools for further examination. Again, without human analysis along the continuum and at the end point, no certainty about the validity of the software’s decision-making can be asserted.

I can hypothesize a case in which a corpus of content contains random documents in foreign languages. From my research, I know that some semantic packages have semantic nets in multiple languages. If the corpus contains material in English, French, German, and Arabic, these materials might be sorted and routed to four different software applications. Each batch would be subject to further linguistic analysis, followed by indexing, with some middleware applied to the returned results for normalization and final consolidation into a unified index. Does this exist in the real world now? There are probably variants, but it would take more research to find the cases, and they may be subject to restrictions that would require the correct clearances.
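The hypothetical routing pipeline in that paragraph can be sketched in Python: sort by language, analyze each batch, normalize, and consolidate. Everything here is a placeholder. Detecting language by counting a handful of stopwords is crude. The per-language analyzer merely drops stopwords, where a real semantic package would apply its own semantic net for that language. Accent stripping stands in for whatever normalization the middleware would perform.

```python
import re
import unicodedata

# Tiny stopword lists used only to guess a document's language; a production
# pipeline would use a proper language identifier.
STOPWORDS = {
    "en": {"the", "and", "of", "is"},
    "fr": {"le", "la", "les", "et", "est", "de"},
    "de": {"der", "die", "das", "und", "ist"},
    "ar": {"في", "من", "على"},
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def detect_language(text: str) -> str:
    """Pick the language whose stopwords occur most often (ties go to the first listed)."""
    tokens = tokenize(text)
    return max(STOPWORDS, key=lambda lang: sum(t in STOPWORDS[lang] for t in tokens))


def analyze(lang: str, text: str) -> list[str]:
    """Placeholder for a per-language analyzer (hypothetical): drops stopwords only."""
    return [t for t in tokenize(text) if t not in STOPWORDS[lang]]


def normalize(term: str) -> str:
    """Middleware step: strip accents and other combining marks for the unified index."""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_unified_index(docs: dict[str, str]) -> dict[str, set[tuple[str, str]]]:
    index: dict[str, set[tuple[str, str]]] = {}
    for doc_id, text in docs.items():
        lang = detect_language(text)  # sort and route
        for term in analyze(lang, text):  # per-language analysis
            index.setdefault(normalize(term), set()).add((doc_id, lang))  # consolidate
    return index


docs = {
    "e1": "The energy crisis and the oil embargo",
    "f1": "Le pétrole et la crise énergétique",
    "g1": "Die Energiekrise und das Ölembargo",
}
index = build_unified_index(docs)
print(sorted(index))
```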

Discussions with experts who have actually employed enterprise-specific semantic software underscore the need for subject expertise and some computational linguistics training, coupled with an aptitude for creative inquiry. These scientists informed me that individuals who are highly multidisciplinary and facile with electronic games and tools did the best job of interacting with the software and getting excellent results. Tuning and configuration over time by the right human players is still a fundamental requirement.

Data Mining for Energy Independence

Mining content for facts and information relationships is a focal point of many semantic technologies. Among text analytics tools are those that mine content to prepare it for further analysis and understanding, and for indexing for semantic search. This will move enterprise search to a new level of research possibilities.

Research for a forthcoming Gilbane report on semantic software technologies turned up numerous applications used in the life sciences and publishing. Neither semantic technologies nor text mining is mentioned in the recent New York Times article “Rare Sharing of Data Leads to Progress on Alzheimer’s,” but I am fairly certain that these technologies had some role in enabling scientists to discover new data relationships and synthesize new ideas about Alzheimer’s biomarkers. The sheer volume of data from all the referenced data sources demands computational methods to distill and analyze it.

One vertical industry poised for growth in semantic technologies is the energy field. It is a special interest of mine because it is a topical area in which I worked as a subject indexer and searcher early in my career. Beginning with the first energy crisis, the oil embargo of the mid-1970s, I worked in research organizations involved in both fossil fuel exploration and production and alternative energy development.

A hallmark of technical exploratory and discovery work is the time gap between breakthroughs; there are often significant plateaus between major developments. This happens when research reaches a point at which an enabling technology is not available, or not commercially viable, to move to the next development milestone. I observed that the quest for innovative energy technologies often began with decades-old research that had stopped before commercialization.

Building on what we have already discovered, invented, or learned is one key to success for many “new” breakthroughs. Looking at old research from a new perspective to lower costs or improve efficiency for such things as photovoltaic materials or electrochemical cells (batteries) is what excellent companies do.

How does this relate to semantic software technologies and data mining? We need to begin with content that was generated by research in the last century; much of it is only now being made electronic. Even so, most of the conversion from paper, or from microforms such as microfiche, is to image formats. To make the full transition that enables data mining, the content must be further processed with optical character recognition (OCR). This will put it into a form that can be semantically parsed, analyzed, and explored for facts and new relationships among data elements.
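OCR output rarely goes straight into a semantic parser; it usually needs cleanup first. The post does not describe this step, so the following is an assumption. It is a minimal Python sketch of common fixes applied before parsing: folding ligature characters, rejoining words hyphenated across lines, and unwrapping lines within a paragraph. The sample text is invented.

```python
import re
import unicodedata


def clean_ocr_text(raw: str) -> str:
    """Simplified cleanup of OCR output before linguistic parsing."""
    # NFKC folds compatibility characters, e.g. the ligature U+FB03 into "ffi".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words split across lines: "electro-\nchemical" -> "electrochemical".
    # Caveat: this also merges genuinely hyphenated compounds broken at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Single line breaks inside a paragraph become spaces; blank lines are kept.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs left over from the page layout.
    return re.sub(r"[ \t]+", " ", text).strip()


raw = "E\ufb03ciency of electro-\nchemical cells\nwas  measured.\n\nSecond paragraph."
print(clean_ocr_text(raw))
```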

Processing old materials is neither easy nor inexpensive. Government agencies, consortia, associations, and partnerships among various types of institutions often serve as a springboard for making legacy knowledge assets electronically available. A great first step would be for the U.S. Department of Energy (DOE) and some energy industry leaders to collaborate on this activity.

A future of potential man-made disasters, even when knowledge exists to prevent them, is not a foregone conclusion. Intellectually, we know that energy independence is prudent, and economically and socially essential for stability of every kind. We have decades of information and knowledge assets in energy-related fields (e.g., chemistry, materials science, geology, and engineering) that semantic technologies can leverage to move us toward energy independence. Finding nuggets of old information in unexpected relationships with content from previously disconnected sources is a role for semantic search that can stimulate new ideas and technical research.
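One established way to look for unexpected relationships across disconnected sources is literature-based discovery in the style of Don Swanson's "ABC" model. The post does not mention this technique by name; it is swapped in here as an illustration. The idea works in three steps. Term A co-occurs with term B in one body of literature. B co-occurs with term C in another. If A and C never appear together, their pairing is a candidate connection for a human expert to examine. The Python sketch below treats documents as sets of already-extracted terms, and the geology and materials examples are invented.

```python
def hidden_links(docs_a: list[set[str]], docs_c: list[set[str]], term_a: str, term_c: str) -> set[str]:
    """Swanson-style ABC discovery over two disconnected literatures.

    Each document is a set of extracted terms. Returns bridging terms B that
    co-occur with term_a in docs_a and with term_c in docs_c, but only if
    term_a and term_c never appear together (otherwise the link is already known).
    """
    if any(term_a in doc and term_c in doc for doc in docs_a + docs_c):
        return set()
    b_via_a = set().union(*(doc for doc in docs_a if term_a in doc)) - {term_a}
    b_via_c = set().union(*(doc for doc in docs_c if term_c in doc)) - {term_c}
    return b_via_a & b_via_c


# Invented term sets standing in for entities mined from two unrelated collections.
geology_reports = [{"shale", "porosity", "fracture"}, {"shale", "kerogen"}]
materials_papers = [{"membrane", "porosity", "filtration"}, {"membrane", "polymer"}]
print(hidden_links(geology_reports, materials_papers, "shale", "membrane"))
```

The output is a list of leads, not discoveries. Judging whether a shared term like "porosity" means anything is exactly the human expertise the post calls for.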

A place to begin is a serious program of content conversion, capped off with semantic search tools to aid discovery and development. It is high time to put our knowledge to work with state-of-the-art semantic software tools and to commit human and collaborative resources to the effort. By coupling our knowledge assets of the past with the ingenuity of the present, we can achieve energy advances using the semantic technologies already embraced by the life sciences.

Leveraging Language in Enterprise Search Deployments

It is not news that some have relegated enterprise search to the long list of failed technologies. We are at the point where many analysts and business writers have called for a moratorium on the use of the term. Having worked in a number of markets and functional areas (knowledge management/KM, special libraries, and integrated library software systems) that heard their death knell sounded even while they continued to exist, I take these pronouncements as a game of sorts.

Yes, we have seen the demise of vinyl phonograph records, cassette tapes and probably soon musical CD albums, but those are explicit devices and formats. When you can’t buy or play them any longer, except in a museum or collector’s garage, they are pretty dead in the marketplace. This is not true of search in the enterprise, behind the firewall, or wherever it needs to function for business purposes. People have always needed to find “stuff” to do their work. KM methods and processes, special libraries and integrated library systems still exist, even as they were re-labeled for PR and marketing purposes.

What is happening to search in the enterprise is that it is finding its purpose, or more precisely its hundreds of purposes. It is not a monolithic software product, a one-size-fits-all. It comes in dozens of packages, models, and price ranges. It may be embedded in other software or standalone. It may be procured for a point solution to support retrieval of content for one business unit operating in a very narrow topical range, or it may be selected to give access to a broad range of documents that exist in numerous enterprise domains on many subjects.

Large enterprises typically have numerous search solutions in operation, implementation, and testing, all at the same time. They are discovering how to deploy and leverage search systems and they are refining their use cases based on what they learn incrementally through their many implementations. Teams of search experts are typically involved in selecting, deploying and maintaining these applications based on their subject expertise and growing understanding of what various search engines can do and how they operate.

After years of hearing about “the semantic Web,” the long-sought “holy grail” of Web search, we are seeing a serious ramp-up of technology solutions. Most of these applications can also make search more semantically relevant behind the firewall. These technologies have been evolving for decades, beginning with so-called artificial intelligence and now supported by some categories of computational linguistics, such as specific algorithms for parsing content and disambiguating terms. A soon-to-be-released study featuring some noteworthy applications reveals just how much is being done in enterprises for specific business purposes.

With this “teaser” on what is about to be published, I leave you with one important thought: meaningful search technologies depend on rich, linguistically based technologies. Without a cornucopia of software tools to build terminology maps and dictionaries, analyze content linguistically in context to elicit meaning, parse and evaluate unstructured text data sources, and manage vocabularies of ever more complex topical domains, semantic search could not exist.

Language complexities are challenging and even vexing. Enterprises will be finding solutions to leverage what they know only when they put human resources into play to work with the lingo of their most valuable domains.

Search Engines – Architecture Meets Adoption

Trying to summarize a technology space as varied as the one covered in two days at the Search Engines Meeting in Boston, April 26-27, is both a challenge and an opportunity. Avoiding the challenge of trying to represent the full spectrum, I’ll stick with the opportunity. It is important to tell you that search is everywhere, in every technology we use, and that it has a multitude of cousins and affiliated companion technologies.

The Gilbane Group focuses on content technologies. In its early history this included Web content management, document management, and content management systems for publishers and enterprises. Gilbane has kept up with the trends, and we now track related technologies in areas including standards like DITA and XML, adoption of social tools, and the rapid growth in the drive to localize and globalize content.

My area, search, and more specifically “enterprise search” or search “behind the firewall,” was added just over three years ago. It seemed logical to give attention to the principal reason for creating, managing, and manipulating content, namely finding it. When I pay attention to search engines, I am also thinking about adjoining content technologies. My recent interest is helping readers learn how technologies on both the search side and the content management/manipulation side need better context; that means relating the two.

If one theme ran consistently through all the talks at the meeting, it was the need to define search in relation to so many other content technologies. The speakers, for the most part, did a fine job of making these connections.

Here are just some snippets:

Bipin Patel, CIO of ProQuest, shared the technology challenges of maintaining a 24/7 service while driving improvements to the usability of the search interface. The goal is to deliver command-line search precision to users who lack the expertise (or patience) to construct elaborate queries. Balancing the needs of expert searchers (usually librarians) against those of everyone else who seeks content underscores the importance of human factors. My take-away: underlying algorithms and architecture are worth little if usability is neglected.

Martin Baumgartel spoke on the Theseus project for the semantic search marketplace, a European collaborative initiative. An interesting point for me is their use of SMILA (SeMantic Information Logistics Architecture) from Eclipse. By following some links on the Eclipse site I found this interesting presentation from the International Theseus Convention in 2009. The application of this framework model underscores the interdependency of many semantically related technologies to improve search.

Tamas Doszkocs of the National Library of Medicine told a well-annotated story of the decades of search and content enhancement technologies that are evolving to contribute to semantically richer search experiences. His metaphors in the evolutionary process were fun and spot-on at a very practical level: Libraries as knowledge bases > Librarians as search engines > the Web as the knowledge base > Search engines as librarians > moving toward understanding, content, context, and people to bring us semantic search. A similar presentation is posted on the Web.

David Evans noted that there is as yet no rigorous evaluation methodology for mobile search, but is it very different from what we do with desktop search? One slide that I found most interesting showed the Human Language Technologies (HLT) that contribute to a richer mobile search experience, essentially numerous semantic tools. Again, this underscores that integrating sophisticated hardware, networking, and search engine architectures for mobile search is just a piece of the solution. Adoption will depend on tools that enhance content findability and usability.

Jeff Fried of Microsoft/Fast talked about “social search” and put forth this important theme: that people like to connect to content through other people. He made me recognize how social tools are teaching us that the richness of this experience is a self-reinforcing mechanism toward “the best way to search.” It has lessons for enterprises as they struggle to adopt social tools in mindful ways in tandem with improving search experiences.

Shekhar Pradhan of Docunexus shared a relevant thought about a failure of interface architecture (to paraphrase): the ubiquitous search box fails because it does not demand context or provide mechanisms for resolving ambiguity. Obviously, this undermines adoption of enterprise search when the search box is the only option offered.
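One way to address that failure is a search box that notices when a query term is ambiguous and asks for context before searching. The Python sketch below is a minimal illustration under stated assumptions. The sense inventory is hypothetical and hand-built, reusing the Apple polysemy example, and the response format is invented.

```python
# Hypothetical sense inventory for terms known to be ambiguous in this collection.
SENSES = {
    "apple": ["Apple (the company)", "Apple (the Beatles' record label)", "apple (the fruit)"],
    "mercury": ["mercury (the element)", "Mercury (the planet)"],
}


def handle_query(query: str) -> dict:
    """Search directly when the query is unambiguous; otherwise ask the user to choose."""
    ambiguous = {term: SENSES[term] for term in query.lower().split() if term in SENSES}
    if ambiguous:
        return {"action": "clarify", "options": ambiguous}
    return {"action": "search", "query": query}


print(handle_query("apple news"))
print(handle_query("solar cell efficiency"))
```

A real system would try to resolve the sense from context first, for example from the other query terms or the user's department, and prompt only when that fails.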

Many more talks from this meeting will get rolled up in future reports and blogs.

I want to learn about your experiences and observations regarding semantic search and semantic technologies, as well. Please note that we have posted a brief Semantic Technology Survey for a short time. If you have any involvement with semantic technologies, please take it.

Layering Technologies to Support the Enterprise with Semantic Search

Semantic search is a composite beast like many enterprise software applications. Most packages are made up of multiple technology components and often from multiple vendors. This raises some interesting thoughts as we prepare for Gilbane Boston 2009 to be held this week.

As part of a panel on semantic search, moderated by Hadley Reynolds of IDC, with Jeff Fried of Microsoft and Chris Lamb of the OpenCalais Initiative at Thomson Reuters, I wanted to give a high level view of semantic technologies currently in the marketplace. I contacted about a dozen vendors and selected six to highlight for the variety of semantic search offerings and business models.

One case study involves three vendors, each with a piece of the ultimate, customer-facing, product. My research took me to one company that I had reviewed a couple of years ago, and they sent me to their “customer” and to the customer’s customer. It took me a couple of conversations and emails to sort out the connections; in the end the relationships made perfect sense.

On one hand, we have conglomerate software companies offering “solutions” to every imaginable enterprise business need. On the other, we see unique, specialized point solutions to universal business problems with multiple dimensions and twists. Teaming by vendors, each with a solution to one dimension of a need, creates compound product offerings that add up to a very large semantic search marketplace.

Consider an example of data gathering by a professional services firm. Let’s assume that my company has tens of thousands of documents collected in the course of research for many clients over many years. Researchers may move on to greater responsibility or other firms, leaving content unorganized except around confidential work for individual clients. We now want to exploit this corpus of content to create new products or services for various vertical markets. To understand what we have, we need to mine the content for themes and concepts.

The product of the mining exercise may have multiple uses: helping us create a taxonomy of controlled terms, preparing a navigation scheme for a content portal, or providing a feed to business or text analytics tools that will help us create visual objects reflecting various configurations of content. A text mining vendor may be great at the mining aspect, while other firms have better tools for analyzing, organizing, and re-shaping the output.

Doing business with two or three vendors, experts in their own niches, may help us reach a conclusion about what to do with our information-rich pile of documents much faster. A multi-faceted approach can be a good way to bring a product or service to market more quickly than if we struggle with generic products from just one company.

When partners each have something of value to contribute, together they offer the benefits of the best of all options. This results in a new problem for businesses looking for the best in each area, namely, vendor relationship management. But it also saves organizations from dealing with huge firms offering many acquired products that have to be managed through a single point of contact, a generalist in everything and a specialist in nothing. Either way, you have to manage the players and how the components are going to work for you.

I really like what I see: semantic technology companies partnering with each other to deliver good-to-great solutions for all kinds of innovative applications. By the way, at the conference I am doing a quick snapshot of each: Cogito, Connotate (with Cormine and WorldTech), Lexalytics, Linguamatics, Sinequa and TEMIS.

March Madness in the Search Industry

In keeping with conventional wisdom, it looks like a number of entrepreneurs are treating the economic downturn as a time of opportunity, judging from the larger-than-normal number of announcements in the enterprise search sector. The Microsoft acquisition of FAST, Autonomy’s foray into the document/content management market, and the Google Search Appliance ramping up its customer base are old news, BUT we are seeing a sweep of changes. Newcomers to the enterprise search marketplace and news of innovative releases of mature products really perked up in March. Here are my favorite announcements and events in chronological order, and the reasons I find them interesting:

Travis, Paul. Digital Reef Comes Out of Stealth Mode. 03/02/2009.

Startup offers content management platform to index unstructured data for use in e-discovery, risk mitigation, and storage optimization. Here is the first evidence that entrepreneurs see an opportunity to fill a niche vacuum. In the legal market the options have been limited and pretty costly, especially for small firms. This will be an interesting one to watch.

Banking, Finance, and Investment Taxonomy Now Available from the Taxonomy Experts at WAND. 03/02/2009, PR Web (press release), Ferndale, WA, USA

The taxonomy experts at WAND have made this financial taxonomy available for integration into any enterprise search software. I have been talking with Ross Lehr, CEO at WAND, for over a year about his suite of vertical market taxonomies and how best to leverage them. I am delighted that WAND is now actively engaged with a number of enterprise search and content management firms, enabling them to better support their customers’ navigation needs. The WAND taxonomies offer a launching point from which organizations can customize and enhance the vocabulary to match their internal or customer interests.

Miller, Mark. Lucid Imagination: Add our Lucene Ecosystem Search Engine to Firefox. 03/02/2009.

I predicted back in January that open source search and search appliances were going to spawn a whole new industry of service providers and expert integrators, because there are simply not enough search experts to staff in-house positions at all the companies adopting these two types of search products. Well, it is happening, and these guys at Lucid are some of the smartest search technologists around. This announcement gives you a taste of what they can do. Check it out, and check them out.


Microsharing has benefits for NASA. 03/04/2009.

It has been about 18 months since I wrote about social search, and this report describes a program that takes the concept to a new level, integrating content management, expertise locators and search in a nifty model. To learn more about NASAsphere, read this report written by Celeste Merryman: Findings from the NASAsphere Pilot. Jet Propulsion Laboratory, California Institute of Technology Knowledge Arciteture (sic) and Technology Task [Force]. 08/20/2008. The success of the pilot project is underscored by this recommendation in the report: “the NASAsphere pilot team recommends that NASAsphere be implemented as an ‘official’ employee social networking and communication tool.” This project is not about enterprise search per se; it simply shows that leveraging content and human expertise through social networks requires a “findability” component to succeed. Conversely, social tools play a huge role in improving findability.

The March 16, 2009 announcement QueSearch: Unlocking the Value of Structured Data with Universal Search really caught my eye with its claim to “universal search” (yes, another one) for large and mid-size organizations.

This offering, with a starting price of $19,500, is available immediately, with software and appliance deployment options. I tried without luck to find out more about the founders and the company’s origins on its Web site, but did track down a Wikipedia article and a neat YouTube interview with the two founders, Steven Yaskin and Paul Tenberg. The interview explains how they are leveraging Google tools and open source to deliver solutions.

Stronger, Better, Faster — MaxxCat’s New Search Appliance Aspires to Be Google Search Appliance Killer, by Marketwire. 03/11/2009.

This statement explains why the announcement caught my attention: MaxxCat product developers cite “poor performance and intrinsic limitations of Google Mini and Google Search Appliance” as the impetus to develop the device. According to MaxxCat, the enterprise search appliance, EX-5000, is over seven times faster than Google Search Appliance (GSA), and the small business search appliance, XB-250, is 16 times faster than Google Mini. Nothing throws down the gauntlet like challenging the leading search appliance company with a statement like that. OK, I’m watching, and I will be delighted to read or hear from early users.

Just one more take on “social search” as we learn about Aardvark: Answering the Tough Questions, David Hornik on VentureBlog. 03/12/2009

This week the Aardvark team is launching the fruits of that labor at South By Southwest (SXSW). They have built a “social search engine” that lives inside your IM and email. It allows you to ask questions of Aardvark, which then determines who among your friends and friends of friends is most qualified to answer them. As the Aardvark team points out in their blog, Social Search is particularly well suited to answering subjective questions where “context” is important. I am not going to quibble now, but I think I would have put this under my category of “semantic search” and natural language processing. Until we see it in action, who knows?

A new position at Attivio was announced on March 16th, Attivio Promotes John O’Neil to Chief Scientist, which tells me that they are still expanding at the end of their first official year in business.

Getting to the point, 03/18/2009, KMWorld.

Several announcements about Concept Searching’s release of v. 4 of its flagship product, conceptClassifier for SharePoint, highlight the fact that Microsoft’s acquisition of FAST has not slowed the number of enterprise search solution companies that continue to partner with, or offer independent solutions for, SharePoint. In this case, the company offers its own standalone concept search applications for other content domains but continues to bank on lots of business from the SharePoint user community. This relationship is reflected in these statements: The company says features include a new installer that enables installation in a SharePoint environment in less than 20 minutes, requires no programmatic support and all functionality can be turned on or off using standard Microsoft SharePoint controls. Full integration with Microsoft Content Types and greater support for multiple taxonomies are also included in this release. Once the FAST search server becomes a staple for Microsoft SharePoint shops, there will undoubtedly be fallout for some of these partners.

Being invited to the Google Enterprise Search Summit in Cambridge, MA on March 19, 2009 was an opportunity for me to visit Google’s local offices and meet a bunch of customers.

They were a pretty enthusiastic crowd and are enjoying a lot of attention as this division of Google works to join the ranks of other enterprise application software companies. I suspect that it is a whole new venture for them to be entertaining customers in their offices in a “user-group like” forum, but the Google speakers were energetic and clearly love the entrepreneurial aspects of being a newish runaway success within a runaway successful company. New customer announcements continue to flow from Google, with SITA (the State Information Technology Agency in South Africa) acquiring the GSA to drive an enterprise-wide research project. The solution will also be deployed and implemented by JSE-listed IT solutions and services company Faritec, and RR Donnelley. Several EMC users were represented at the meeting, which made me ask why they aren’t using the search tools being rolled out by the Documentum division… well, don’t ask.

Evans, Steve. Simplexo boosts public sector search options. Computer Business Review – UK. 03/18/2009.

This is interesting as an alternative to the Lucene/Solr scene: UK-based open source enterprise search vendor Simplexo has launched a new search platform aimed at the public sector, designed to enable central and local government departments to search multiple disparate data sources across the organisation simultaneously, on demand. I have wondered when we would see some other open source offerings.

All of the preceding covers just the startups (plus EMC at Google) and lesser-known companies. This was not a slow month. I don’t want my contacts in the “established” search market to think I am not paying attention, because I am. I’ve exchanged communications with, or been briefed by, these well-known companies with news about new releases, advancing market share, or new executive teams. In no particular order, these were the highlights of the month:

Endeca announced three new platforms on Mar 23, 2009: Endeca Announces the Endeca Publishing Suite, Giving Editors Unprecedented Control Over the Online Experience; Endeca Announces the Endeca Commerce Suite, Giving Retailers Continuous Targeted Merchandizing; and Endeca Unveils McKinley Release of the Information Access Platform, Allowing for Faster and Easier Deployment of Search Applications

Linguamatics Agile Text Mining Platform to Be Used by Novo Nordisk. 03/26/2009

I had a fine briefing from Coveo‘s CEO Laurent Simoneau and Michel Besmer, its new VP of Global Marketing, and I see them making great strides in capturing market share across numerous verticals where rapid deployment and implementation are a big selling point. They also just announced: Bell Mobility and Coveo Partner to Create Enterprise Search from Bell, an Exclusive Enterprise-Grade Mobile Search Solution.

Version 7.6 of dtSearch, a mainstay plug-and-play search solution for SMBs since 1991, was just released. 03/24/2009

And finally, ISYS is on a strong growth path, with a new technology release (ISYS File Readers), new executives and a new project: “… completed in conjunction with Steve Arnold, industry expert and author of the Beyond Search blog, compiled more than a decade of Google patent documents. To offer a more powerful method for analyzing and mining this content, we produced the Google Patent Search Demonstration Site, powered by our ISYS: web application.”

Weather-wise, March 2009 is going out like a lamb, but it’s hot, hot, hot when it comes to search.


© 2021 The Gilbane Advisor
