Curated content for content, computing, and digital experience professionals

Tag: Taxonomies

Taxonomy and Glossaries for Enterprise Search Terminology

Two years ago, when I began blogging for the Gilbane Group on enterprise search, the extent of my vision was reflected in the blog categories I defined and expected to populate with content over time. They represented my personal “top terms,” each of which I expected to accumulate meaningful entries that would educate readers about search behind the enterprise firewall.

A recent examination of those early decisions showed me where the gaps in content are, perhaps because some of those topics were:

  • Not so important
  • Not currently in my thinking about the industry
  • Or not well defined
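An audit like this can be as simple as counting entries per category and flagging the thin ones. Here is a minimal Python sketch; the category names, the sample assignments, and the threshold of three entries are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical category scheme, for illustration only.
CATEGORIES = ["Search Technology", "Taxonomies", "Market Analysis",
              "Security", "Case Studies"]

def find_gaps(assignments, categories, min_entries=3):
    """Return (category, count) pairs with fewer than min_entries posts, fewest first."""
    counts = Counter(assignments)
    # Counter yields 0 for categories that never appear, so empty ones are reported too.
    sparse = [(cat, counts[cat]) for cat in categories if counts[cat] < min_entries]
    # sorted() is stable, so ties keep the order of the category list.
    return sorted(sparse, key=lambda pair: pair[1])

# Hypothetical category assigned to each post.
posts = ["Taxonomies", "Search Technology", "Taxonomies", "Market Analysis",
         "Search Technology", "Taxonomies", "Search Technology"]
print(find_gaps(posts, CATEGORIES))
# [('Security', 0), ('Case Studies', 0), ('Market Analysis', 1)]
```

The empty categories sort to the top, which makes them the first candidates for either new entries or a decision that the topic is not so important after all.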

I also know that on several occasions I couldn’t find a good category in my list for a blog entry I had just written. Being a former indexer and heavy user of controlled vocabularies, I usually resisted the urge to create a new category and instead found the “best fit” for my entry. I know that when the corpus of content or the domain is small, too many categories are of little use to the reader. But now, as I approach 100 entries, it is time to reconsider where I want to go with blogging about enterprise search.
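The “best fit” discipline amounts to choosing among a fixed set of categories rather than extending the list. A minimal sketch, assuming each category is described by a hypothetical set of keywords and that simple word overlap is an adequate proxy for fit (a human indexer weighs far more than that):

```python
import re

# Hypothetical controlled vocabulary: each category has a few indicative keywords.
VOCABULARY = {
    "Search Technology": {"index", "query", "relevance", "crawler", "ranking"},
    "Taxonomies": {"taxonomy", "vocabulary", "category", "thesaurus", "ontology"},
    "Market Analysis": {"vendor", "market", "pricing", "acquisition"},
}

def best_fit(text, vocabulary=VOCABULARY):
    """Return (category, score) for the existing category that best fits the text.

    The vocabulary is never extended; a score of 0 means nothing really fit.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & keywords) for cat, keywords in vocabulary.items()}
    best = max(scores, key=scores.get)  # on ties, the first category listed wins
    return best, scores[best]

print(best_fit("Tuning relevance ranking for a new crawler"))
# ('Search Technology', 3)
print(best_fit("Building a thesaurus for our vendor's product category"))
# ('Taxonomies', 2)
```

Returning the score along with the category lets a zero score stand out, a hint that the scheme may have a genuine gap rather than just a thin category.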

In the short term, I am going to try to provide entries for scantily covered topics because I still think they are all relevant. I’ll probably add a few more along the way or perhaps make some topics a little more granular.

Taxonomies are never static and require periodic review, even when the amount of content is small. Taxonomists need to keep pace with current use of terminology and target audience interests. New jargon creeps in, although I prefer to use generic terms that are broadly understood in the technology and business world.

That gives you an idea of my own taxonomy process. To add to the entries on terminology (definitions) and taxonomies, I am posting a glossary I wrote for last year’s report on the enterprise search market and recently updated for the Gilbane Workshop on taxonomies. I crafted all the definitions myself, but I validated them through heavy use of the Google “define” feature. If you aren’t already a user, you will find it highly useful when trying to pin down a definition. In the Google search box, simply type define: xxx xxx (where xxx represents a word or phrase for which you seek a definition). Google returns all the public definition entries it finds on the Internet. I then refine my definitions based on what I learn from the variety of sources I discover this way. It’s a great way to build your knowledge base and discover new meanings.
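For anyone who wants to script the lookup, the query is an ordinary Google search URL with the `define:` operator placed in the `q` parameter. This is a minimal sketch; it assumes Google still accepts `https://www.google.com/search?q=...` and still honors `define:`, and the operator’s results may well have changed since this was written.

```python
from urllib.parse import urlencode

def define_query_url(term):
    """Build a Google search URL that applies the define: operator to a term.

    Assumption: Google still serves results for /search?q=... and still
    recognizes the define: operator.
    """
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": f"define: {term}"})

print(define_query_url("controlled vocabulary"))
# https://www.google.com/search?q=define%3A+controlled+vocabulary
```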

Glossary Taxonomy and Search-012009

Ontologies and Semantic Search

Recent studies describe the negative effect of media, including video, television and online content, on attention spans and even comprehension. One such study suggests that the content piling up from multiple sources throughout our work and leisure hours has saturated us to the point of making us information filterers more than information “comprehenders”. Hold that thought while I present a second one.

Last week’s blog entry reflected on intellectual property (IP) and knowledge assets and the value of taxonomies as aids to organizing and finding these valued resources. The idea of making search engines better or more precise in finding relevant content is edging into our enterprises through semantic technologies. These are search tools that are better at finding concepts, synonymous terms, and similar or related topics when we execute a search. You’ll find an in-depth discussion of some of these in the forthcoming publication, Beyond Search by Steve Arnold. However, semantic search requires more sophisticated concept maps than taxonomies provide. It requires ontologies: rich representations of a web of concepts, complete with all types of term relationships.

My first point, about the trend toward merely browsing and filtering content for relevance to our work, and my second, about assembling semantically relevant content for better search precision, are two sides of one business problem that hundreds of entrepreneurs are grappling with through semantic technologies.

Two weeks ago, I helped to moderate a meeting on the subject, entitled Semantic Web – Ripe for Commercialization? Although the expected audience was a broad business group of VCs, financiers, and legal and business management professionals, it turned out to include a lot of technology types. They had some pretty heavy questions and comments about how search engines handle inference and their methods for extracting meaning from content. Semantic search engines need to understand both the query and the target content to retrieve contextually relevant content.

Keynote speakers and some of the panelists introduced the concept of ontologies as an essential backbone of semantic search. From that came a lot of discussion about how and where these ontologies originate, who vets them for authoritativeness and how, and how their development in under-funded subject areas will occur. There were no clear answers.

Here I want to give a quick definition of ontology. It is a concept map of terminology which, when richly populated, reflects all the possible semantic relationships that might be inferred from the different ways terms are assembled in human language. A subject-specific ontology is most easily understood in a graphical representation. Ontologies also inform semantic search engines by contributing to the automated deconstruction of a query (making sense of what the searcher wants to know) and of the content to be indexed and searched. Good semantic search, therefore, depends on excellent ontologies.
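To make the definition concrete, an ontology can be stored as subject-relation-object triples, and a search engine can follow selected relation types to expand a query. This is a minimal sketch; the handful of roadway-related terms and relation names are invented for illustration and are not a reproduction of any published ontology.

```python
from collections import defaultdict

# Hypothetical mini-ontology of (subject, relation, object) triples.
TRIPLES = [
    ("roadway", "has_synonym", "road"),
    ("highway", "is_a", "roadway"),
    ("street", "is_a", "roadway"),
    ("roadway", "has_part", "lane"),
    ("bridge", "part_of", "roadway"),
]

def build_graph(triples):
    """Index each triple in both directions so relations can be followed either way."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((f"inverse_{rel}", subj))
    return graph

def expand_query(terms, graph, relations):
    """Add terms reachable in one hop via the allowed relation types."""
    expanded = list(terms)
    for term in terms:
        for rel, other in graph.get(term, []):
            if rel in relations and other not in expanded:
                expanded.append(other)
    return expanded

graph = build_graph(TRIPLES)
print(expand_query(["roadway"], graph, {"has_synonym", "inverse_is_a"}))
# ['roadway', 'road', 'highway', 'street']
```

Choosing which relations to follow is the real design decision: synonyms and narrower terms (`inverse_is_a` here) broaden recall, while following `has_part` would pull in `lane`, which is related but probably not what someone searching for “roadway” wants.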

To see a very simple example of an ontology related to “roadway”, check out this image. Keep in mind that before you set out to implement a semantic search engine in your enterprise, you want to be sure there is a trusted ontology somewhere in the mix of tools to help the search engine retrieve results relevant to your unique audience.

Enterprise Search: Leveraging and Learning from Web Search and Content Tools

Following on my last post, in which I covered the unique value propositions offered by a variety of enterprise search products, this one takes a look at the evolution of enterprise search. The commentary by search company experts, executives, and analysts points to some evolving technologies and to the growing prominence of certain themes in enterprise search. Furthermore, organizations’ efforts to strengthen the link between search technologies and knowledge enablers have never been more prominent, taking search to a whole new level beyond mere retrieval.

The following paraphrased comments from the Enterprise Search Keynote session are timely and revealing. When I asked, “Will Web and Internet Search Technologies Drive the Enterprise (Internal) Search Tool Offerings or Will the Markets Diverge?”, the panelists offered these thoughts.

Matt Brown, Principal Analyst from Forrester Research, commented that enterprise search demands very different search technologies, with much richer content interpretation. What Web-based searching does is give search such high visibility that enterprises are being primed to adopt it, but only when it comes with enhanced capabilities.

Echoing Matt’s remarks, Oracle search solution manager Bob Bocchino commented on the difficulty of making search operate well within the enterprise because it needs to deal with structured database content and unstructured files, while also applying sophisticated security features that let only authorized viewers see restricted content. Furthermore, security must be deployed in a way that does not degrade performance while supporting continuous updates to content and permissions.
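The panel didn’t describe how Oracle implements this, but one well-known approach is query-time filtering (often called late binding): rank the results first, then drop any hit the user isn’t authorized to see. A minimal sketch, assuming a hypothetical `Document` record whose empty group set means “public”:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    # Hypothetical ACL: groups allowed to see the document; empty means public.
    allowed_groups: frozenset = field(default_factory=frozenset)

def trim_results(ranked_docs, user_groups):
    """Query-time security trimming: keep only hits the user is allowed to see.

    Ranking order is preserved; restricted documents are simply dropped.
    """
    groups = frozenset(user_groups)
    return [doc for doc in ranked_docs
            if not doc.allowed_groups or doc.allowed_groups & groups]

hits = [
    Document("1", "Holiday calendar"),
    Document("2", "Salary bands", frozenset({"hr"})),
    Document("3", "Product roadmap", frozenset({"engineering", "management"})),
]
print([doc.doc_id for doc in trim_results(hits, {"engineering"})])  # ['1', '3']
```

Checking permissions on every query means changes take effect immediately, but it adds work to each search, which is exactly the performance tension raised in the panel. The usual alternative is to index permissions alongside the content and filter inside the engine, at the cost of reindexing whenever permissions change.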

Hadley Reynolds, VP & Director of the Center for Search Innovation at Fast Search & Transfer, noted that the Web isn’t really having a direct impact on enterprise search innovation, but many of the social tools found on the Web are being adopted in enterprises to create new kinds of content (e.g., social networks, blogs and wikis) that enterprise search engines must handle in richer contextual ways.

Don Dodge, Director of Business Development for the Emerging Business Team at Microsoft, further noted that the Internet’s biggest problem is scale. That is a much easier problem to solve than the enterprise’s, where users’ standards for what qualifies as a good and valuable search result are much higher, making the technology to deliver those results more difficult to build.

Among the other noteworthy comments in this session was a negative view of taxonomies. The gist of it was that they require so much discipline that they might work for a while but can’t really be sustained. If this attitude becomes the norm, many of the semantic search engines that depend on some type of classification and categorization according to industry terminologies or locally maintained lists will be challenged to deliver enhanced search results. This is a subject to be taken up in a later blog entry.

A final conclusion about enterprise search concerned the evolution of adoption in the marketplace. Simply put, the marketplace is not monolithic in its requirements. The diversity of demands on search technologies has discouraged vendors from focusing on distinct niches and led them to put more effort into areas like e-commerce. This seems to be shifting, especially now that all the large software companies are seriously announcing products in the enterprise search market.

Taxonomies, Folksonomies & Controlled Vocabularies

There is an enlightening discussion going on between Lou Rosenfeld, Clay Shirky and others on the utility of folksonomies, such as those used by Flickr, versus taxonomies developed by subject matter experts. As one of the commenters has pointed out, this is not an “either/or” issue. Certain applications where the scope of the content and users is bounded will benefit from the discipline of a carefully architected vocabulary. Other applications, where the scope of either the content or the user community is less well defined, will either suffer or, more likely, see users ignore the prescriptions (this is why the “semantic web”, if I understand it at all, is hopeless). The key issues are related: cost and adoption (cost is usually a function of adoption, not development), and I think both would agree on this point. How these approaches might work together is trickier and well worth exploring. In any case, this debate provides a condensed lesson in many issues that most enterprise content managers have probably not thought through, and even those who have should check out this thread.
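One way the two approaches might work together, offered here only as an assumption and not as something proposed in the thread, is to map free-form user tags onto preferred terms in a controlled vocabulary and keep whatever doesn’t match as folksonomy tags. The synonym ring below is hypothetical:

```python
# Hypothetical synonym ring: normalized user tag -> preferred controlled term.
PREFERRED = {
    "nyc": "New York City",
    "new york": "New York City",
    "big apple": "New York City",
}

def reconcile(tags, preferred=PREFERRED):
    """Split user tags into controlled terms and leftover folksonomy tags.

    Tags are normalized (trimmed, lowercased) before lookup; duplicates are dropped.
    """
    controlled, free = [], []
    for tag in tags:
        key = tag.strip().lower()
        term = preferred.get(key)
        if term is None:
            if key not in free:
                free.append(key)
        elif term not in controlled:
            controlled.append(term)
    return controlled, free

print(reconcile(["NYC", "Big Apple", "sunset", "Sunset "]))
# (['New York City'], ['sunset'])
```

Unmatched tags that keep recurring are natural candidates for the periodic vocabulary review that any taxonomy needs anyway.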