This blog entry on the “Taxonomy Watch” website prompts me to correct the impression that I side with the naysayers who say that taxonomies take too much time and effort to be valuable. Nothing could be further from the truth. I believe in taxonomies and have always been heavily invested in them, because I am convinced that pre-processing enterprise-generated content into meaningfully organized results brings large returns in time savings for a searcher. Otherwise, each searcher must personally take on the laborious post-processing work of sifting through and rejecting piles of irrelevant content. Consider that categorizing content well, and only once, brings benefit repeatedly to all who search an enterprise corpus.
Prime assets of enterprises are people and their knowledge; the resulting captured information can be leveraged as knowledge assets (KA). However, there is a serious problem in “herding” KA into a form that yields usable knowledge. Bringing content into a focus that is meaningful to a diverse but specialized audience of users, even within a limited company domain, is tough because the language of the content is so messy.
So, what does this have to do with taxonomies and enterprise search, and how do they factor into leveraging KA? Taxonomies have a role as a device to promote and secure the meaningful retrievability of content when we need it most or fastest, that is, just-in-time retrieval. If no taxonomies exist to pre-collocate and contextualize content for an audience, we will be perpetually stuck doing individual human filtering of the excessive search results that come from “keyword” queries. If we don’t begin with taxonomies to help search engines categorize content, we will certainly never get to the holy grail of semantic search. We need every device we can create and sustain to make information more findable and understandable; we just don’t have time to both filter and comprehensively read everything a keyword search throws our way in order to gain the knowledge we need to do our jobs.
Experts recognize that organizing content with pre-defined terminology (aka controlled vocabularies) that can be easily displayed in an expandable taxonomic structure is a useful aid for a certain type of searcher. The audience for navigated search is one that appreciates the clustering of search results into groups that are easily understood. They find value in being able to move easily from broad concepts to narrower ones. They especially like it when the categories and terminology are a close match to the way they view a domain of content in which they are subject experts. It shows respect for their subject area and gives them a level of trust that those maintaining the repository know what they need.
Taxonomies, when properly employed, serve triple duty. Exposing them to search engines that are capable of categorizing content puts them into play as training data. Setting them up within content management systems provides a control mechanism and validation table for human-assigned metadata. Finally, when used in a navigated search environment, they provide a visual map of the content landscape.
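To make the three roles concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny `TAXONOMY`, its terms, and the helper names `walk`, `invalid_tags`, `training_labels`, and `render` are hypothetical and do not come from any particular search engine or content management system.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical taxonomy: each key is a term, each value holds its narrower terms.
TAXONOMY: Dict[str, dict] = {
    "Finance": {
        "Reporting": {"XBRL": {}},
        "Budgeting": {},
    },
    "Engineering": {
        "Search": {"Semantic search": {}},
    },
}


def walk(tree: dict, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the broad-to-narrow path of every term in the taxonomy."""
    for term, narrower in tree.items():
        current = path + (term,)
        yield current
        yield from walk(narrower, current)


# Lookup table from each term to its full path (assumes term names are unique).
PATHS: Dict[str, Tuple[str, ...]] = {p[-1]: p for p in walk(TAXONOMY)}


def invalid_tags(tags: List[str]) -> List[str]:
    """Role 2, validation table: return human-assigned tags not in the taxonomy."""
    return [tag for tag in tags if tag not in PATHS]


def training_labels(tag: str) -> List[str]:
    """Role 1, training data: a term plus all of its broader terms as labels."""
    return list(PATHS[tag])


def render(tree: dict, depth: int = 0) -> List[str]:
    """Role 3, visual map: an indented outline for navigated search."""
    lines: List[str] = []
    for term, narrower in tree.items():
        lines.append("  " * depth + term)
        lines.extend(render(narrower, depth + 1))
    return lines


if __name__ == "__main__":
    print(invalid_tags(["XBRL", "Misc"]))
    print(training_labels("XBRL"))
    print("\n".join(render(TAXONOMY)))
```

The point of the sketch is that one maintained structure feeds all three uses, so effort spent curating the terms is not duplicated across the categorizer, the tagging form, and the navigation display.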
U.S. businesses are woefully behind in “getting it”; they need to invest in search and in the surrounding infrastructure that supports it. Comments at a recent meeting I attended reflected the belief that the rest of the world is far ahead in this respect. As if to highlight this, a colleague forwarded this news item yesterday: “On February 13, 2008, the XBRL-based financial listed company taxonomy formulated by the Shanghai Stock Exchange (SSE) was ‘Acknowledged’ by the XBRL International. The acknowledgment information has been released on the official website of the XBRL International (http://www.xbrl.org/FRTaxonomies/)….”
So, let’s get on with selling the basic business case for taxonomies in the enterprise, to ensure that the best of our knowledge assets will be truly findable when we need them.
Ontologies and Semantic Search
Ontologies also help to inform semantic search engines by contributing to an automated deconstruction of a query (making sense out of what the searcher wants to know) and automated deconstruction of the content to be indexed and searched. Good semantic…