<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>Enterprise Search Practice Blog</title>
        <link>http://gilbane.com/search_blog/</link>
        <description>Analysis, opinion, and advice on enterprise search technologies and applications</description>
        <language>en</language>
        <copyright>Copyright 2009</copyright>
        <lastBuildDate>Tue, 03 Feb 2009 17:04:17 -0500</lastBuildDate>
        <generator>http://www.sixapart.com/movabletype/</generator>
        <docs>http://www.rssboard.org/rss-specification</docs>
        
        <item>
            <title>Native Database Search vs. Commercial Search Engines</title>
            <description><![CDATA[<p>This entry departs from my usual topics: it is a short response to a question that recently came from a reader seeking technical research on the subject. Since none was available in the Gilbane library of studies, I decided to answer the question with some practical suggestions.</p>

<p>The focus is on an enterprise with a substantial amount of content aggregated from a diverse universe of industry-specific information, and on what to do about searching it. If the information has been parsed and stored in a relational database (RDBMS), is it not better to leverage the SQL query engine native to the RDBMS? Typical database engines include DB2, MS Access, MS SQL Server, MySQL, Oracle, and Progress Software.</p>

<p>To be clear, I am not a developer, but I worked closely with software engineers for 20 years when I owned a software company. We worked with several DBMS products, three of which supported SQL queries. The application we invented and supported was a forerunner of today's content management systems, with a variety of retrieval (search) interfaces. The retrievable content our product supported was limited to metadata plus abstracts of up to two or three pages; our customers' databases typically ranged from 250,000 to a couple of million records.</p>

<p>This is small potatoes compared to what search engines typically traverse and index today, but scale was always an issue, and we were well aware of the limitations of SQL engines in supporting contextual searching, phrase searching, and complex Boolean queries. It was essential that indexes be built in real time as records were added, whether manually through screen forms or through batch loads. The engine needed to support explicit adjacency (phrase) searching as well as keywords anywhere in a field, in a record, or in a set. The application also let users save and re-purpose results, store search strategies, narrow large sets incrementally, and browse indexes of terminology (taxonomy navigation) to select unique terms for a Boolean "and" or "or" query. When our original text-based DBMS vendor went belly-up, we spent a couple of years test-driving numerous RDBMS products to find one that would support the types of searches our customers expected. We settled on Progress Software primarily because of its support for search and its experience as an OEM supplier to application software vendors like us. Good application-building tools and index-building utilities kept development time to a minimum.</p>

<p>So, what does that have to do with the original question, native RDBMS search vs. standalone enterprise search? Based on my discussions with, and observations of, developers trying to optimize search for special applications using generic search tools for database retrieval, I offer the following observations. <u>Search is very hard, and advanced search, including concept searching, Boolean operations, and text analytics, is harder still.</u> Developers of enterprise search solutions have grappled with, and solved, search problems in environments where content is constantly changing and growing, where different user interfaces are needed for diverse audiences and types of queries, and where query results must be displayed in a variety of formats. Also, in e-commerce applications, interfaces require routine facelifts that are best supported by tools specialized for that purpose.</p>
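<p>As one small illustration of why concept searching takes extra work, the following Python sketch expands each query term with synonyms from a thesaurus and evaluates the result as a Boolean query. The thesaurus entries, function names, and records are hypothetical, and real concept search goes far beyond simple synonym lookup.</p>

```python
import re

# Hypothetical thesaurus: each entry term maps to terms treated as equivalent.
THESAURUS = {
    "lawsuit": ["litigation", "suit", "claim"],
    "contract": ["agreement"],
}


def words(text):
    """Set of lowercase word tokens; no stemming (deliberately naive)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand(query_terms, thesaurus):
    """Turn each query term into an OR-group of the term plus its synonyms.

    The returned list reads as (t1 OR syn...) AND (t2 OR syn...) AND ...
    """
    return [{term, *thesaurus.get(term, [])} for term in query_terms]


def matches(text, groups):
    """True if the text contains at least one term from every OR-group."""
    tokens = words(text)
    return all(tokens & group for group in groups)


records = {
    1: "Litigation over a supplier agreement",
    2: "Contract lawsuit filed in district court",
    3: "Annual report on supplier contracts",
}

literal = expand(["lawsuit", "contract"], {})  # no thesaurus
concept = expand(["lawsuit", "contract"], THESAURUS)

literal_hits = {rid for rid, text in records.items() if matches(text, literal)}
concept_hits = {rid for rid, text in records.items() if matches(text, concept)}

print(sorted(literal_hits))  # [2]
print(sorted(concept_hits))  # [1, 2]
```

<p>Record 1 is found only through the thesaurus, while record 3 is missed either way because "contracts" is never reduced to "contract"; stemming is yet another piece someone would have to build and maintain.</p>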

<p>Then there are these development requirements to consider, none of which come out-of-the-box with SQL search:
<ul>
	<li><em>Full text</em> indexes and database field or metadata indexes require independent development efforts for each database application that needs to be queried.</li>
	<li>Security databases must be developed for each application in which access to specific database elements (records or rows) must be controlled for individual users.</li>
	<li>Natural language queries require integration with taxonomies, thesauri, or ontologies; this means software development independent of the native search tools.</li>
	<li>Interfaces must be developed for search engine administrators to make routine updates to taxonomies and thesauri, to adjust retrieval and results-ranking algorithms, and to include or exclude target content in the databases. These content management tasks require substantive content knowledge but should not require programming expertise, and they must be very efficient to execute.</li>
	<li>Social features that support interaction among users, and personalization options, must be built.</li>
	<li>Connectors need to be built to federate search across other content repositories that are non-native and may even be outside the enterprise.</li>
</ul>
Any one of these efforts is a perpetual activity requiring multiple people. The sheer scale of the development tasks militates against trying to sustain state-of-the-art search in-house with the relatively minimalist tools provided in most RDBMS suites. The job is never done, and in-depth search expertise is hard to come by. Software companies that specialize in enterprise search are also diverse in what they offer and in the vertical markets they support well. Bottom line: identify your business needs and find the search vendor that matches your problem with a solution it will continue to support with regular updates and services. Finally, search performance and processing speed are another huge factor to consider, and they call for serious technical assessment. If the target application is going to be a big revenue generator with heavy loads and intensive processing, do not overlook this: run benchmarks to prove performance and scalability.</p>]]></description>
            <link>http://gilbane.com/search_blog/2009/02/native_database_search_vs_comm.html</link>
            <guid>http://gilbane.com/search_blog/2009/02/native_database_search_vs_comm.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Search Technologies and Products</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Database engines</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">RDBMS</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">SQL</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search administration</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search technology</category>
            
            <pubDate>Tue, 03 Feb 2009 17:04:17 -0500</pubDate>
        </item>
        
        <item>
            <title>Churning in the Search Sector - Two BIG Events in One Week</title>
            <description><![CDATA[<p>Analysts have been projecting major consolidation in the enterprise search marketplace for a couple of years. What interests me is how slowly this is happening. For every merger or acquisition, whether small or large (SurfRay's acquisition of Mondosoft or Microsoft's acquisition of FAST), other companies emerge or evolve with diverse and potentially competitive technologies (e.g. Attivio, Connotate, Expert System, EyeAlike, Truevert, Temis).</p>

<p>We have seen companies like Exalead, ISYS, and Vivisimo gain on former leaders. Microsoft is often listed as an industry leader because it acquired former leader FAST, while companies with solid products for verticals, like Recommind in law and financial services, are often overlooked because they lack the total revenues of a company like Microsoft, which sells a lot more software than enterprise search.</p>

<p>This past week two industry news items caused me to reflect on the potential impact of announcements that, while not surprising, can upset the plans of buyers of search technology. The first was the announcement that Autonomy plans to acquire Interwoven. That Interwoven is being acquired is no surprise, since the company was being groomed for acquisition. However, this appears to be the first instance of a "search" company acquiring a "content management/document management" company; the norm has been for search companies to be bought to fill a need of ECM or CMS vendors. For anyone planning to procure Interwoven because of the Vivisimo Velocity universal search embedded in its Worksite product, this does put a wrinkle in the fabric. That is a shame, because it will be a while before the actual impact is known, and the uncertainty could slow sales. The cost to buyers of having to accept Autonomy's IDOL instead of Velocity could be significant, affecting both licensing and deployment, because Velocity has been an efficient install for most enterprises. Autonomy faces a big ramp-up in shifting from being a search company to becoming an ECM supplier, and some buyers will take a wait-and-see attitude, regardless of IDOL's reputation.</p>

<p>The second big announcement, of course, is the departure from Microsoft of John Markus Lervik, a co-founder of FAST, who was recently named Executive VP in a newly created position for Enterprise Search at Microsoft. I'm sure you'll be seeing plenty about the reasons elsewhere. However, things have just become more complicated for buyers counting on FAST's search technology to be integrated into Microsoft's offerings sooner rather than later, since one of FAST's original leaders is leaving the team.</p>

<p>Two years ago I commented to FAST executives that vendors on a rapid growth path need to make the buying, business, and support experience for customers a priority, beyond technology enhancements, so I take little consolation in seeing this turmoil. If you are a buyer, take a good hard look behind the technology to see what else you will be dealing with as you make plans to acquire software.</p>]]></description>
            <link>http://gilbane.com/search_blog/2009/01/churning_in_the_search_sector.html</link>
            <guid>http://gilbane.com/search_blog/2009/01/churning_in_the_search_sector.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Product Selection</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Autonomy</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Interwoven</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Microsoft</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search marketplace</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search product procurement</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Vendor selection</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Vivisimo</category>
            
            <pubDate>Mon, 26 Jan 2009 09:41:25 -0500</pubDate>
        </item>
        
        <item>
            <title>Taxonomy and Glossaries for Enterprise Search Terminology</title>
            <description><![CDATA[<p>Two years ago, when I began blogging for the Gilbane Group on <em>enterprise search</em>, the extent of my vision was reflected in the blog <strong>categories</strong> I defined and expected to populate with content over time. They represented my personal "top terms," each of which I expected to collect meaningful entries that would educate readers and illuminate what they might want to know about search behind the enterprise firewall.</p>

<p>A recent examination of those early decisions showed me where the gaps in content are, perhaps reflecting that some of those topics were:
<ul>
	<li>Not so important</li>
	<li>Not currently in my thinking about the industry</li>
	<li>Or not well defined</li>
</ul></p>

<p>I also know that on several occasions I couldn't find a good category in my list for an entry I had just written. As a former indexer and heavy user of controlled vocabularies, I usually resisted the urge to create a new category and instead found the "best fit" for my entry. I know that when the corpus of content or domain is small, too many categories are useless to the reader. But now, as I approach 100 entries, it is time to reconsider where I want to go with blogging about enterprise search.</p>

<p>In the short term, I am going to try to provide entries for scantily covered topics because I still think they are all relevant. I'll probably add a few more along the way or perhaps make some topics a little more granular.</p>

<p>Taxonomies are never static and require periodic review, even when the amount of content is small. Taxonomists need to keep pace with current use of terminology and with target audience interests. New jargon creeps in, although I prefer to use generic terms that are broadly understood in the technology and business world.</p>

<p>That gives you an idea of some of my own taxonomy process. To add to the entries on terminology (definitions) and taxonomies, I am posting a glossary I wrote for last year's report on the enterprise search market and recently updated for the Gilbane Workshop on taxonomies. While I crafted all the definitions myself, I validated them through heavy use of the Google "define" feature. If you aren't already a user, you will find it highly useful when trying to pin down a definition. In the Google search box, simply type <em>define: xxx xxx</em> (where xxx represents a word or phrase for which you seek a definition). Google returns all the public definition entries it finds on the Internet. I then refine my definitions based on what I learn from the variety of sources I discover using this technique. It's a great way to build your knowledge base and discover new meanings.</p>

<p><span class="mt-enclosure mt-enclosure-file" style="display: inline;"><a href="http://gilbane.com/search_blog/Glossary%20Taxonomy%20and%20Search-012009.pdf">Glossary Taxonomy and Search-012009.pdf</a></span></p>]]></description>
            <link>http://gilbane.com/search_blog/2009/01/glossary_of_taxonomy_and_searc.html</link>
            <guid>http://gilbane.com/search_blog/2009/01/glossary_of_taxonomy_and_searc.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Definitions</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">Taxonomy/Thesaurus/Ontology</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search technology</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Taxonomies</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Taxonomy for search</category>
            
            <pubDate>Wed, 21 Jan 2009 13:07:16 -0500</pubDate>
        </item>
        
        <item>
            <title>Open Source Search &amp; Search Appliances Need Expert Attention</title>
            <description><![CDATA[<p>Search in the enterprise suffers from a lack of expert attention to tuning, care and feeding, governance, and fundamental understanding of what functionality comes with any one of the 100+ products now on the market. This is just as true for search appliances and for open source search tools (Lucene) and applications (Solr). But while companies that license out-of-the-box search solutions or heavily customized search engines get service, support, and upgrades as part of the deliverables, the same level of support cannot be assumed when getting started with open source search or even appliances.</p>

<p>Search appliances are sold with licenses that imply a high level of performance without a lot of support, while open source search tools are downloadable for free. As speakers on both open source and appliances made perfectly clear at our recent Gilbane Conference, both come with requirements for human support. When any enterprise search product or tool is selected and procured, there is a presumed business case for acquisition. What acquirers need to understand above all else is the cost of ownership to achieve the expected value. This means people, and people with expertise, on an ongoing basis.</p>

<p>Particularly when budgets are tight and organizations lay off workers, we discover that those with specialized skills and expertise are often the first to go. The jacks-of-all-trades, or those with competencies in maintaining ubiquitous applications, are retained to be "plugged in" wherever needed. So, where does this leave you for support of the search appliance that was presumed to be 100% self-maintaining, or the open source code that still needs bug fixes, API development, and interface design work?</p>

<p>This is the time to look to system integrators and service companies with specialists in tools you use. They are immersed in the working innards of these products and will give you better support through service contracts, subscriptions or labor-based hourly or project charges than you would have received from your in-house generalists, anyway. </p>

<p>You may not see specialized system houses or service companies listed by financial publications as a growth business, but I am confident that the industry will spawn a whole new category of search service organizations in the short term. Just-in-time development for you and lower overhead for your enterprise will be a growing swell in 2009. This is how outsourcing can really benefit your organization.</p>

<p>Post-post note: here is a related review on the state of open source in the enterprise: <em><a href="http://www.informationweek.com/news/software/open_source/showArticle.jhtml?articleID=212002355">The Open Source Enterprise; its time has come</a></em>, by Charles Babcock in <u>Information Week</u>, Nov. 17, 2008. Be sure to read the comments, too.</p>]]></description>
            <link>http://gilbane.com/search_blog/2009/01/open_source_search_search_appl.html</link>
            <guid>http://gilbane.com/search_blog/2009/01/open_source_search_search_appl.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Search Problems/Solved Search Problems</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">Types of Search</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Open source search</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search administration</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search appliances</category>
            
            <pubDate>Thu, 15 Jan 2009 11:32:59 -0500</pubDate>
        </item>
        
        <item>
            <title>What Does an Analyst Do for You?</title>
            <description><![CDATA[<p>Among the roles I have chosen for myself as Lead Analyst for Enterprise Search at the Gilbane Group is to evaluate, in broad strokes, the market for search products used internally by enterprises of all types. My principal audience is people within enterprises who may be involved in the selection, procurement, implementation, and deployment of search technology to benefit their organizations. In this role, I am an advocate for buyers. However, vendors that pay attention to what I write should gain a better understanding of the buyer's perspective. Ultimately, good vendors incorporate analyst guidance into their thinking about how to serve their customers better.</p>

<p>We do not hide the fact that, as industry analysts, we also consult to various content software companies. When doing so, I try to keep in mind that the market will be served best when I honestly advocate for software and service improvements that will benefit buyers. This is a value to those who sell and those who buy software. My consulting to vendors indirectly benefits both audiences.</p>

<p>Analysts also consult to buyers, to help them make informed decisions about technology and business relationships. I particularly enjoy and value those experiences because what I learn about enterprise buyers' needs and expectations can translate directly into advice to vendors. This is an honest brokering role that comes naturally because I have been a software vendor and have also been in a position to make many software procurement decisions, particularly for tools and applications used by my development and service teams. I'm always enthusiastic about sharing important information about products with buyers, and information about buying audiences with those who build products. This can be done effectively while preserving confidentiality on both sides and making sure that everyone gets something out of the communications.</p>

<p>As an analyst, I receive many requests from vendors to listen to product briefings by phone and Web, or to meet one-on-one with their executives. You may have noticed that I don't write reviews of specific products, although in a particular context I may reference products and applications. While we understand why product vendors want analysts to pay attention to them, I don't find briefings particularly enlightening unless I know nothing about a company and its offerings. For these types of overviews, I can usually find what I want to know on their Web sites, in press releases, and by poking around the Web. During briefings I want to drive the conversation toward user experiences and needs.</p>

<p>What I do like to do is talk to product users about their experiences with a vendor or a product. I like to know what the implementation and adoption experience is like and how their organization has been affected by product use, both benefits and drawbacks. It is not always easy to gain access to customers, but I have ways of finding them, and I also encourage readers of this blog to reach out with your stories. I am delighted to learn more through comments on the blog, an email, or a phone call. If you are willing to chat with me for a while, I will call you at your convenience.</p>

<p>The original topic I planned to write about this week will have to wait because, after receiving over 20 invitations to "be briefed" in the past few days, I decided it was more important to let readers know whom I want to be briefed by: search technology users are my number one target. Vendors, please push your customers in this direction if you want me to pay attention. This can bring you a lot of value, too. It is a matter of trust.</p>]]></description>
            <link>http://gilbane.com/search_blog/2009/01/what_does_an_analyst_do_for_yo.html</link>
            <guid>http://gilbane.com/search_blog/2009/01/what_does_an_analyst_do_for_yo.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Management</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Analysts</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Enterprise search industry</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Research</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Vendor relations</category>
            
            <pubDate>Tue, 06 Jan 2009 09:58:24 -0500</pubDate>
        </item>
        
        <item>
            <title>Enterprise Search 2008 Wrap-Up</title>
            <description><![CDATA[<p>It would be presumptuous to think that I could adequately summarize a very active year of evolution among a huge inventory of search technologies. This entry is more about what I have learned and my opinions about the state of the market than an analytical study and forecast.</p>

<p><u>The weak link in the search market is product selection methods</u>. My first thought is that we are in a state of technological riches without clear guideposts for which search models work best in any given enterprise. Those tasked with selecting and purchasing products are not well educated about the marketplace, and they are usually not given the budget or latitude to purchase expert analysis when it is available. It is a sad commentary that organizations grant travel budgets to attend conferences, where only limited information can be gathered about products, but will not spend a few hundred dollars on in-depth comparative expert analyses of a large array of products.</p>

<p>My sources for this observation are numerous, and it was confirmed by speakers in our Gilbane conference search track sessions in <a href="http://gilbaneboston.com/conference_descriptions.html">Boston</a> and <a href="http://gilbanesf.com/08/conference-grid.html">San Francisco</a>. As they related their personal case histories of selecting products, speakers shared no tales of actually doing literature searches or in-depth research using resources that had a cost associated with them. This underscores another observation: those procuring search do not know how to search and operate in the belief that they can find "good enough" information using only "free stuff." Even their review of the material gathered is limited to skimming rather than a systematic reading for concrete facts. This does not make for well-reasoned selections. As noted in an earlier entry, a widely published chart stating that product X is a leader does nothing to enlighten your enterprise's search for search. In one case, product leadership is determined primarily by the total software sales of the "leader," of which search is a minuscule portion.</p>

<p>Don't expect satisfaction with search products to rise until buyers develop smarter methods for selection and better criteria for making a buy decision that suits a particular business need.</p>

<p><u>Random Thoughts</u>. It will be a very long time before we see a universally useful, generic search function embedded in Microsoft (MS) product suites as a result of the FAST acquisition. Asked earlier in the year by a major news organization whether I thought MS had paid too much for FAST, I responded "no" if what they wanted was market recognition but "yes" if they thought they were getting state-of-the-art technology. My position holds; the financial and legal mess in Norway only complicates the road to meshing search technology from FAST with Microsoft customer needs.</p>

<p>I've wondered what has happened to the OmniFind suite of search offerings from IBM. One source tells me it makes IBM money because none of the various search products in the line-up are standalone, nor do they provide an easy transition path from one level of product to another for upward scaling and enhancements. IBM can embed any search product with any bundled platform of other options and charge for lots of services to bring it online with heavy customization.</p>

<p>Three platform vendors seem to be penetrating the market slowly but steadily by offering more cohesive solutions to retrieval. Native search solutions are bundled with complete content capture, publishing and search suites, purposed for various vertical and horizontal applications. These are Oracle, EMC, and OpenText. None of these are out-of-the-box offerings and their approach tends to appeal to larger organizations with staff for administration. At least they recognize the scope and scale of enterprise content and search demands, and customer needs.</p>

<p><u>On <a href="http://gilbane.com/search_blog/2008/11/case_studies_and_guidance_for.html">User Presentations</a> at the Boston Gilbane Conference</u>, I was very pleased with all the sessions and with the work and thought the speakers put into their talks. There were some noteworthy comments in the sessions on Semantic Search and Text Technologies, Open Source, and Search Appliances.</p>

<p>On the topic of <strong>semantic (contextual query and retrieval) search</strong>, text mining and analytics, the speakers covered the range of complexities in text retrieval, leaving the audience with a better understanding of how diverse this domain has become. Different software application solutions need to be employed depending on the point business problems to be solved. This will not change, and enterprises will need to determine which aspects of their businesses need some form of semantically enabled retrieval and then match expectations to offerings. Large organizations will procure a number of solutions, all worthy and useful. Jeff Catlin of Lexalytics gave a clear set of definitions within this discipline, industry analyst Curt Monash provoked us with where to set expectations for various applications, and Win Carus of Information Extraction Systems illustrated the tasks extraction tools can perform to find meaning in a heap of content. The story has yet to be written on how semantic search is affecting, and will affect, our use of information within organizations.</p>

<p>Leslie Owens of Forrester and Sid Probstein of Attivio helped to ground the discussion of <strong>when and why open source software is appropriate</strong>. The major takeaway for me was an understanding of the type of organization that benefits most as a contributor to and user of open source software. Simply put, you need to be heavily invested and engaged on the technical side to get what you need out of open source and to mold it to your purpose. If you do not have the developers to tackle coding, or the desire to share in a community of development, your enterprise's expectations will not be met and disappointment is sure to follow.</p>

<p>Finally, several lively discussions about <strong>search appliance adoption and application</strong> (Google Search Appliance and Thunderstone) strengthened my case for doing homework and spending on careful evaluations before jumping into procurement. While all the speakers seem to be making positive headway with their selected solutions, the path to success for some involved more diversions and changes of course than necessary because the vetting and selection process was too "quick and dirty" or dependent on too few information sources. One thing was revealed: true <em>plug and play</em> is an appliance myth.</p>

<p><u>What will 2009 bring?</u> I'm looking forward to seeing more applications of products that interest me from companies that have impressed me with thoughtful and realistic approaches to their customers and target audiences. Here is an uncommon clustering of search products.</p>

<p><strong>Multi-repository search</strong> across database applications, content collaboration stores, document management systems, and file shares: Coveo, Autonomy, Dieselpoint, dtSearch, Endeca, Exalead, Funnelback, Intellisearch, ISYS, Oracle, Polyspot, Recommind, Thunderstone, Vivisimo, and X1. In this list is something for every type of enterprise and budget.</p>

<p>Business and analytics focused software with <strong>intelligence gathering search</strong>: Attensity, Attivio, Basis Technology, ChartSearch, Lexalytics, SAS, and Temis.</p>

<p>Comprehensive solutions for capture, storage, metadata management and search for <strong>high quality management of content for targeted audiences</strong>: Access Innovations, Cuadra Associates, Inmagic, InQuira, Knova, Nstein, OpenText, ZyLAB. </p>

<p>Search engines with <strong>advanced semantic processing or natural language processing</strong> for high-quality, contextually relevant retrieval when the quantity of content makes human metadata indexing prohibitive: Cognition Technologies, Connotate, Expert System, Linguamatics, Semantra, and Sinequa.</p>

<p><strong>Content classifier, thesaurus management, and metadata server products</strong> work in concert with other search engines, and a few have impressed me with their vision and thoughtful approach to the technologies: MarkLogic, MultiTes, Nstein, Schemalogic, Seaglex, and Siderean.</p>

<p>Search with a principal <strong>focus on SharePoint repositories</strong>: BA-Insight, Interse, Kroll Ontrack, and SurfRay.</p>

<p>Finally, some <strong>unique search applications</strong> are making serious inroads. These include Documill for <strong>visual and image</strong> search, Eyealike for <strong>image and people</strong> search, Krugle for <strong>source code</strong> search, and Paglo for <strong>IT infrastructure</strong> search.</p>

<p>This is the list of companies that interest me because I think they are on track to provide good value and technology, many still small but with promise. As always, the proof will be in how they grow and how well they treat their customers.</p>

<p>That wraps up <em>Year 2</em> of the Enterprise Search Practice at the Gilbane Group. Check out our search studies at <a href="http://gilbane.com/Research-Reports.html">http://gilbane.com/Research-Reports.html</a> and PLEASE let me hear your thoughts on my thoughts, or on any other search-related topic, via the contact information at <a href="http://gilbane.com/contact.html">http://gilbane.com/contact.html</a>.</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/12/enterprise_search_2008_wrap-up.html</link>
            <guid>http://gilbane.com/search_blog/2008/12/enterprise_search_2008_wrap-up.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Search Technologies and Products</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Conferences</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Enterprise search industry</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search marketplace</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search product procurement</category>
            
            <pubDate>Wed, 31 Dec 2008 13:57:26 -0500</pubDate>
        </item>
        
        <item>
            <title>Case Studies and Guidance for Search Implementations</title>
            <description><![CDATA[<p>We'll be covering a chunk of the search landscape at the Gilbane Conference next week. While there are nominally over 100 search solutions that target some aspect of enterprise search, there will be plenty to learn from the dozen or so case studies and tool options described. Commentary and examples include Attivio, Coveo, Exalead, Google Search Appliance (GSA), IntelliSearch, Lexalytics, Lucene, Oracle Secure Enterprise Search, Thunderstone, and references to others. Our speakers will clue us in to the current state of search as it is being implemented. Several exhibitors will also be on site to demonstrate their capabilities, and they represent some of the best. Check out the program lineup below and try to make it to Boston to hear from those with hands-on experience.</p>

<p><u>EST-1: Plug-and Play: Enterprise Experiences with Search Appliances</u><br />
<ul><br />
	<li><em>So you want to implement an enterprise search solution?</em> Speakers: Angela A. Foster, FedEx Services, FedEx.com Development, and Dennis Shirokov, Marketing Manager, FedEx Digital Access Marketing. </li><br />
	<li><em>The Make or Buy Decision at the U.S. General Services Admin.</em> Speaker: Thomas Schaefer, Systems Analyst and Consultant, U.S. General Services Administration</li><br />
	<li>Process and Architecture for Implementing GSA at MITRE. Speaker: Robert Joachim, Info Systems Engr, Lead, The MITRE Corporation. </li><br />
</ul></p>

<p><u>EST-2: Search in the Enterprise When SharePoint is in the Mix</u><br />
<ul><br />
	<li><em>Enterprise Report Management: Bringing High Value Content into the Flow of Business Action.</em> Speaker: Ajay Kapur, VP of Product Development, Apps Associates</li><br />
	<li><em>Content Supply? Meet Knowledge Demand: Coveo SharePoint integration. </em>Speaker: Marc Solomon, Knowledge Planner, PRTM. </li><br />
	<li><em>In Search of the Perfect Search: Google Search on the Intranet.</em> Speaker: June Nugent, Director of Corporate Knowledge Resources, NetScout Systems</li><br />
</ul></p>

<p><u>EST-3: Open Source Search Applied in the Enterprise</u><br />
<ul><br />
	<li>Context for Open Source Implementations. Speaker: Leslie Owens, Analyst, Forrester Research</li><br />
	<li>Intelligent Integration: Combining Search and BI Capabilities for Unified Information Access. Speaker: Sid Probstein, CTO, Attivio</li><br />
</ul></p>

<p><u>EST-4: Search Systems: Care and Feeding for Optimal Results</u><br />
<ul><br />
	<li><em>Getting Off to a Strong Start with Your Search Taxonomy</em>. Speaker: Heather Hedden, Principal, Hedden Information Management</li><br />
	<li><em>Getting the Puzzle Pieces to Fit; Finding the Right Search Solution(s)</em> Speaker: Patricia Eagan, Sr. Mgr, Web Communications, The Jackson Laboratory. </li><br />
	<li>How Organizations Need to Think About Search. Speaker: Rob Wiesenberg, President & Founder, Contegra Systems</li><br />
</ul></p>

<p><u>EST-5: Text Analytics/Semantic Search: Parsing the Language</u> <br />
<ul><br />
	<li>Overview and Differentiators: Text Analytics, Text Mining and Semantic Technologies. Speaker: Jeff Catlin, CEO, Lexalytics</li><br />
	<li>Reality and Hype in the Text Retrieval Market. Speaker: Curt Monash, President, Monash Research.</li><br />
	<li>Two Linguistic Approaches to Search: Natural Language Processing and Concept Extraction. Speaker: Win Carus, President and Founder, Information Extraction Systems</li><br />
</ul></p>

<p><u>Exhibitors with a Search Focus:</u><br />
<a href="http://www.attivio.com">Attivio Active Intelligence</a><br />
<a href="http://www.coveo.com">Coveo G2B</a><br />
<a href="http://www.exalead.com">Exalead Cloud View</a><br />
<a href="http://www.intellisearch.no/">IntelliSearch</a><br />
<a href="http://www.lexalytics.com">Lexalytics, an Infonic Company</a><br />
</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/11/case_studies_and_guidance_for.html</link>
            <guid>http://gilbane.com/search_blog/2008/11/case_studies_and_guidance_for.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Case Studies</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Enterprise search</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search case studies</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search technology</category>
            
            <pubDate>Tue, 25 Nov 2008 18:09:24 -0500</pubDate>
        </item>
        
        <item>
            <title>Enterprise Search is Everywhere</title>
            <description><![CDATA[<p>When you look for an e-mail you sent last week, a vendor account rep's phone number, a PowerPoint presentation you received from a colleague in the Paris office, a URL to an article recommended for reading before the next Board meeting, or background on a company project you have been asked to manage, you are engaged in search <em>in</em>, <em>about</em>, or <em>for</em> your enterprise. Whether you are working inside applications you have used for years or simply perusing the links on a decade-old corporate intranet, whenever you try to find something while doing the enterprise's work, you are engaging with a search interface.</p>

<p>Dissatisfaction comes from the sheer number of these interfaces and the lack of a cohesive roadmap to <u>all</u> there is to be found. You already know what you know and what you need to know. Sometimes you know how to find what you need, but more often you don't, and you stumble through a variety of possibilities, up to and including asking someone else how to find it. That missing roadmap is more than an annoyance; it is a major encumbrance to doing your job, and top management does not get it. Managers simply won't accept that one or two content roadmap experts (overhead) could save many person-years of company time and lost productivity.</p>

<p>In most cases, the simple notion of creating clear guidelines and signposts to enterprise content is a funding showstopper. It takes human intelligence to design and build that roadmap and put the technology aids in place to reveal it. Management will fund technology but not the content architects, knowledge "mappers" and ongoing gatekeepers to stay on top of organizational change, expansions, contractions, mergers, rule changes and program activities that evolve and shift perpetually. They don't want infrastructure overhead whose primary focus, day-in and day-out, will be observing, monitoring, communicating, and thinking about how to serve up the information that other workers need to do their jobs. These people need to be in place as the "black-boxes" that keep search tools in tip-top operating form.</p>

<p>Last week I <a href="http://gilbane.com/search_blog/2008/11/in_the_field_the_enterprise_se_1.html">commented on the products</a> that will be featured in the Search Track at <a href="http://gilbaneboston.com/index.html">Gilbane Boston</a>, Dec. 3rd and 4th. What you will learn about these tools will be couched in case studies that reveal how search technology is leveraged by people who think a lot about what needs to be found and how search needs to work in their enterprises. They will talk about which tools they use, why, and what they are doing to get search to do its job. I've asked the speakers to tell their stories and, based on my conversations with them in the past week, that is what we will hear: the reality!<br />
</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/11/enterprise_search_is_everywher.html</link>
            <guid>http://gilbane.com/search_blog/2008/11/enterprise_search_is_everywher.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Case Studies</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Expertise management</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Governance</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Intranets</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Knowledge management</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search infrastructure</category>
            
            <pubDate>Tue, 18 Nov 2008 16:31:43 -0500</pubDate>
        </item>
        
        <item>
            <title>In the Field: The Enterprise Search Market Offers CHOICES</title>
            <description><![CDATA[<p>Heading into the <a href="http://gilbaneboston.com/index.html">Gilbane Boston conference</a> next month, we have case studies that feature quite an array of enterprise search applications. So many of the search solutions now being deployed are implemented by a small or part-time staff that it is difficult to find the one or two people who can attend a conference to tell their stories. We surveyed blogs, articles, and case studies published elsewhere to identify organizations and people with hands-on experience in the trenches deploying search engines in their enterprises. Our speakers are those who were pleased to be invited, and they will be sharing their experiences on December 3rd and 4th.</p>

<p>From search appliances <a href="http://www.thunderstone.com/texis/site/pages">Thunderstone</a> and <a href="http://www.google.com/enterprise/gsa/">Google Search Appliance</a>, to platform search solutions based on <a href="http://www.oracle.com/technology/products/oses/index.html">Oracle Secure Enterprise Search</a>, to standalone search products <a href="http://www.coveo.com/en/default.aspx/">Coveo</a>, <a href="http://www.exalead.com/software/">Exalead</a>, and <a href="http://www.isys-search.com/">ISYS</a>, we will hear from those who have been involved in selecting, implementing, and deploying these solutions for enterprise use. From a <a href="http://www.forrester.com/rb/ikm">Forrester</a> industry analyst and an <a href="http://www.attivio.com/">Attivio</a> developer we'll hear about open source options and how they are influencing enterprise search development. The search sessions will be rounded out as we explore how text mining and text analytics (<a href="http://www.monash.com/">Monash Research</a>) and semantic technologies (<a href="http://lexalytics.com/">Lexalytics</a> and <a href="http://infoextract.com/">InfoExtract</a>) influence, and merge with, other enterprise search options. There will be something for everyone in the sessions and in the <a href="http://gilbaneboston.com/exhibitors_sponsors.html">exhibit hall</a>.</p>

<p>Personally, I am hoping to see many in the audience who also have search stories from their own enterprises. Those who know me will attest to my strong belief in communities of practice and sharing. It strengthens the marketplace when people from different types of organizations share their experiences trying to solve similar problems with different products. Revealing competitive differentiators among the numerous search products pushes technology envelopes and makes for a more robust marketplace. Encouraging dialogue about products and in-the-field experiences is a priority for all sessions at the <a href="http://gilbaneboston.com/conference-schedule.html">Gilbane Conference</a>, and I'll be there to prompt discussion in all five search sessions. I hope you'll join me in Boston.</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/11/in_the_field_the_enterprise_se_1.html</link>
            <guid>http://gilbane.com/search_blog/2008/11/in_the_field_the_enterprise_se_1.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Case Studies</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Conferences</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Product evaluation</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search case studies</category>
            
            <pubDate>Thu, 13 Nov 2008 17:36:10 -0500</pubDate>
        </item>
        
        <item>
            <title>Apples and Orangutans: Enterprise Search and Knowledge Management</title>
            <description><![CDATA[<p>The title of a piece by Mike Altendorf in <u>CIO Magazine</u>, October 31, 2008, <a href="http://www.cio.co.uk/concern/change/expertadvice/index.cfm?articleid=716">Search Will Outshine KM</a>, mystifies me. I did a little poking around to discover who he is and found a <a href="http://www.cio.co.uk/concern/change/expertadvice/index.cfm?articleid=716">similar statement</a> by him back in September: <em>Search is being implemented in enterprises as the new knowledge management and what's coming down the line is the ability to mine the huge amount of untapped structured and unstructured data in the organisation</em>.</p>

<p>Because I follow enterprise search for the Gilbane Group while maintaining a separate consulting practice in knowledge management, I am struggling with his conflation of the two terms, or even the migration of one into the other. The <em>search</em> we talk about is a set of software technologies that retrieve content. I'm tired of the debate about the terminology "enterprise search" vs. "behind the firewall search." I tell vendors and buyers that my focus is on software products supporting search executed within the enterprise (or from outside looking in) on content that originates within the enterprise or is collected by it. I don't judge whether the product is for an exclusive domain, content type, or audience, or whether it is deployed with the "intent" of finding and retrieving every last scrap of content lying around the enterprise. No search product does, or ever will, achieve the latter, but if that is what an enterprise aspires to, it is a judgment call I might help them re-evaluate in consultation.</p>

<p>It is pretty clear that Mr. Altendorf is impressed with the potential of Fast and Microsoft, and he rightly sees them as firmly entrenched in the software business. But knowledge management (KM) is not now, nor has it ever been, a software product or even a suite of products. I will acknowledge that KM is a messy thing to talk about, and the label means many things even to those of us who focus on it as a practice area. It clearly got derailed as a useful "discipline" in the 90s when tool vendors decided to place their products into a new category called "knowledge management."</p>

<p>It sounded so promising and useful, this idea of KM software that could just suck the brains out of experts and extract an enterprise's business know-how from hidden and lurking content. We know better, those of us who try to refine the art of leveraging knowledge by helping our clients blend people and technology into workable business practices around knowledge assets. We bring together IT, business managers, librarians, content managers, taxonomists, archivists, and records managers to facilitate good communication among many types of stakeholders. We work to define how to apply behavioral business practices and tools to business problems. Understanding how a software product can help in business processes, recognizing its potential applications, and encouraging usability standards are all part of the knowledge manager's toolkit. It is quite an art, the KM <strong>process</strong> of bringing tools together with knowledge assets (people and content) into a productive balance.</p>

<p>Search is one of the tools that can facilitate leveraging knowledge assets and help us find the experts who might share some "how-to" knowledge, but it is not, nor will it ever be, a substitute for KM. You can check out these links to see how others line up on the definitions of KM: <a href="http://www.cio.com/article/40343/ABC_An_Introduction_to_Knowledge_Management_KM_">CIO introduction to KM</a> and <a href="http://en.wikipedia.org/wiki/Knowledge_management#cite_note-0">Wikipedia</a>. Let's not have the "KM is dead" discussion again!</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/11/apples_and_orangutans_enterpri.html</link>
            <guid>http://gilbane.com/search_blog/2008/11/apples_and_orangutans_enterpri.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Definitions</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Enterprise search</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Knowledge management</category>
            
            <pubDate>Mon, 03 Nov 2008 17:31:17 -0500</pubDate>
        </item>
        
        <item>
            <title>When We Are Missing Good Metadata in Enterprise Search</title>
            <description><![CDATA[<p>This blog has not focused on non-profit institutions (e.g., museums, historical societies) as enterprises, but they are repositories of an extraordinary wealth of information. For the past few weeks I've been trying, with mixed results, to get a feel for the accessibility of this content through the public Web sites of these organizations. My queries leave me with a keen sense of why search on company intranets also fails.</p>

<p>Most sizable non-profits want their collections of content and other information assets exposed to the public. But each department manages its own content collections with software that is unique to its specific professional methods and practices. In the corporate world, the mix will include human resources (HR) systems, enterprise resource planning (ERP) systems, customer relationship management (CRM) systems, R&amp;D document management systems, and collaboration tools. Many corporations have, or "had," library systems that reflected a mix of internally published reports and scholarly collections supporting R&amp;D and special areas such as competitive intelligence. Corporations struggle constantly with federating all this content in a single search system.</p>

<p>Non-profit organizations have similarly disparate systems built for their special domains, such as museums or research institutions. One area where the corporate and non-profit sectors are alike is libraries, which operate with software whose interfaces hearken back to designs of the late 1980s or 90s. Another by-product of that era is the catalog record, in a format devised by the Library of Congress for the electronic exchange of records between library systems. It was never intended to be a format for retrieval. It is similar to the metadata in content management systems but is an order of magnitude more complex, and more arcane to the typical searcher. Only librarians and scholars really understand the most effective ways to search most library systems; therein lies the "public access" problem. In a corporation, a librarian often does the searching.</p>

<p>However, a visitor to a museum Web site would expect to quickly find, all together, the exhibit materials, printed literature, and other media the museum has on a topic. This calls for nomenclature that is "public friendly" and reflects the basic "aboutness" of all the materials in museum departments and collections. It is a problem when each library and curatorial department uses a different method of categorizing. Libraries typically use Library of Congress Subject Headings, and what makes this problematic is the sheer number of topics: the headings are designed to cover all Library of Congress holdings, not a special collection of a few tens of thousands of items. On top of that, almost no library systems search for words "contained in" the subject headings when you browse just the Subject index. If I am browsing Subjects for all <em>power generation</em> materials and a heading such as <em>electric power generation</em> is used, it will not be found, because the look-up mechanism only finds headings that "begin with" <em>power generation</em>.</p>
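<p>To make the "begins with" versus "contained in" distinction concrete, here is a minimal Python sketch. The heading list and function names are hypothetical illustrations, not taken from any library system; real catalog software indexes headings very differently, but the matching logic shows why a browse lookup misses <em>electric power generation</em>.</p>

```python
# Hypothetical subject headings, for illustration only.
SUBJECT_HEADINGS = [
    "Electric power generation",
    "Power generation -- History",
    "Power resources",
    "Steam power plants",
]


def browse_begins_with(headings, query):
    """Classic browse index: only headings that START with the query match."""
    q = query.lower()
    return [h for h in headings if h.lower().startswith(q)]


def search_contained_in(headings, query):
    """Keyword-style lookup: headings that contain the query anywhere match."""
    q = query.lower()
    return [h for h in headings if q in h.lower()]


if __name__ == "__main__":
    # Prints ['Power generation -- History']; 'Electric power generation' is missed.
    print(browse_begins_with(SUBJECT_HEADINGS, "power generation"))
    # Prints ['Electric power generation', 'Power generation -- History'].
    print(search_contained_in(SUBJECT_HEADINGS, "power generation"))
```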

<p>Let's cut to the chase: mountains of metadata, in the form of library cataloging, are locked inside library systems within non-profit institutions. That metadata is not searched from the search box on a museum Web site because it is not accessible to most "enterprise" or "web site" search engines. Therefore, to be truly thorough, a separate search must be done in the library system using a more complex approach.</p>
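<p>One way to picture an alternative to that separate search is a federated query: a single search box that sends the same query to both the Web site index and the library catalog and merges the hits. The sketch below is a hypothetical, in-memory Python illustration (the collections, record IDs, and function names are invented for this example); it does not solve the vocabulary mismatch between departments, which remains the harder problem.</p>

```python
# Hypothetical in-memory stand-ins for two separate systems.
WEB_SITE_PAGES = {
    "/exhibits/steam-power": "Steam and power generation exhibit",
    "/visit/hours": "Museum hours and admission",
}
LIBRARY_CATALOG = {
    "rec-001": "Electric power generation -- History",
    "rec-002": "Textile machinery -- Handbooks, etc.",
}


def search_source(source_name, records, query):
    """Naive 'contains' match within one source; each hit is labeled with its source."""
    q = query.lower()
    return [(source_name, key, text) for key, text in records.items() if q in text.lower()]


def federated_search(query):
    """Send one query to every source and merge the hits into a single list."""
    hits = []
    sources = (("web site", WEB_SITE_PAGES), ("library catalog", LIBRARY_CATALOG))
    for name, records in sources:
        hits.extend(search_source(name, records, query))
    return hits


for hit in federated_search("power generation"):
    print(hit)
```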

<p>We have a big problem if we are to somehow elevate library collections to the same level of importance as the rest of a museum's collections and integrate the two. Bigger still is the challenge of getting everything indexed with a normalized vocabulary for the comfort of all audiences. This is something that takes thought and coordination among professionals of diverse competencies. It will not be solved easily but it must be done for institutions to thrive and satisfy all their constituents. Here we have yet another example of where enterprise search will fail to satisfy, not because the search engine is broken but because the underlying data is inappropriately packaged for indexes to work as expected. Yet again, we come to the realization that we need people to recognize and fix the problem.<br />
</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/10/when_we_are_missing_metadata_i.html</link>
            <guid>http://gilbane.com/search_blog/2008/10/when_we_are_missing_metadata_i.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Search Problems/Solved Search Problems</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Integrating technologies</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Keyword searching</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Metadata</category>
            
            <pubDate>Thu, 23 Oct 2008 20:43:44 -0500</pubDate>
        </item>
        
        <item>
            <title>What Determines a Leader in the Enterprise Search Market?</title>
            <description><![CDATA[<p>Let's agree that most, if not all, "enterprise search" is really about point solutions within large corporations. As I have written elsewhere, the "enterprise" is almost always a federation of constituencies, each with its own solutions for content applications, including search. If there is any place where we find truly enterprise-wide application of search, it is in small and medium-sized organizations (SMBs). These include professional service firms (consultancies and law firms), NGOs, many non-profits, and young R&amp;D companies. There are plenty of niche solutions for SMBs, and they are growing.</p>

<p>I bring this up because the latest <a href="http://mediaproducts.gartner.com/reprints/microsoft/vol6/article4/article4.html">Gartner "magic quadrant"</a> lists Microsoft (MS) as the "leader" in enterprise search, the same position in which Gartner has placed Fast Search &amp; Transfer in the past. Whether this is because Fast's assets are now owned by MS or because Gartner really believes that Microsoft is the leader, I strongly beg to differ.</p>

<p>I have been perplexed by the Microsoft/Fast deal since it was announced earlier this year because, although Fast has always offered a lot of search technology, I never found it to be a compelling solution for any of my clients. Putting aside the huge upfront capital cost of licenses, the staggering amount of development work, and the time to deployment, there were other concerns. I sensed a questionable commitment to an ongoing, sustainable, unified, and consistent product vision with supporting services. I felt that any client of mine would need very deep pockets indeed to make a solid value case for Fast. Most of my clients are already burned out on really big enterprise deployments of ERP and CRM applications, and they understand the wisdom of beginning with smaller, short-term projects with achievable value on which they can build.</p>

<p>Products that impress me as offering much more "out of the box" at a more reasonable cost are clearly leaders in their own domains. They have important clients achieving a good deal of benefit at a reasonable cost, in a short period of time. They have products that can be installed, implemented, and maintained internally without a large staff of administrators, and they have good reputations among their clients for responsiveness and a cohesive series of roll-outs. Several have as many clients as Fast ever had, or more (if we ever learn the real number). Coveo, Exalead, ISYS, Recommind, Vivisimo, and X1 are a few of a select group that are making a mark in their respective niches, as products ready for action with a short implementation cycle (weeks or months, not years).</p>

<p>Autonomy and Endeca continue to bring value to very large projects in large companies but are not plug-and-play solutions by any means. Oracle, IBM, and Microsoft offer search solutions of a very different type, with a heavy vendor or third-party service requirement. Google Search Appliance has a much larger installed base than any of these but needs serious tuning and customization to make it suitable for enterprise needs. Take the "leadership" designation with a big grain of salt, because what <em>leads</em> on the charts may be exactly what bogs you down. There are no generic, one-size-fits-all enterprise search solutions, including those in the "leaders" quadrant.<br />
</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/10/what_determines_a_leader_in_th.html</link>
            <guid>http://gilbane.com/search_blog/2008/10/what_determines_a_leader_in_th.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Product Selection</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Enterprise search industry</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Fast</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Microsoft</category>
            
            <pubDate>Thu, 09 Oct 2008 18:51:20 -0500</pubDate>
        </item>
        
        <item>
            <title>Dewey Decimal Classification, Categorization, and NLP</title>
            <description><![CDATA[<p>I am surprised how often various content organizing mechanisms on the Web are compared to the <em>Dewey Decimal System</em>. As a former librarian, I am disheartened to be reminded how often students were lectured on the Dewey Decimal system, apparently to the exclusion of learning about <em>subject categorization </em>schemes. They complemented each other but that seems to be a secret among all but librarians. </p>

<p>I'll try to share a clearer view of the model and explain why new systems for organizing content in enterprise search are quite different from the decimal model.</p>

<p>Classification is a good generic term for physical organizing systems. Unique animals and plants are distinguished by a <strong>single</strong> classification in the biological naming system. So too are books in a library. There are two principal classification systems for arranging books on the shelf in Western libraries: Dewey Decimal and Library of Congress (LC). Each uses a code (numeric for Dewey Decimal and alphanumeric for Library of Congress) to establish where a book belongs logically on a shelf, relative to other books in the collection, according to the book's most prominent topic. A book on <em>nutrition for better health</em> might be given a classification number for some aspect of <em>nutrition</em> or one for a <em>health topic</em>, but a human being has to judge which topic the book is most "about" because the book can only live in one section of the collection. It is probably worth mentioning that the Dewey and LC systems are both hierarchical but with different priorities. For example, Dewey puts broad topics like <em>Religion</em> and <em>Philosophy and Psychology</em> at top levels, while LC puts those two topics together and includes more scientific and technical topics at the top of the list, like <em>Agriculture</em> and <em>Military Science</em>.</p>

<p>So why arrange books on the shelf in topic order, when it takes a lot of labor to move collections around to make space for new books? It is for the benefit of users, to enable "browsing" through the collection, although it may be hard to accept that the term <em>browsing</em> was a staple of library science decades before the internet. Library leaders established eons ago the need for a system of physical organization to help readers peruse the book collection by topic, leading from the general to the specific.</p>

<p>You might ask how that helped you find the book on nutrition that was classified under "health science." This is where another system comes in, one largely hidden from the public or often made annoyingly inaccessible. It is a system of categorization in which any content, book or otherwise, can be assigned an unlimited number of categories. Wandering through the stacks, you would never suspect this secret way of finding a nugget about your favorite hobby in a book classified to live elsewhere. The standard lists of terms for describing books with multiple headings are called "subject headings," and you had to use a library catalog to find them. Unfortunately, they contain mysterious conventions called "subdivisions." These are designed to pre-coordinate any topic with other generic topics (e.g. "Handbooks, etc." and "United States"). Today we would call these generic subdivision terms <em>facets</em>. The first reflects a kind of book, and the second reveals the geographic scope the book covers.</p>

<p>Web pages, hyperlinking, and "clicking through" hierarchical lists of topics make it possible to narrow a search for <em>handbooks</em> on <em>nutrition</em> in the <em>United States</em> for better <em>health</em>. We can start from any facet or topic and still arrive at the book that meets all four criteria. We no longer have to be constrained by the Dewey model of browsing the physical location of our favorite topics, probably missing a lot of good stuff. But then we never really were, because the subject card catalog gave us a tool for finding more than we would by classification code alone. Even so, the card catalog was far more tedious than navigating a hierarchy of subject headings. On the Web we can narrow the results by facets on a browser tab, then narrow them further by yet another topical term until we find just the right piece of content.</p>
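<p>A few lines of Python can illustrate two ideas from this post. First, a book has one shelf location but can carry any number of subject headings and facets. Second, facet narrowing reaches the same book whichever facet you start from. This is a toy illustration under stated assumptions. The <code>Record</code> fields, the sample titles, and the <code>narrow</code> helper are hypothetical. Real catalogs and search engines use indexes rather than a linear scan.</p>

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Record:
    """Hypothetical catalog record: one shelf class, but many headings and facets."""
    title: str
    shelf_class: str  # the single classification that fixes its place on the shelf
    subjects: frozenset = field(default_factory=frozenset)  # any number of subject headings
    form: str = ""    # facet: kind of book, e.g. "Handbooks"
    place: str = ""   # facet: geographic scope, e.g. "United States"

def narrow(records, subjects=(), form=None, place=None):
    """Keep only records that match every selected subject and facet."""
    wanted = set(subjects)
    return [
        r for r in records
        if wanted.issubset(r.subjects)
        and (form is None or r.form == form)
        and (place is None or r.place == place)
    ]

catalog = [
    Record("Eat Well, Live Well", "health science",
           frozenset({"Nutrition", "Health"}), "Handbooks", "United States"),
    Record("Nutrition in Europe", "nutrition",
           frozenset({"Nutrition"}), "Handbooks", "Europe"),
    Record("Heart Health Today", "health science",
           frozenset({"Health"}), "Periodicals", "United States"),
]

# Start from the geographic facet...
by_place_first = narrow(narrow(catalog, place="United States"),
                        subjects={"Nutrition", "Health"}, form="Handbooks")
# ...or from a topic: the book meeting all four criteria is found either way.
by_topic_first = narrow(narrow(catalog, subjects={"Nutrition"}),
                        form="Handbooks", place="United States", subjects={"Health"})

print([r.title for r in by_place_first])  # ['Eat Well, Live Well']
```

<p>Both paths return only the hypothetical "Eat Well, Live Well" record. Its shelf classification is "health science," not "nutrition," yet a nutrition search still finds it.</p>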

<p>Taking the next leap, we have natural language processing (NLP), which aims to answer the question, "Where do I find <em>handbooks</em> on <em>nutrition</em> in the <em>United States</em> for better <em>health</em>?" That is the Holy Grail for search technology, and a long way from Mr. Dewey's idea for browsing the collection.</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/10/dewey_decimal_classification_c.html</link>
            <guid>http://gilbane.com/search_blog/2008/10/dewey_decimal_classification_c.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Definitions</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Categorization</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Classification</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Natural Language Processing (NLP)</category>
            
            <pubDate>Thu, 02 Oct 2008 19:32:27 -0500</pubDate>
        </item>
        
        <item>
            <title>Taxonomy, Yes, but for What?</title>
            <description><![CDATA[<p>The term <em>taxonomy</em> crept into the search lexicon by stealth and is now firmly entrenched. The very early search engines, <em>circa</em> 1972-73, let searchers select content using controlled vocabularies from a standardized <em>thesaurus</em> of terminology in a particular discipline. With no neat graphical navigation tools, searchers painfully typed their queries in an arcane syntax on a typewriter-like device. A stray hyphen, period or space would render the query uncomputable, so after deciphering the error message, the searcher would try again. Each minute and each result cost money, so errors were a real expense.</p>

<p>We entered the Web search era with content bundled into directory structures, like the "Yellow Pages," or with query results organized into "folders" labeled with broad topics. The controlled vocabulary that represented directory topics or folder labels became known as a taxonomic structure. The early ones at Northern Light and Yahoo were crafted by experts who knew the rules of controlled vocabulary and of thesaurus development and maintenance. Google derailed that search model with its simple "search box," which required only a word or phrase to grab heaps of results. Today we are in a new era. Some people like typing keywords into a box, while others prefer the suggestions of a directory or tree structure. Building taxonomic structures is no longer just for e-commerce sites. It is now serious business for search within enterprises, where many employees prefer to <em>navigate</em> through the terminology to browse and discover the full scope of what is there.</p>

<p><u>Navigation</u> is only one purpose that taxonomies serve in search. These lists can become quite large and complex, depending on the application domain and on the richness, scope, and depth of the subject matter. The more cross-references (e.g. <em>cell phones</em> USE <em>wireless phones</em>) embedded in the list, the more likely it is that the searcher's preferred term will be present. There is a point of diminishing returns, however. If the user has to navigate to the system's preferred term too often, searching becomes unwieldy and is abandoned. On the other hand, if the system automates the smooth transition from one term to another, the richness and complexity of a taxonomy can be an asset.</p>
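<p>One way to automate "the smooth transition from one term to another" is to follow USE references silently before the query runs. The sketch below is a hypothetical Python illustration, not any product's implementation. The vocabulary entries and the <code>resolve</code> function are invented for the example. It tolerates a chained reference, which a well-maintained thesaurus would avoid, and it rejects circular ones.</p>

```python
# Hypothetical entry vocabulary: non-preferred term -> term to USE instead.
USE = {
    "cell phones": "wireless phones",
    "cellular telephones": "wireless phones",
    "mobile phones": "cell phones",  # chained reference; resolve() still follows it
}

def resolve(term, use_refs=USE, max_hops=10):
    """Follow USE references until a preferred term is reached.

    Done silently at query time, so the searcher never has to navigate
    to the system's preferred term by hand.
    """
    current = term.strip().lower()
    seen = {current}
    for _ in range(max_hops):
        nxt = use_refs.get(current)
        if nxt is None:
            return current  # no further reference: this is the preferred term
        if nxt in seen:
            raise ValueError(f"circular USE reference at {nxt!r}")
        seen.add(nxt)
        current = nxt
    raise ValueError("too many USE hops; check the thesaurus")

print(resolve("Mobile Phones"))  # wireless phones
print(resolve("laptops"))        # laptops (no cross-reference, used as entered)
```
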

<p>In more sophisticated applications of taxonomies, the thesaurus model of relationships becomes a necessity. When a search engine has embedded algorithms that can interpret explicit term relationships, it indexes content according to a taxonomy and all its cross-references. Here the taxonomy <u>informs the index engine</u>. This requires substantial maintenance and governance, of a much more granular nature than a navigation taxonomy needs. To work well, it requires a large corpus of terminology, built so that what the content says and means matches what the searcher expects to find. If a search returns unsatisfactory results because of a poor taxonomy, trust in the search system fails rapidly. Whatever effort went into building the taxonomy is then lost.</p>
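<p>When the taxonomy informs the index engine, term relationships are applied while content is indexed, not while the searcher navigates. The sketch below is rough and rests on two assumptions. The thesaurus has only USE and broader-term (BT) relationships, and each document's terms have already been extracted. Each term is posted under its preferred form and under every broader term above it. As a result, a search on a broad topic also finds content that mentions only a narrower one. The thesaurus entries, document ids, and helper names are all hypothetical. Commercial engines apply such relationships in their own, largely undisclosed ways.</p>

```python
from collections import defaultdict

# Hypothetical thesaurus fragments using the common USE / BT (broader term) notation.
USE = {"cell phones": "wireless phones"}              # non-preferred -> preferred
BT = {
    "wireless phones": "telecommunications equipment",  # term -> broader term
    "telecommunications equipment": "equipment",
}

def index_terms(term):
    """Map a term to its preferred form plus every broader term above it."""
    term = USE.get(term, term)
    terms = [term]
    while term in BT:
        term = BT[term]
        if term in terms:       # guard against a circular BT chain
            break
        terms.append(term)
    return terms

def build_index(docs):
    """Build an inverted index; docs maps doc id -> terms already extracted from it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for t in terms:
            for posting_term in index_terms(t.lower()):
                index[posting_term].add(doc_id)
    return index

docs = {
    "memo-1": ["Cell phones", "budget"],
    "memo-2": ["wireless phones"],
    "memo-3": ["budget"],
}
index = build_index(docs)
print(sorted(index["wireless phones"]))               # ['memo-1', 'memo-2']
print(sorted(index["telecommunications equipment"]))  # ['memo-1', 'memo-2']
```

<p>The example also shows why governance matters. A single wrong BT entry would silently file every affected document under the wrong broader topic.</p>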

<p>I bring this up because defining the <strong>intent</strong> of a taxonomy is the first step in deciding whether to start building one. Either model is an ongoing commitment, but the latter is a much larger investment in sophisticated human resources. The conditions any taxonomy must meet to succeed must be spelled out when selling the project and its value proposition.</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/09/taxonomy_yes_but_for_what.html</link>
            <guid>http://gilbane.com/search_blog/2008/09/taxonomy_yes_but_for_what.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Taxonomy/Thesaurus/Ontology</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Navigated search</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search technology</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Taxonomy for search</category>
            
            <pubDate>Tue, 23 Sep 2008 13:45:18 -0500</pubDate>
        </item>
        
        <item>
            <title>Controlling Your Enterprise Search Application</title>
            <description><![CDATA[<p>When interviewing search administrators who had also been part of product selection earlier this year, I asked about surprises they had encountered. Some involved the selection process, but most related to ongoing maintenance and support. None commented on actual failures to retrieve content appropriately. That is a good thing, whether they had already tested for retrieval in a proof of concept during due diligence or simply got lucky.</p>

<p>Thinking about how product selections are made prompts me to comment on two major search product attributes that control the success or failure of search in an enterprise. The first is the set of algorithms that control content indexing, which determine what is indexed and how it is retrieved from the index (or indices). The second is the interfaces, both those that searchers use to execute searches and those that present results. For each, buyers need to know what they can control and how best to exercise that control for success.</p>

<p><u>Indexing and retrieval technology</u> is embedded in search products. Administrators have few, if any, options to alter scalability, indexing, or content selection during retrieval. The "secret sauce" of each product is largely hidden, although some patented aspects may be available for research. It is difficult to assess the net effect of the delivered tuning options until an administrator gets deep into tuning and experimenting with substantial corpora of content. So the time to judge how well a product will retrieve <u>your</u> content, when searched by <u>your</u> audience, is before the purchase. You can't control the underlying technology, but you can perform a proof of concept (PoC). This requires:</p>
<ul>
	<li>human resources and a commitment of computing resources</li>
	<li>a well-defined amount, type, and nature of content (metadata plus full text, or unstructured full text only) to give a testable sample</li>
	<li>testers who are representative of all potential searchers</li>
	<li>a comparison of results from three or four systems to reveal how well each retrieves the intended content targets</li>
	<li>testers who know the content, running searches similar to those that enterprise employees or customers will routinely perform</li>
	<li>search logs from previously deployed search systems, if they exist; searches that routinely failed in the past should be used to test newer systems</li>
</ul>
<p><u>Interface technology</u><br />
Unlike the embedded search technology, interfaces are open to buyer control. Buyers can design their own or hire a third party to produce them, and search interfaces vary enormously. The enterprise controls what searchers experience when they first encounter a search engine. That might be a search box on a portal, or a novel combination of search box, navigation options, and special search forms. Exercising that control may be necessary if the "out-of-the-box" default is not satisfactory. You may find, at a reasonable price, a terrific search engine that scales well, indexes metadata and full text competently, and retrieves what the audience expects. It may still need a different look-and-feel for your users. Numerous customization options are delivered with enterprise search packages or are available as add-ons. They work through an API (application programming interface), an SDK (software development kit), or application connectors (e.g. Documentum, SharePoint).</p>
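<p>Once testers have agreed on the intended targets for each test query, the PoC comparison step can be scored simply. The minimal Python sketch below assumes hypothetical query strings and document ids, plus results captured from two candidate systems. A real PoC would compare three or four systems and would draw queries from past search logs where available. For each system, the sketch reports the share of test queries whose top <em>k</em> results contain at least one intended target.</p>

```python
# Hypothetical PoC data: for each test query, the documents testers expect to find.
expected = {
    "travel reimbursement form": {"hr-017"},
    "2008 pricing sheet": {"sales-102", "sales-103"},
    "vpn setup": {"it-044"},  # e.g. a query that routinely failed in old search logs
}

# Hypothetical ranked results captured from each candidate system during the PoC.
results = {
    "System A": {
        "travel reimbursement form": ["hr-017", "hr-002"],
        "2008 pricing sheet": ["sales-090", "sales-103"],
        "vpn setup": ["it-001"],
    },
    "System B": {
        "travel reimbursement form": ["hr-002", "hr-017"],
        "2008 pricing sheet": ["sales-102"],
        "vpn setup": ["it-044", "it-001"],
    },
}

def target_hit_rate(ranked, expected, k=10):
    """Fraction of test queries whose top-k results include at least one intended target."""
    hits = 0
    for query, targets in expected.items():
        top_k = ranked.get(query, [])[:k]
        if targets.intersection(top_k):
            hits += 1
    return hits / len(expected)

for system, ranked in results.items():
    print(system, round(target_hit_rate(ranked, expected, k=1), 2))
# System A 0.33
# System B 0.67
```

<p>Raising <code>k</code> to the default of 10 lifts both scores, to 0.67 and 1.0. Any single number hides which queries failed, so testers should still review the result lists themselves.</p>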

<p>In either case, human resource costs must be added to the bottom line. A large number of mature software companies and start-ups are innovating in both indexing techniques and interface design. They benefit from several decades of search evolution among search experts, and now from a decade of search experience in the general population. Search product evolution is accelerating as developers leverage knowledge of searcher experiences. You may not be able to control emerging and potentially disruptive technologies. You can still exercise beneficial controls when selecting and implementing almost any search system.</p>]]></description>
            <link>http://gilbane.com/search_blog/2008/09/controlling_your_enterprise_se.html</link>
            <guid>http://gilbane.com/search_blog/2008/09/controlling_your_enterprise_se.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Search Problems/Solved Search Problems</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Product evaluation</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search algorithms</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">Search interfaces</category>
            
            <pubDate>Wed, 10 Sep 2008 14:10:14 -0500</pubDate>
        </item>
        
    </channel>
</rss>
