Curated for content, computing, and digital experience professionals

Month: December 2012

Enterprise Search Strategies: Cultivating High Value Domains

At the recent Gilbane Boston Conference I was happy to hear the many remarks positioning and defining “Big Data,” and the variety among them. Like so much in the marketing sphere of high tech, answers begin with technology vendors but get refined and parsed by analysts and consultants, who need to set clear expectations about the actual problem domain. It’s a good thing that we have humans to do that defining because even the most advanced semantics would be hard pressed to give you a single useful answer.

I heard Sue Feldman of IDC give a pretty good “working definition” of big data at the Enterprise Search Summit in May 2012. To paraphrase, it was:

  • > 100 TB up to petabytes, OR
  • > 60% growth a year of unstructured and unpredictable content, OR
  • Ultra high streaming content
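
Read literally, the three criteria above form an either/or test. The sketch below is only one reading of the paraphrased definition. The function and parameter names are hypothetical, and because the definition gives no number for "ultra high streaming," that criterion is passed in as the caller's own judgment.

```python
def is_big_data(volume_tb: float, annual_growth: float, ultra_high_streaming: bool) -> bool:
    """Return True if any one of the three paraphrased criteria is met.

    volume_tb: total stored content in terabytes.
    annual_growth: yearly growth of unstructured content as a fraction (0.6 = 60%).
    ultra_high_streaming: caller's judgment call; the definition gives no threshold.
    """
    return volume_tb > 100 or annual_growth > 0.60 or ultra_high_streaming
```

Because the criteria are joined by OR, a repository of 20 TB growing at 75% a year would qualify under this reading even though it is far below the volume threshold.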

But we then get into debates about differentiating data from unstructured content when using a phrase like “big data” and applying it to unstructured content, which knowledge strategists like me tend to put into a category of packaged information. But never mind, technology solution providers will continue to come up with catchy buzz phrases to codify the problem they are solving, whether it makes semantic sense or not.

What does this have to do with enterprise search? In short, “findability” is an increasingly heavy lift due to the size and number of content repositories. We want to define quality findability as optimal relevance and recall.

A search technology era ago, publishers, libraries, and content management solution providers were focused on human curation of non-database content, applying controlled vocabulary categories derived from decades of human-managed terminology lists. Automated search provided highly structured access interfaces to what we now call unstructured content. Once this model was supplanted by full text retrieval, and new content originated in electronic formats, the proportion of un-categorized content to human categorized content ballooned.

Hundreds of models for automatic categorization have been rolled out to try to stay ahead of the electronic onslaught. The ones that succeed do so mostly because of continued human intervention at some point in the process of making content available to be searched. From human invented search algorithms, to terminology structuring and mapping (taxonomies, thesauri, ontologies, grammar rule bases, etc.), to hybrid machine-human indexing processes, institutions seek ways to find, extract, and deliver value from mountains of content.

This brings me to a pervasive theme from the conferences I have attended this year: the synergies among text mining, text analytics, extract/transform/load (ETL), and search technologies. These are being sought, employed and applied to specific findability issues in select content domains. It appears that the best results are delivered only when these criteria are first met:

  • The business need is well defined, refined and narrowed to a manageable scope. Narrowing scope of information initiatives is the only way to understand results, and gain real insights into what technologies work and don’t work.
  • The content domain is carefully selected for its high-value content. I have long maintained that a significant issue is the amount of redundant information that we pile up across every repository. By demanding that our search tools crawl and index all of it, we are placing an unrealistic burden on search technologies to rank relevance and importance.
  • Pre-processing solutions such as text mining and text analytics are applied to ferret out primary source content and eliminate re-packaged variations that lack added value.
  • Pre-processing solutions such as ETL with text mining are applied to assist with content enhancement, adding consistent metadata that does not have a high semantic threshold but will suffice to answer a large percentage of non-topical inquiries. An example would be to find the “paper” that “Jerry Howe” presented to the “AMA” last year.
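
To make the last point concrete, here is a minimal sketch of how consistent, low-threshold metadata turns the “Jerry Howe” inquiry into a simple field match rather than a topical search. The field names, the `find_documents` helper, and the sample records are all invented for illustration; a real ETL pipeline would populate such fields from the source content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    doc_type: str   # e.g. "paper", "memo" (hypothetical field)
    author: str
    audience: str   # organization the item was presented to (hypothetical field)
    year: int


def find_documents(docs, **criteria):
    """Return documents whose metadata fields equal every given criterion."""
    return [d for d in docs if all(getattr(d, k) == v for k, v in criteria.items())]


# Invented sample records; "last year" is taken to be 2011 relative to this 2012 post.
docs = [
    Document("Placeholder title A", "paper", "Jerry Howe", "AMA", 2011),
    Document("Placeholder title B", "memo", "Jerry Howe", "AMA", 2011),
    Document("Placeholder title C", "paper", "Jerry Howe", "AMA", 2009),
]

hits = find_documents(docs, doc_type="paper", author="Jerry Howe", audience="AMA", year=2011)
```

None of the four fields requires deep semantic analysis, which is the point: plain, consistently applied attributes answer this kind of question without any relevance ranking at all.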

Business managers together with IT need to focus on eliminating redundancy by utilizing automation tools to enhance unique and high-value content with consistent metadata, thus creating solutions for special audiences needing information to solve specific business problems. By doing this we save the searcher the most time, while delivering the best answers to make the right business decisions and innovative advances. We need to stop thinking of enterprise search as a “big data,” single engine effort and instead parse it into “right data” solutions for each need.
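
One way to picture automated redundancy elimination is near-duplicate detection over word shingles. The sketch below is a generic technique, not a description of any particular product. The function names and the 0.6 threshold are assumptions, and it simply keeps whichever version it sees first, so it presumes the primary source is listed ahead of its re-packaged copies.

```python
def shingles(text: str, k: int = 3) -> set:
    """Break text into overlapping k-word sequences (word shingles)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_near_duplicates(texts, threshold=0.6):
    """Keep the first occurrence of each text; drop later texts whose
    similarity to any kept text reaches the (assumed) threshold."""
    kept, kept_shingles = [], []
    for text in texts:
        s = shingles(text)
        if all(jaccard(s, ks) < threshold for ks in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept


# Invented sample content: the second item is a lightly re-packaged copy of the first.
texts = [
    "quarterly sales results for the northeast region show steady growth",
    "quarterly sales results for the northeast region show steady growth overall",
    "new travel policy takes effect in january",
]
unique_texts = drop_near_duplicates(texts)
```

Comparing every new item against every kept item is fine for a narrowly scoped domain, which fits the argument for small, high-value collections; very large collections would need an approximate method instead.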

HTML5 Definition Complete, W3C Moves to Interoperability Testing and Performance

The W3C announced today that the HTML5 definition is complete, and on schedule to be finalized in 2014. This is excellent news for the future of the open Web, that is, all of us. If you were involved in discussions about mobile development strategies at our recent conference you’ll want to check out all the details at http://dev.w3.org/html5/decision-policy/html5-2014-plan.

Moving right along, the HTML Working Group also published the first draft of HTML 5.1 so you can see a little further down the road for planning purposes. See http://www.w3.org/TR/2012/WD-html51-20121217/.

From the W3C newsletter…

W3C published today the complete definition of the “HTML5” and “Canvas 2D” specifications. Though not yet W3C standards, these specifications are now feature complete, meaning businesses and developers have a stable target for implementation and planning. “As of today, businesses know what they can rely on for HTML5 in the coming years, and what their customers will demand,” said Jeff Jaffe, W3C CEO. HTML5 is the cornerstone of the Open Web Platform, a full programming environment for cross-platform applications with access to device capabilities; video and animations; graphics; style, typography, and other tools for digital publishing; extensive network capabilities; and more.

To reduce browser fragmentation and extend implementations to the full range of tools that consume and produce HTML, W3C now embarks on the stage of W3C standardization devoted to interoperability and testing. W3C is on schedule to finalize the HTML5 standard in 2014. In parallel, the W3C community will continue its work on next generation HTML features, including extensions to complement built-in HTML5 accessibility, responsive images, and adaptive streaming.

Integrating External Data & Enhancing Your Prospects

Most companies with IT account teams and account selling strategies have a database in a CRM system, and the company records in that database generally have a wide range of data elements and varying degrees of completeness. Beyond the basic demographic information, some records are more complete than others in providing information that can tell the account team more about the drivers of sales potential. In some cases, this additional data may have been collected by internal staff; in other cases, it may have been purchased from organizations like Harte-Hanks, RainKing, HG Data or any number of custom resources/projects.

There are some other data elements that can be added to your database from freely available resources. These data elements can enhance the company records by showing which companies will provide better opportunities. One simple example we use in The Global 5000 database is the number of employees that have a LinkedIn profile. This may be an indicator that companies with a high percentage of social media users are more likely to purchase or use certain online services. That data is free to use. Obviously, that indicator does not work for every organization and each company needs to test the data correlation between customers and the attributes, environment or product usage.
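
A simple way to test whether such an indicator holds for your own customer base is to compare customer rates on either side of a cutoff. The sketch below assumes hypothetical field names, a made-up 50% cutoff, and invented sample data; it shows the shape of the check, not a statistically rigorous test.

```python
def customer_rate_by_bucket(companies, threshold=0.5):
    """Split companies by share of employees with a LinkedIn profile and
    return the fraction of existing customers in each bucket as (low, high)."""
    buckets = {"low": [], "high": []}
    for company in companies:
        share = company["linkedin_profiles"] / company["employees"]
        key = "high" if share >= threshold else "low"
        buckets[key].append(company["is_customer"])

    def rate(flags):
        # True counts as 1 and False as 0, so the mean is the customer rate.
        return sum(flags) / len(flags) if flags else 0.0

    return rate(buckets["low"]), rate(buckets["high"])


# Invented sample data for illustration only.
companies = [
    {"employees": 1000, "linkedin_profiles": 700, "is_customer": True},
    {"employees": 2000, "linkedin_profiles": 1200, "is_customer": True},
    {"employees": 500, "linkedin_profiles": 300, "is_customer": False},
    {"employees": 4000, "linkedin_profiles": 800, "is_customer": False},
    {"employees": 1500, "linkedin_profiles": 300, "is_customer": True},
    {"employees": 800, "linkedin_profiles": 100, "is_customer": False},
]
low_rate, high_rate = customer_rate_by_bucket(companies)
```

If the two rates are close, the attribute is probably not a useful driver for that organization, which is exactly the caveat above.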

Other free and interesting data can be found in government filings. For example, any firm with benefit and 401k plans must file federal forms, and that filing data is available from the US government. A quick scan of the web site data.gov shows a number of options and data sets available for download and integration into your prospect database. The National Weather Center, for example, provides a number of specific long-term forecasts which can be helpful for anyone selling to the agriculture market.

There are a number of things that need to be considered when importing and appending or modeling external data. Some of the key aspects include:

  • A match code or record identifier whereby external records can be matched to your internal company records. Many systems use the DUNS number from D&B rather than trying to match on company names, which can have too many variations to be useful.
  • The CRM record level needs to be established so that the organization is focused on companies at a local entity level or at the corporate HQ level. For example, if you are selling multi-national network services, having lots of site records is probably not helpful when you most likely have to sell at the corporate level.
  • De-dupe your existing customers. When you acquire and integrate an external file, the external source won’t know your customer set, and you will likely be importing data about your existing customers. If you are going to turn around and send this new, enhanced data to your team, it makes sense to identify or remove existing clients from that effort so that your organization is not marketing to them all over again.
  • Identifying the key drivers that turn the vast sea of companies into prospects and then into clients will provide a solid list of key data attributes that can be used to append to existing records. For example, these drivers may include elements such as revenue growth, productivity measures such as revenue per employee, credit ratings, multiple locations or selected industries.
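
The considerations above can be sketched as a single append step: match on a record identifier, set aside existing customers, and derive a driver such as revenue per employee. Everything here is an assumption made for illustration, including the field names, the `append_external` helper, the choice to let internal values win on conflicts, and the obviously fake DUNS values.

```python
def append_external(internal, external):
    """Match external records to internal ones by DUNS number.

    Returns (enriched, unmatched):
      enriched  - internal non-customer records with external fields appended
      unmatched - external records with no internal match (possible new prospects)
    """
    by_duns = {rec["duns"]: rec for rec in internal}
    enriched, unmatched = [], []
    for ext in external:
        rec = by_duns.get(ext["duns"])
        if rec is None:
            unmatched.append(ext)
            continue
        if rec.get("is_customer"):
            continue  # de-dupe: leave existing clients out of the prospecting file
        merged = {**ext, **rec}  # internal values win where both sides have a field
        revenue, employees = merged.get("revenue"), merged.get("employees")
        if revenue is not None and employees:
            merged["revenue_per_employee"] = revenue / employees
        enriched.append(merged)
    return enriched, unmatched


# Invented sample records; the DUNS values are placeholders, not real identifiers.
internal = [
    {"duns": "000000001", "name": "Acme Corp", "is_customer": True},
    {"duns": "000000002", "name": "Globex Inc", "is_customer": False},
]
external = [
    {"duns": "000000001", "revenue": 5_000_000, "employees": 50},
    {"duns": "000000002", "revenue": 12_000_000, "employees": 100},
    {"duns": "000000003", "revenue": 900_000, "employees": 10},
]
enriched, unmatched = append_external(internal, external)
```

Matching on an identifier rather than on company names keeps the join exact. The record-level question (site versus corporate HQ) would be settled before this step by deciding which DUNS numbers are loaded into the internal file.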

In this era of marketing sophistication, with increasing ‘tons’ of Big Data available and sophisticated analytical tools coming to market, every company has the opportunity to enhance its internal data by integrating external data and to go to market armed with more insight than ever before.

Learn more about the Global 5000 database

 

© 2024 The Gilbane Advisor
