
Category: Computing & data

Computing and data is a broad category. Our coverage of computing is largely limited to software, and we are mostly focused on unstructured data, semi-structured data, or mixed data that includes structured data.

Topics include computing platforms, analytics, data science, data modeling, database technologies, machine learning / AI, Internet of Things (IoT), blockchain, augmented reality, bots, programming languages, natural language processing applications such as machine translation, and knowledge graphs.

Related categories: Semantic technologies, Web technologies & information standards, and Internet and platforms.

The Analyst’s Lament: Big Data Hype Obscures Data Management Problems in the Enterprise

I’ve been a market and product analyst for large companies. I realize that my experiences are a sample of one, and that I can’t speak for my analyst peers. But I suspect some of them would nod in recognition when I say that in those roles, I spent only a fraction of my time actually conducting data analysis. With the increase in press that Big Data has received, I started seeing a major gap between what I was reading about enterprise data trends and my actual experiences working with enterprise data.

A more accurate description of what I spent large amounts of time doing was data hunting. And data gathering, and data cleaning, and data organizing, and data checking. I spent many hours trying to find the right people in various departments who “owned” different data sources. I then had to locate definitions (if they existed – this was hit or miss) and find out what quirks the data had so I could clean it without losing records (for example, which of the many data fields with the word “revenue” in them would actually give me revenue). In several cases I found myself begging fellow overworked colleagues to please, please, pull the data I needed from a database which I in theory should have had access to but was shut out of due to multiple layers of bureaucracy and overall cluelessness as to what data lived where within the organization.

Part of me thought, “Well, this is the lot of an analyst in a large company. It is the job.” And this was confirmed by other more senior managers – all on the business side, not in the IT side – who asserted that, yes, being a data hunter/gatherer/cleaner/organizer/checker was indeed my job. But another part of me was thinking, “These are all necessary tasks in dealing with data. I will always need to clean data no matter what. I will need to do some formatting and re-checking to make sure what I have is correct. But should this be taking up such a large chunk of my time? This is not the best way I can add value here. There are too many business questions I could potentially be trying to help solve; there has got to be a better way.”

So initially I thought, not being an IT professional, that this was an issue of not having the right IT tools. But gradually I came to understand that technology was not the problem. More often than not, I had access to best-in-class CRM systems, database and analytics software, and collaboration tools. I had the latest versions of Microsoft Office and a laptop or desktop with decent processing power. I had reliable VPN connectivity when I was working remotely and often a company-supplied smartphone. It was the processes and people that were the biggest barriers to getting the information I needed in order to provide fact-based research that could be used to make business-critical decisions.

Out of sheer frustration, I started doing some research to see if there was indeed a better way for enterprises to manage their data. Master Data Management (MDM), you’ve been around for over a decade, why haven’t I ever encountered you?  A firm called the Information Difference, a UK-based consultancy which specializes in MDM, argues that too often, decisions about data management and data governance are left solely to the IT department. The business should also be part of any MDM project, and the governance process should be sponsored and led by C-level business management. Talk about “aha” moments.  When I read this, I actually breathed a sigh of relief. It isn’t just me that thinks there has to be a better way to go, so that the not-cheap business and market analysts that enterprises the world over employ can actually spend more of their time solving problems and less time data wrangling!

That’s why when I read the umpteenth article/blog post/tweet about how transformative Big Data is and will be, I cannot help but groan. Before enterprises begin to think about new ways of structuring and distributing data, they need to do an audit of how existing data is already used within and between different businesses. In particular, they should consider MDM if that has not already been implemented. There is so much valuable data that already exists in the enterprise, but the business and IT have to actually work together to deploy and communicate about data initiatives. They also need to evaluate if and how enterprise data is being used effectively for business decisions, and if that usage meets compliance and security rules.

I suspect that many senior IT managers know this and agree. I also suspect that getting counterparts in the business to be active and own decisions about enterprise data, and not just think data is an IT issue, can be a challenge. But in the long run, if this doesn’t happen more often, there are going to be a lot of overpaid, underutilized data analysts out there, and a lot of missed business opportunities. So if you are an enterprise executive wondering “do I have to worry about this Big Data business?” please take a step back and look at what you already have. And if you know any seasoned data analysts in your company, maybe even talk to them about what would make them more effective and faster at their job. The answer may be simpler than you think.

Big data and decision making: data vs intuition

There is certainly hype around ‘big data’, as there always has been and always will be about many important technologies or ideas – remember the hype around the Web? Just as annoying is the anti-big-data backlash, typically built around straw men – does anyone actually claim that big data is useful without analysis?

One unfair characterization both sides indulge in involves the role of intuition, which is viewed either as the last lifeline for data-challenged and threatened managers, or as the way real men and women make the smart difficult decisions in the face of too many conflicting statistics.

Robert Carraway, a professor who teaches Quantitative Analysis at UVA’s Darden School of Business, has good news for both sides. In a post on big data and decision making in Forbes, “Meeting the Big Data challenge: Don’t be objective” he argues “that the existence of Big Data and more rational, analytical tools and frameworks places more—not less—weight on the role of intuition.”

Carraway first mentions Corporate Executive Board’s findings that of over 5000 managers 19% were “Visceral decision makers” relying “almost exclusively on intuition.” The rest were more or less evenly split between “Unquestioning empiricists” who rely entirely on analysis and “Informed skeptics … who find some way to balance intuition and analysis.” The assumption of the test and of Carraway was that Informed skeptics had the right approach.

A different study, “Frames, Biases, and Rational Decision-Making in the Human Brain”, at the Institute of Neurology at University College London, tested for correlations between the influence of ‘framing bias’ (what it sounds like – making different decisions for the same problem depending on how the problem was framed) and degree of rationality. The study measured which areas of the brain were active using fMRI and found that the activity of the most rational subjects (least influenced by framing) took place in the prefrontal cortex, where reasoning takes place; the least rational (most influenced by framing / intuition) had activity in the amygdala (home of emotions); and the activity of those in between (“somewhat susceptible to framing, but at times able to overcome it”) took place in the cingulate cortex, where conflicts are addressed.

It is this last correlation that is suggestive to Carraway, and what he maps to being an informed skeptic. In real life, we have to make decisions without all or enough data, and a predilection for relying on either data or intuition can easily lead us astray. Our decision making benefits by our brain seeing a conflict that calls for skeptical analysis between what the data says and what our intuition is telling us. In other words, intuition is a partner in the dance, and the implication is that it is always in the dance — always has a role.

Big data and all the associated analytical tools provide more ways to find bogus patterns that fit what we are looking for. This makes it easier to find false support for a preconception. So just looking at the facts – just being “objective” – just being “rational” – is less likely to be sufficient.

The way to improve the odds is to introduce conflict – call in the cingulate cortex cavalry. If you have a preconceived belief, acknowledge it and try to refute it, rather than support it, with the data.

“the choice of how to analyze Big Data should almost never start with “pick a tool, and use it”. It should invariably start with: pick a belief, and then challenge it. The choice of appropriate analytical tool (and data) should be driven by: what could change my mind?…”

Of course conflict isn’t only possible between intuition and data. It can also be created between different data patterns. Carraway has an earlier related post, “Big Data, Small Bets”, that looks at creating multiple small experiments for big data sets, designed to minimize the chance of identifying patterns that are either random or not significant.

Thanks to Professor Carraway for elevating the discussion. Read his full post.

Customer experiences, communications, and analytics

[Figure: Venn diagram of the three epicenters of innovation in modern marketing]
I recently discovered Scott Brinker’s Chief Marketing Technologist blog and recommend it as a useful resource for marketers. The Venn diagram above is from a recent post, 3 epicenters of innovation in modern marketing. It was the Venn diagram that first grabbed my attention: I love Venn diagrams as a communication tool, it reminded me of another Venn diagram that was well received at the recent Gilbane Conference, and most of the conference discussions map to someplace in the illustration.

As good as the graphic is on its own, you should read Scott’s post and see what he has to say about the customer experience “revolution”.

Lest you think Scott is a little too blithe in his acceptance of the role of big data, see his The big data bubble in marketing — but a bigger future, where the first half of the (fairly long) post talks about all the hype around big data. But you should read the full post because he is right on target in describing the role of big data in marketing innovation, and in his conclusion that data-driven organizations will need to make use of big data though these data-driven and data-savvy organizations will take some time to build.

So don’t let current real or perceived hype about the role of big data in marketing lead you to discount its importance – it’s a matter of when, not if. “When” is not easy to predict, but will certainly be different depending on an organization’s resources and its ability to deal with complexity and with organizational and infrastructure changes.

Enterprise Search Strategies: Cultivating High Value Domains

At the recent Gilbane Boston Conference I was happy to hear so many remarks positioning and defining “Big Data”, and to hear how varied those comments were. Like so much in the marketing sphere of high tech, answers begin with technology vendors but get refined and parsed by analysts and consultants, who need to set clear expectations about the actual problem domain. It’s a good thing that we have humans to do that defining because even the most advanced semantics would be hard pressed to give you a single useful answer.

I heard Sue Feldman of IDC give a pretty good “working definition” of big data at the Enterprise Search Summit in May 2012. To paraphrase, it was:

  • > 100 TB up to petabytes, OR
  • > 60% growth a year of unstructured and unpredictable content, OR
  • Ultra high streaming content
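Read as a rule, the paraphrased definition above is a plain OR over three conditions. Here is a minimal Python sketch of that reading. The class and field names are hypothetical, and because the paraphrase gives no numeric threshold for "ultra high streaming", that condition is assumed to be a simple yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class ContentProfile:
    # All field names are illustrative assumptions, not part of the
    # paraphrased working definition.
    volume_tb: float                # total volume, in terabytes
    unstructured_growth_pct: float  # yearly growth of unstructured content, in percent
    ultra_high_streaming: bool      # no threshold is given, so this is a flag


def is_big_data(profile: ContentProfile) -> bool:
    """Return True if any one of the three paraphrased conditions holds."""
    return (
        profile.volume_tb > 100
        or profile.unstructured_growth_pct > 60
        or profile.ultra_high_streaming
    )


# A repository can qualify on growth alone, even at modest volume.
fast_growing = ContentProfile(volume_tb=20, unstructured_growth_pct=75,
                              ultra_high_streaming=False)
print(is_big_data(fast_growing))
```

Because the conditions are joined by OR, a small but fast-growing repository counts as "big data" under this reading, which is part of why the label gets applied so broadly.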

But we then get into debates about differentiating data from unstructured content when using a phrase like “big data” and applying it to unstructured content, which knowledge strategists like me tend to put into a category of packaged information. But never mind, technology solution providers will continue to come up with catchy buzz phrases to codify the problem they are solving, whether it makes semantic sense or not.

What does this have to do with enterprise search? In short, “findability” is an increasingly heavy lift due to the size and number of content repositories. We want to define quality findability as optimal relevance and recall.

A search technology era ago, publishers, libraries, and content management solution providers were focused on human curation of non-database content, and on applying controlled vocabulary categories derived from decades of human managed terminology lists. Automated search provided highly structured access interfaces to what we now call unstructured content. Once this model was supplanted by full text retrieval, and new content originated in electronic formats, the proportion of un-categorized content to human categorized content ballooned.

Hundreds of models for automatic categorization have been rolled out to try to stay ahead of the electronic onslaught. The ones that succeed do so mostly because of continued human intervention at some point in the process of making content available to be searched. From human invented search algorithms, to terminology structuring and mapping (taxonomies, thesauri, ontologies, grammar rule bases, etc.), to hybrid machine-human indexing processes, institutions seek ways to find, extract, and deliver value from mountains of content.

This brings me to a pervasive theme from the conferences I have attended this year: the synergies among text mining, text analytics, extract, transform, load (ETL), and search technologies. These are being sought, employed and applied to specific findability issues in select content domains. It appears that the best results are delivered only when these criteria are first met:

  • The business need is well defined, refined and narrowed to a manageable scope. Narrowing scope of information initiatives is the only way to understand results, and gain real insights into what technologies work and don’t work.
  • The domain of content that has high value content is carefully selected. I have long maintained that a significant issue is the amount of redundant information that we pile up across every repository. By demanding that our search tools crawl and index all of it, we are placing an unrealistic burden on search technologies to rank relevance and importance.
  • Pre-processing solutions such as text mining and text analytics are applied to ferret out primary source content and eliminate re-packaged variations that lack added value.
  • Pre-processing solutions such as ETL with text mining are applied to assist with content enhancement, by adding consistent metadata that does not have a high semantic threshold but will suffice to answer a large percentage of non-topical inquiries. An example would be to find the “paper” that “Jerry Howe” presented to the “AMA” last year.

Business managers together with IT need to focus on eliminating redundancy by utilizing automation tools to enhance unique and high-value content with consistent metadata, thus creating solutions for special audiences needing information to solve specific business problems. By doing this we save the searcher the most time, while delivering the best answers to make the right business decisions and innovative advances. We need to stop thinking of enterprise search as a “big data,” single engine effort and instead parse it into “right data” solutions for each need.

Integrating External Data & Enhancing Your Prospects

Most companies with IT account teams and account selling strategies have a database in a CRM system and the company records in that database generally have a wide range of data elements and varying degrees of completeness. Beyond the basic demographic information, some records are more complete than others with regard to providing information that can tell the account team more about the drivers of sales potential. In some cases, this additional data may have been collected by internal staff, in other cases, it may be the result of purchased data from organizations like Harte-Hanks, RainKing, HG Data or any number of custom resources/projects.

There are some other data elements that can be added to your database from freely available resources. These data elements can enhance the company records by showing which companies will provide better opportunities. One simple example we use in The Global 5000 database is the number of employees that have a LinkedIn profile. This may be an indicator that companies with a high percentage of social media users are more likely to purchase or use certain online services. That data is free to use. Obviously, that indicator does not work for every organization and each company needs to test the data correlation between customers and the attributes, environment or product usage.

Other free and interesting data can be found in government filings. For example, any firm with benefit and 401k plans must make federal filings, and that filing data is available from the US government. A quick scan of the web site data.gov shows a number of options and data sets available for download and integration into your prospect database. The National Weather Center, for example, provides a number of specific long-term contracts which can be helpful for anyone selling to the agriculture market.

There are a number of things that need to be considered when importing and appending or modeling external data. Some of the key aspects include:

  • A match code or record identifier whereby external records can be matched to your internal company records. Many systems use the DUNS number from D&B rather than trying to match on company names which can have too many variations to be useful.
  • The CRM record level needs to be established so that the organization is focused on companies either at a local entity level or at the corporate HQ level. For example, if you are selling multi-national network services, having lots of site records is probably not helpful when you most likely have to sell at the corporate level.
  • De-dupe your existing customers. When you acquire and integrate an external file, the external sources won’t know your customer set, and you will likely be importing data about your existing customers. If you are going to turn around and send this new, enhanced data to your team, it makes sense to identify or remove existing clients from that effort so that your organization is not marketing to them all over again.
  • Identifying the key drivers that turn the vast sea of companies into prospects and then into clients will provide a solid list of key data attributes that can be used to append to existing records.  For example, these drivers may include elements such as revenue growth, productivity measures such as revenue per employee, credit ratings, multiple locations or selected industries.
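The matching and de-duping steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`duns`, `is_customer`, `revenue_per_employee`), the sample records, and the identifier values are all made up, and a real CRM import would go through that system's own tools.

```python
def append_external(crm_records, external_records, key="duns"):
    """Match external rows to CRM rows on a shared identifier.

    Returns (enriched, prospects):
      enriched  - CRM records with matching external attributes appended
      prospects - external records that are not already in the CRM
    """
    external_by_key = {r[key]: r for r in external_records if r.get(key)}
    crm_keys = {r[key] for r in crm_records if r.get(key)}

    enriched = []
    for rec in crm_records:
        extra = external_by_key.get(rec.get(key), {})
        # Existing CRM values win; external data only adds new attributes.
        enriched.append({**extra, **rec})

    prospects = [r for k, r in external_by_key.items() if k not in crm_keys]
    return enriched, prospects


# Hypothetical sample data; the identifiers are placeholders, not real DUNS numbers.
crm = [
    {"duns": "000000001", "name": "Acme Corp", "is_customer": True},
    {"duns": "000000002", "name": "Globex", "is_customer": False},
]
external = [
    {"duns": "000000001", "name": "ACME Corporation", "revenue_per_employee": 250_000},
    {"duns": "000000003", "name": "Initech", "revenue_per_employee": 180_000},
]

enriched, prospects = append_external(crm, external)

# De-dupe step: keep existing customers out of the new marketing effort.
marketing_list = [r for r in enriched if not r.get("is_customer")] + prospects
print([r["name"] for r in marketing_list])
```

Matching on the identifier rather than the name is what lets "Acme Corp" and "ACME Corporation" resolve to the same record, which is the reason the first bullet recommends a match code over company names.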

In this era of marketing sophistication, with increasing ‘tons’ of Big Data becoming available and sophisticated analytical tools coming to market, every company has the opportunity to enhance its internal data by integrating external data, and to go to market armed with more insight than ever before.

Learn more about the Global 5000 database

 

Frank Gilbane interview on Big Data

Big data is something we cover at our conference and this puzzles some given our audience of content managers, digital marketers, and IT, so I posted Why Big Data is important to Gilbane Conference attendees on gilbane.com to explain why. In the post I also included a list of the presentations at Gilbane Boston that address big data. We don’t have a dedicated track for big data at the conference but there are six presentations including a keynote.

I was also interviewed on the CMS-Connected internet news program about big data the same week, which gave me an opportunity to answer some additional questions about big data and its relevance to the same kind of audience. There is still a lot more to say about this, but the post and the interview combined cover the basics.

The CMS-Connected show was an hour long and also included Scott and Tyler interviewing Rob Rose on big data and other topics. You can see the entire show here, or just the 12-minute interview with me below.

Private Companies and Public Companies – Sizing up IT Spending

One aspect of the Global 5000 company database is that we include all types, shapes and locations of companies including those that are publicly listed as well as private firms. For those who sell to corporations (as opposed to consumers) there is a great deal of interest in private companies. A lot of this can be attributed to the fact that public companies have to disclose so much about their size, shape and all aspects of their organizations – most everyone knows or can find out what they need to. Privates, on the other hand, are less well known and hold the allure that there is great, undiscovered opportunity in there.

To get a sense of the dynamics of the public/private split, we examined a number of metrics related to companies in the Global 5000 database. It is true that more large companies are publicly traded. Of the 5,000 companies, nearly 4,000 are public and just over 1,000 are private. That is the inverse of the market as a whole, where most companies in any country or industry are private. Here are a few facts about each group.

  • The average revenue for a public company in the Global 5000 is $10.3 billion while the private companies averaged $10.6 billion
  • Public companies reported an average revenue per employee of $214,000 while private companies were just over $282,000
  • For both 2010 and 2011, revenue for both public and private companies grew by slightly more than 11.5%. Virtually no difference.
  • In both cases, IT spending per company is over $290 million and approximately 2.7% of revenue.
  • Total IT spending for Global 5000 public companies is approximately $1.1 trillion while private Global 5000 companies will spend about $300 billion.
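As a rough sanity check, the totals in the last bullet follow from the per-company average and the approximate company counts given earlier. The short Python sketch below uses rounded inputs ("nearly 4,000", "just over 1,000", "over $290 million"), so it reproduces the order of magnitude rather than the exact published totals; the function name is purely illustrative.

```python
# Rounded inputs taken from the figures above. The company counts are
# approximations ("nearly 4,000" public, "just over 1,000" private).
AVG_IT_SPEND = 290_000_000  # "over $290 million" per company, in dollars
COMPANY_COUNTS = {"public": 4_000, "private": 1_000}


def total_it_spend(company_count, avg_spend=AVG_IT_SPEND):
    """Estimate group-level IT spending as count times average spend."""
    return company_count * avg_spend


for group, count in COMPANY_COUNTS.items():
    total = total_it_spend(count)
    print(f"{group}: ~${total / 1e9:,.0f} billion")
```

With these rounded inputs the estimates come to about $1,160 billion for public companies and $290 billion for private ones, in line with the roughly $1.1 trillion and $300 billion quoted above.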

The bottom line here is that big is big. It does not make much difference if the company is public or private, the big guys will spend a lot on a wide variety of products and services including IT products and services. The real difference is in the number of these large opportunities there are. Just because we find a few of these nuggets among the privates, does not mean all privates look alike.  Most are quite a bit smaller.

Learn more about the Global 5000 database

The Flip Side of IT Spending and Productivity

In our last post we explored the companies in The Global 5000 that showed the biggest gains in revenue per employee AND spent the most on IT. The idea is that this group will continue to spend and strive for continuous improvements, making them great potential targets for those IT suppliers that can show their offerings help save money.

Now, we turn the page and explore the other end of the spectrum. Again taking companies in the Global 5000 database, we now look at the bottom 2,000 companies in terms of revenue per employee change. That is, they are not on a positive track. From this group we then took the lowest 1,000 firms in terms of IT spending.
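The two-step selection described above can be sketched as follows. This is an illustration only: the field names (`rev_per_emp_change`, `it_spend`) and the five sample companies are hypothetical, and the cutoffs are shrunk from 2,000 and 1,000 to 3 and 2 so the result is easy to follow by hand.

```python
def low_growth_low_spend(companies, n_growth=2000, n_spend=1000):
    """Two-step screen: worst revenue-per-employee change, then lowest IT spend."""
    # Step 1: keep the n_growth companies with the weakest change in
    # revenue per employee.
    by_growth = sorted(companies, key=lambda c: c["rev_per_emp_change"])
    bottom_growth = by_growth[:n_growth]
    # Step 2: within that group, keep the n_spend lowest IT spenders.
    by_spend = sorted(bottom_growth, key=lambda c: c["it_spend"])
    return by_spend[:n_spend]


# Hypothetical sample data (IT spend in millions of dollars).
sample = [
    {"name": "A", "rev_per_emp_change": 0.10, "it_spend": 50},
    {"name": "B", "rev_per_emp_change": -0.05, "it_spend": 300},
    {"name": "C", "rev_per_emp_change": -0.02, "it_spend": 20},
    {"name": "D", "rev_per_emp_change": 0.01, "it_spend": 10},
    {"name": "E", "rev_per_emp_change": 0.20, "it_spend": 5},
]

selected = low_growth_low_spend(sample, n_growth=3, n_spend=2)
print([c["name"] for c in selected])
```

Note that the order of the two steps matters: company E is the lowest IT spender in the sample but never reaches the second step, because its revenue per employee is growing.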

We can look at this set of companies in one of two ways – either:

  • they are ripe opportunities who will need to invest in order to grow their revenue faster or get more productivity out of the existing workforce
  • OR – they are not going any further with technology spending and their growth is not going to be via increasing spending per employee.

We should run to the first group and run away from the second. Here is the profile of these 1,000 companies; the industries represented have traditionally been a challenge for IT suppliers.

The top countries are:

  • USA
  • UK
  • Japan
  • Canada
  • France
  • Spain

And the top industries:

  • Industrial Manufacturers
  • Retailers
  • Consumer Goods Manufacturers
  • Business Services
  • Construction

For more information about The Global 5000 database click here

 


© 2024 The Gilbane Advisor
