Gilbane Conferences & Advisor

Curated content for content, computing, and digital experience professionals

Tag: big data

Gilbane Advisor 4-25-18 — deep learning value, martech size, no-click searches

Notes from the AI frontier: Applications and value of deep learning

In 2011, as excitement about Big Data was going mainstream, McKinsey published what was arguably the most useful early report for executives. Big data: The next frontier for innovation, competition, and productivity took a smart and measured look at use cases and value across industries. Given the symbiotic relationship between data and AI / machine learning, companies that were paying attention and invested in Big Data then are likely positioned well ahead of others to benefit from today’s advances in machine learning technologies and techniques.

AI performance improvement by industry

McKinsey’s new report provides a knowledgeable overview using accurate terminology in their “… analysis of more than 400 use cases across 19 industries and nine business functions highlights the broad use and significant economic potential of advanced AI techniques.” Highly recommended. Read More

A flaw-by-flaw guide to Facebook’s new GDPR privacy changes

Josh Constine provides a useful take on the changes rolling out now to European users, illustrated with screenshots. But I think it’s safe to say that whether they meet the “letter of the GDPR law” is still a matter for debate.

Overall, it seems like Facebook is complying with the letter of GDPR law, but with questionable spirit… Facebook struck the right balance in some places here. But the subtly pushy designs seem intended to steer people away from changing their defaults. Read More

Marketing Technology Landscape Supergraphic (2018)

Scott Brinker has just released the latest update to his famous “Supergraphic”. The number of marketing technology vendors continues to grow. As Scott puts it, “Water continues to flow into the martech tub faster than it’s draining out.” Check out his post on what it all means and to see/download the graphic and a spreadsheet. Read More

Uh oh, click counts count less

Click quality and measurement have always been a bit iffy. But the biggest challenge to click value yet may come from a combination of mobile trends and Google’s strategy of reducing the need to click away from the search results page. Rand Fishkin’s post, New Data: How Google’s Organic & Paid CTRs Have Changed 2015-2018, looks at some interesting numbers. Back to brand marketing banners?
No-click searches desktop vs mobile

Ultimately, I think this data shows us that the future of SEO will have to account for influencing searchers without earning a click, or even knowing that a search happened. That’s going to be very frustrating for a lot of organizations. Read More

Also…

The Gilbane Advisor curates content for content, computing, and digital experience professionals. We focus on strategic technologies. We publish more or less twice a month except for August and December. See all issues

Enterprise Search Strategies: Cultivating High Value Domains

At the recent Gilbane Boston Conference I was happy to hear the many remarks positioning and defining “Big Data,” and the variety of comments the topic drew. Like so much in the marketing sphere of high tech, answers begin with technology vendors but get refined and parsed by analysts and consultants, who need to set clear expectations about the actual problem domain. It’s a good thing we have humans to do that defining, because even the most advanced semantics would be hard pressed to give you a single useful answer.

I heard Sue Feldman of IDC give a pretty good “working definition” of big data at the Enterprise Search Summit in May 2012. To paraphrase, it was:

  • > 100 TB up to petabytes, OR
  • > 60% growth a year of unstructured and unpredictable content, OR
  • Ultra high streaming content
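Expressed as a quick heuristic, the definition might look like this minimal sketch (the thresholds paraphrase Feldman’s figures; the function and parameter names are illustrative, not from any product):

```python
def is_big_data(size_tb: float, annual_growth_pct: float,
                high_velocity_streaming: bool) -> bool:
    """Rough heuristic paraphrasing Sue Feldman's 2012 working definition.

    Any single criterion is sufficient -- the definition is an OR.
    """
    return (
        size_tb > 100                  # > 100 TB, up to petabytes
        or annual_growth_pct > 60      # > 60% yearly growth of unstructured content
        or high_velocity_streaming     # ultra-high streaming content
    )

# Example: 40 TB growing 75% a year qualifies on growth alone.
print(is_big_data(size_tb=40, annual_growth_pct=75, high_velocity_streaming=False))  # True
```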

Using a phrase like “big data” and applying it to unstructured content then gets us into debates about differentiating data from unstructured content, which knowledge strategists like me tend to put into a category of packaged information. But never mind: technology solution providers will continue to come up with catchy buzz phrases to codify the problems they are solving, whether they make semantic sense or not.

What does this have to do with enterprise search? In short, “findability” is an increasingly heavy lift due to the size and number of content repositories. We want to define quality findability as optimal relevance and recall.
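For reference, “relevance and recall” map to the standard information retrieval measures of precision and recall (a conventional formulation added here, not from the original post):

$$\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|} \qquad \text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$$

Optimal findability means keeping both high: returning as much of the relevant material as possible while returning little else.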

A search technology era ago, publishers, libraries, and content management solution providers focused on human curation of non-database content, applying controlled vocabulary categories derived from decades of human-managed terminology lists. Automated search provided highly structured access interfaces to what we now call unstructured content. Once this model was supplanted by full text retrieval, and new content originated in electronic formats, the proportion of un-categorized to human-categorized content ballooned.

Hundreds of models for automatic categorization have been rolled out to try to stay ahead of the electronic onslaught. The ones that succeed do so mostly because of continued human intervention at some point in the process of making content available to be searched. From human-invented search algorithms, to terminology structuring and mapping (taxonomies, thesauri, ontologies, grammar rule bases, etc.), to hybrid machine-human indexing processes, institutions seek ways to find, extract, and deliver value from mountains of content.

This brings me to a pervasive theme from the conferences I have attended this year: the synergies among text mining, text analytics, extract/transform/load (ETL), and search technologies. These are being sought, employed, and applied to specific findability issues in select content domains. It appears that the best results are delivered only when these criteria are first met:

  • The business need is well defined, refined and narrowed to a manageable scope. Narrowing scope of information initiatives is the only way to understand results, and gain real insights into what technologies work and don’t work.
  • The domain of content that has high value content is carefully selected. I have long maintained that a significant issue is the amount of redundant information that we pile up across every repository. By demanding that our search tools crawl and index all of it, we are placing an unrealistic burden on search technologies to rank relevance and importance.
  • Apply pre-processing solutions such as text-mining and text analytics to ferret out primary source content and eliminate re-packaged variations that lack added value.
  • Apply pre-processing solutions such as ETL with text mining to assist with content enhancement, applying consistent metadata that does not have a high semantic threshold but will suffice to answer a large percentage of non-topical inquiries. An example would be to find the “paper” that “Jerry Howe” presented to the “AMA” last year (see the sketch after this list).
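Here is a minimal sketch of what that kind of consistent, low-semantic-threshold metadata enables (the records, field names, and query values are hypothetical):

```python
# Hypothetical records produced by an ETL / text-mining pre-processing pass:
documents = [
    {"title": "Clinical Guidelines Update", "doc_type": "paper",
     "author": "Jerry Howe", "venue": "AMA", "year": 2011},
    {"title": "Search Governance Slides", "doc_type": "slides",
     "author": "Jane Smith", "venue": "AIIM", "year": 2012},
]

# A non-topical inquiry like "the paper Jerry Howe presented to the AMA
# last year" reduces to simple field matching -- no semantic analysis needed.
hits = [d for d in documents
        if d["doc_type"] == "paper"
        and d["author"] == "Jerry Howe"
        and d["venue"] == "AMA"
        and d["year"] == 2011]
print(hits)
```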

Business managers together with IT need to focus on eliminating redundancy by utilizing automation tools to enhance unique and high-value content with consistent metadata, thus creating solutions for special audiences needing information to solve specific business problems. By doing this we save the searcher the most time, while delivering the best answers to make the right business decisions and innovative advances. We need to stop thinking of enterprise search as a “big data,” single engine effort and instead parse it into “right data” solutions for each need.

Why Big Data is important to Gilbane Conference attendees

If you think there is too much hype and gratuitous use of the term “big data,” you haven’t seen anything yet. But don’t make the mistake of confusing the hype with how fundamental and how transformational big data is and will certainly be. Just turn your hype filter to high and learn enough about it to make your own judgments about how it will affect your business, and whether it is something you need to act on now or monitor for future planning.

As I said yesterday in a comment on a post by Sybase CTO Irfan Khan, Gartner dead wrong about big data hype cycle (with a response from Gartner):

However Gartner’s Hype Cycle is interpreted I think it is safe to say that most, including many analysts, underestimate how fundamental and how far-reaching big data will be. How rapidly its use will evolve, and in which applications and industries first, is a more difficult and interesting discussion. The twin brakes of a shortage of qualified data scientist skills and the costs and complexities of IT infrastructure changes will surely slow things down and cause disillusionment. On the other hand we have all been surprised by how fast some other fundamental changes have ramped up, and BDaaS (Big Data as a Service) will certainly help accelerate things. There is also a lot more big data development and deployment activity going on than many realize – it is a competitive advantage after all.

There is also a third “brake,” which is all the uncertainty around privacy issues. There is already a lot of consumer data that is not being fully used because of fear of customer backlash or new regulation and, one hopes, because of a degree of respect for consumers’ privacy.

Rob Rose expanded on some specific concerns of marketers in a recent post, Big Data & Marketing – It’s A Trap!, including the lack of resources for interpreting even the data, mostly website analytics, that marketers already have. It’s true, and not just for smaller companies. In addition, there are at least four requirements for making big data analytics accessible to marketers that are largely beyond the reach of most current organizations.

Partly to the rescue is Big Data as a Service, or BDaaS (one of the more fun-sounding acronyms). BDaaS is going to be a huge business. All the big technology infrastructure firms are getting involved, and all the analytics vendors will have cloud and big data services. There are also many new companies, including some surprises. For example, after developing its own Hadoop-based big data analytics expertise, Sears created subsidiary MetaScale to provide BDaaS to other enterprises. Ajay Agarwal from Bain Capital Ventures predicts that the confluence of big data and marketing will lead to several new multi-billion dollar companies, and I think he is right.

Big data is important for the marketers, content managers, and IT staff who attend our conference because of the potential for enhanced predictive analytics and content marketing. But the reach and value of big data applications is far broader than marketing: executives need to understand the potential for new efficiencies, products, and businesses. The well-known McKinsey report “Big Data: The Next Frontier for Innovation, Competition, and Productivity” (free) is a good place to start. If you are in the information business, my report Big-Data: Big Deal or Just Big Buzz? (not free) focuses on that angle.

Big data presentations at Gilbane Boston

This year we have six presentations on big data, two devoted to big data and marketing, and all chosen with an eye toward the needs of our audience of marketers, content strategists, and IT. You can find out more about these presentations, including dates and times, on the conference program.

Keynote

Bill Simmons, CTO, DataXu
Why Marketing Needs Big Data

Main Conference Presentations

Tony Jewitt, VP Big Data Solutions at Avalon Consulting, LLC
“Big Data” 101 for Business

Bryan Bell, Vice President, Enterprise Solutions, Expert System
Semantics and the Big Data Opportunity

Brian Courtney, General Manager of Operations Data Management, GE Intelligent Platforms
Leveraging Big Data Analytics

Darren Guarnaccia, Senior VP, Product Marketing, Sitecore
Big Data: What’s the Promise and Reality for Marketers?

Stefan Andreasen, Founder and Chief Technology Officer, Kapow Software
Big Data: Black Hole or Strategic Value?

Update: There is now a video of me being interviewed on big data by CMS-Connected.

Right Fitting Enterprise Search: Content Must Fit Like a Glove

This story brought me up short: Future of Data: Encoded in DNA by Robert Lee Hotz in the Wall Street Journal, Aug. 16, 2012. It describes how “…researchers encoded an entire book into the genetic molecules of DNA, the basic building block of life, and then accurately read back the text.” The article goes on to quote the project’s senior researcher, Harvard University molecular geneticist George Church, as saying, “A device the size of your thumb could store as much information as the whole Internet.” While this concept intrigues and excites me for the innovation and creative thinking, it stimulates another thought as well: stop the madness of content overload first – force it to be managed responsibly.

While I have been sidelined from blogging for a couple of months, industry pundits have been contributing their comments, reflections, and guidance on three major topics. Big Data tops the list, with analytics a close second, rounded out by contextual relevance as an ever-present content findability issue. In November at Gilbane Boston the program features a study conducted by Findwise, Enterprise Search and Findability Survey 2012, which you can now download. It underscores a disconnect between what enterprise searchers want and how search is implemented (or not) within their organizations. As I work to assemble content, remarks, and readings for an upcoming graduate course on “Organizing and Accessing Information and Knowledge,” I keep reminding myself what knowledge managers need to know about content to make it accessible.

So, how would experts for our three dominant topics solve the problems illustrated in the Findwise survey report?

For starters, organizations must be more brutal with content housekeeping, or more specifically housecleaning. As we debate whether our country is as great at innovation as in generations past, consider big data as a big barrier. Human beings, even brilliant ones, can only cope with so much information in their waking working hours. I posit that we have lost the concept of primary source content; in other words, content that is original, new, or innovative. It is nearly impossible to home in on information that has never been articulated in print or electronically disseminated before, excluding all the stuff we have seen over and over again. Our concept of terrific search is to be able to traverse and aggregate everything “out there” with no regard for what is truly conceptually new. How much of that “big data” is really new and valuable? I am hoping that other speakers at Gilbane Boston 2012 can suggest methods for crunching through the “big” to focus search on the best, most relevant, and singular primary source information.

Second, others have commented, and I second the idea, that analytic tools can contribute significantly to cleansing search domains of unwanted and unnecessary detritus. Search tools that auto-categorize and cross-categorize content, whether the domain is large or small, should be employed during any launch of a new search engine to organize content for quick visual examination, showing you where metadata is wrong, mis-characterized, or poorly tagged. Think of a situation where templates are commonly used for enterprise reports and the name of the person who created the template becomes the “author” of every report. Spotting this type of problem and taking steps to remediate and cleanse metadata before deploying the search system is a fundamental practice that will contribute to better search outcomes. With thoughtful management, this type of exercise will also lead to corrective actions on the content governance side by pointing to how metadata must be handled. Analytics functions that leverage search to support cleaning up data stores are among the most practical tools now packaged with newer search products.
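As an illustration of the template-author scenario above, a pre-launch metadata audit could be as simple as this sketch (the field names and the 50% threshold are assumptions for illustration):

```python
from collections import Counter

# Hypothetical metadata extracted during a pre-launch crawl.
docs = [
    {"title": "Q1 Sales Report", "author": "Template Admin"},
    {"title": "Q2 Sales Report", "author": "Template Admin"},
    {"title": "Q3 Sales Report", "author": "Template Admin"},
    {"title": "Plant Safety Review", "author": "R. Alvarez"},
]

author_counts = Counter(d["author"] for d in docs)

# Flag any "author" attached to a suspicious share of the collection --
# a telltale sign that a template creator is masquerading as the author.
for author, count in author_counts.items():
    if count / len(docs) > 0.5:
        print(f"Suspect metadata: '{author}' is author of {count}/{len(docs)} documents")
```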

Finally, there is the issue of vocabulary management and of assigning terminology that is both accurate and relevant for a specific community that needs to find content quickly, without wading through multiple versions or content that is just a re-hash of earlier findings published by the originator. Original publication dates, source information, and proper author attribution are key elements of metadata that must be in place for any content that is targeted for crawling and indexing. When metadata is complete and accurate, a searcher can expect the best and most relevant content to rise to the top of a results page.
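A minimal sketch of the completeness check this implies, run before content is queued for crawling and indexing (the field names are illustrative):

```python
REQUIRED_FIELDS = ("original_pub_date", "source", "author")

def index_ready(record: dict) -> bool:
    """Admit content for indexing only if its key attribution metadata is present."""
    return all(record.get(field) for field in REQUIRED_FIELDS)

record = {"title": "Annual Findings", "original_pub_date": "2011-06-01",
          "source": "Research Dept.", "author": "L. Chen"}
print(index_ready(record))  # True -- safe to queue for the crawler
```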

I hope others in a position to do serious research (perhaps a PhD dissertation) will take up my challenge to codify how much of “big data” is really worthy of being found – again, again, and again. In the meantime, use the tools you have in the search and content management technologies to get brutal. Weed the unwanted and unnecessary content so that you can get down to the essence of what is primary, what is good, and what is needed.

What technologies is marketing spending on?

Spencer Ante reports in today’s Wall Street Journal article As Economy Cools, IBM Furthers Focus on Marketers. The title and the short article are focused on IBM’s well-known emphasis on marketers, but the piece is of more general interest in driving home the extent of one trend in corporate technology spending – the growth of marketing spending on technology – and in provoking a number of questions about what it means. At only 600 or so words, the article may be useful for some of you to forward to others in your organization who would benefit from thinking more about the effects of this trend.

The article quotes recent Gartner research finding that marketing budgets are roughly 3 times IT budgets as a percentage of revenue, and that they grew between 2011 and 2012 while IT budgets shrank. Current marketing and IT budgets are both expected to increase, but with marketing budgets increasing at twice the rate of IT budgets – 9.0% vs 4.7%. Gartner has also predicted CMOs will have more control over technology spending than CIOs by 2017. Also, “In total, Gartner says companies spent up to $25 billion worldwide on marketing software last year, up from about $20 billion the previous year. Overall corporate software expenditures totaled $115 billion…”. These are impressive numbers, and our own experience, based on discussions with our conference attendees, consulting clients, and other analysts and investors, suggests a broad consensus with the trend. Certainly IBM is a big believer.

But the next level of detail is even more important for technology vendors and for CMOs who want to benchmark their competitors’ spending and strategies. For example: What are CMOs spending money on? What should they be spending on? And how do they organize their infrastructure to learn about, purchase, and manage new marketing technologies, and to work with IT?

A vocal segment of the technology press suggests that the future of marketing is all about “social”. A favorite prediction of analysts is that the “Web is dead” and the future is all about mobile. Savvy marketers are beyond such oversimplifications. As important as social and mobile are, I think it is safe to say they still account for a small percentage of Gartner’s $25 billion figure. I would love to be enlightened by anyone who has more details on what the percentage is, and on which technology categories others think will benefit most from the increase in marketing spending.

Why is this?

Part of the reason is expensive legacy systems and infrastructures. But a bigger reason is that everyone (not just marketing) is learning. Most of the new technologies have some learning curve, but are not rocket science. The really steep curve is learning how to integrate and utilize new technologies, and especially the data they provide, effectively – and that is something we all (technologists, marketers, analysts) will be learning about for a while.

Learn more at Gilbane Boston.

Marketing, big data, and content

“Content” in this context means unstructured data. The need to manage unstructured data is one of the main reasons big data technologies exist – the other being the need to deal with scale and speed. This is why it is important for us to cover big data at our conferences. Not every company needs to build new infrastructures around Hadoop-like technologies… yet. But marketers need to manage the mostly unstructured content that is part of their world, and also process and manage the more structured analytic data that will rapidly become “big” for even small organizations, so big data technologies need to be on marketing organizations’ radar as they continue to increase their expertise and spending on technology. See yesterday’s post on Why marketing is the next big money sector in technology.

Endeca Now Integrates Hadoop

Endeca Technologies, Inc., an agile information management software company, announced native integration of Endeca Latitude with Apache Hadoop. Endeca Latitude, based on the Endeca MDEX hybrid search-analytical database, is uniquely suited to unlock the power of Apache Hadoop. Apache Hadoop is strong at manipulating semi-structured data, which is a challenge for traditional relational databases. This combination provides flexibility and agility in combining diverse and changing data, and performance in analyzing that data. Enabling Agile BI requires a complete data-driven solution that unites integration, exploration and analysis from source data through end-user access that can adapt to changing data, changing data sources, and changing user needs. Solutions that require extensive pre-knowledge of data models and end-user needs fail to meet the agility requirement. The united Endeca Latitude and Apache Hadoop solution minimizes data modeling, cleansing, and conforming of data prior to unlocking the value of Big Data for end-users. http://www.endeca.com/ http://hadoop.apache.org/

Classifying Searchers – What Really Counts?

I continue to be impressed by the new ways in which enterprise search companies differentiate and package their software for specialized uses. This is a good thing because it underscores their understanding of different search audiences. Just as important is recognition that search happens in a context, for example:

  • Personal interest (enlightenment or entertainment)
  • Product selection (evaluations by independent analysts vs. direct purchasing information)
  • Work enhancement (finding data or learning a new system, process or product)
  • High-level professional activities (e-discovery to strategic planning)

Vendors understand that there is a limited market for a product or suite of products that will satisfy every budget, search context, and an enterprise’s hierarchy of search requirements. The best vendors focus on the technological strengths of their search tools to deliver products packaged for a niche in which they can excel.

However, for any market niche, excellence begins with six basics:

  • Customer relationship cultivation, including good listening
  • Professional customer support and services
  • Ease of system installation, implementation, tuning and administration
  • Out-of-the box integration with complementary technologies that will improve search
  • Simple pricing for licensing and support packages
  • Ease of doing business, contracting and licensing, deliveries and upgrades

While any mature and worthy company will have continually improved on these attributes, there are contextual differentiators that you should seek in your vertical market:

  • Vendor subject matter expertise
  • Vendor industry expertise
  • Vendor knowledge of how professional specialists perform their work functions
  • Vendor understanding of retrieval and content types that contribute the highest value

At a recent client discussion, the topic was the application of a highly specialized taxonomy. Their target content will be made available on a public-facing web site and also to internal staff. We began by discussing the various categories of terminology already extracted from a pre-existing system.

As we differentiated how internal staff need to access content for research purposes from how the public is expected to search, patterns emerged for how differently content needs to be packaged for each constituency. For those of you with specialized collections used by highly diverse audiences, this is no surprise. Before proceeding with decisions about term curation and the granularity of their metadata vocabulary, the high priority has become how the search mechanisms will work for different audiences.

For this institution, internal users must have pinpoint precision in retrieval on multiple facets of content to get to exactly the right record. They will come to search with knowledge of the collection and more certainty about what they can expect to find. They will also want to find their target(s) quickly. On the other hand, the public-facing audience needs to be guided on a path of discovery, navigating a map of terms that takes them from their “key term” query through related possibilities without demanding arcane Boolean operations or lengthy explanations for advanced searching.
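The contrast might be sketched like this (purely illustrative code under assumed data structures, not any vendor’s API):

```python
# Internal researchers: pinpoint precision via exact matching on multiple facets.
def expert_search(index, **facets):
    return [d for d in index
            if all(d.get(field) == value for field, value in facets.items())]

# Public visitors: one "key term" is expanded through related taxonomy terms
# to guide discovery, with no Boolean syntax required of the searcher.
def guided_search(index, taxonomy, key_term):
    terms = {key_term} | set(taxonomy.get(key_term, []))
    return [d for d in index if terms & set(d.get("subjects", []))]

# Hypothetical collection and taxonomy:
taxonomy = {"pottery": ["ceramics", "earthenware"]}
index = [{"title": "Redware jug", "maker": "J. Bell", "year": 1848,
          "subjects": ["earthenware"]}]

print(expert_search(index, maker="J. Bell", year=1848))  # exact record
print(guided_search(index, taxonomy, "pottery"))         # found via a related term
```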

There is a clear lesson here for seeking enterprise search solutions. Systems that favor one audience over another will always be problematic. Therefore, establishing who needs what and how each goes about searching needs to be answered, and then matched to the product that can provide for all target groups.

We are in the season for conferences; a few next month will feature various search and content technologies. After many years of walking exhibit halls, formulating strategies for systematic research, and avoiding the swamp of technology overload, I now try to have specific questions formulated to uncover the “must have” functions and features for a particular client requirement. If you do the same, describing a search user scenario to each candidate vendor, you can then ask: Is this a search problem your product will handle? What other technologies (e.g. CMS, vocabulary management) need to be in place to ensure quality search results? Can you demonstrate something similar? What would you estimate the implementation schedule to look like? What integration services are recommended?

These are starting points for a discussion and will enable you to begin to know whether this vendor meets the fundamental criteria laid out earlier in this post. It will also give you a sense of whether the vendor views all searchers and their searches as generic equivalents or knows that different functions and features are needed for special groups.

Look for vendors for enterprise search and search related technologies to interview at the following upcoming meetings:

Enterprise Search Summit, New York, May 10 – 11 […where you will learn strategies and build the skill sets you need to make your organization’s content not only searchable but “findable” and actionable so that it delivers value to the bottom line.] This is the largest seasonal conference dedicated to enterprise search. The sessions are preceded by separate workshops with in-depth tutorials related to search. During the conference, focus on case studies of enterprises similar to yours for a better understanding of issues you may need to address.

Text Analytics Summit, Boston, May 18 – 19 I spoke with Seth Grimes, who kicks off the meeting with a keynote, and asked whether he sees a change in emphasis this year from straight text mining and text analytics. You’ll have to attend to get his full speech, but Seth shared that he sees a newfound recognition that “Big Data” is coming to grips with text source information as an asset that has special requirements (and value). He also noted that unstructured document complexities can benefit from text analytics to create semantic understanding that improves search, and that text analytics products are rising to the challenge of providing dynamic semantic analysis, particularly around massive amounts of social textual content.

Lucene Revolution, San Francisco, May 23 – 24 […hear from … the foremost experts on open source search technology to a broad cross-section of users that have implemented Lucene, Solr, or LucidWorks Enterprise to improve search application performance, scalability, flexibility, and relevance, while lowering their costs.] I attended this new meeting last year when it was in Boston. For any enterprise considering or leaning toward implementing open source search, particularly Lucene or Solr, this meeting will set you on a path for understanding what that journey entails.