The Gilbane Advisor is curated by Frank Gilbane for content technology, computing, and digital experience professionals. The focus is on strategic technologies. We publish weekly via email and on our blog except for August and December.
This week we have articles from Jon Udell and Ben Thompson. News comes from TransPerfect and Semantix, Datadobi, Couchbase, HubSpot, and the Apache Cassandra Project.
BTW, I don’t normally recommend anything to read that is completely behind a paywall. I do often recommend articles when the publisher offers a few free articles before restricting access.
Opinion / Analysis
A beautiful power tool to scrape, clean, and combine data
Are you someone who is not a data scientist but would love to be able to do some of your own data analysis without learning to code? Jon Udell has been looking for a way for non-technologists to do this for years. He is technical himself, but often finds the effort more trouble than it’s worth. He provides an in-depth review of a product (it’s not his and there is a free version), and walks you through how he has used it. If you want some data analysis agency and don’t mind digging into some details, this article is worth a look.
In his weekly free article Ben Thompson picks up on the fact that both Microsoft and Facebook have recently started talking about “the metaverse”. Of course their concepts of the metaverse differ in predictable ways: think Microsoft -> enterprise applications, and Facebook -> a virtual reality platform. Thompson goes deeper and provides some background on the term and its use. It is a great word, and it’s a safe bet we’ll be hearing it a lot more and used with much imagination.
The Gilbane Advisor is curated by Frank Gilbane for content technology, computing, and digital experience professionals. The focus is on strategic technologies. We publish recommended articles and content technology news weekly. We do not sell or share personal data.
First of all please note that next week I’ll be on vacation. 🏖
This week we have articles from the Netflix Technology Blog, HBR, and McKinsey. This week’s news comes from Widen and Clarifai, Sinequa, Microsoft, Agility CMS, and Appfire and Spartez.
Opinion / Analysis
Not every organization has the digital asset management challenges that Netflix has, at least in terms of scale. But even if you don’t, our first article this week is a generous how-they-did-it by Netflix software engineers Burak Bacioglu and Meenakshi Jindal, to share with product and development teams.
We follow with two explainers / perspectives on quantum computing. Both are short, high level, and will help keep you current. First, Francesco Bova, Avi Goldfarb, and Roger Melko explain why computing applications that require combinatoric calculations are an effective way to understand existing and near-term uses of quantum computing. Next, Victor Galitski, Dmitry Green, Benjamin Lev, Yuval Oreg, and Henning Soller discuss how to navigate investment decisions in quantum computing given its nascent state.
Elasticsearch indexing strategy in Netflix digital asset management platform
At Netflix, all of our digital media assets (images, videos, text, etc.) are stored in secure storage layers. We built an asset management platform (AMP), codenamed Amsterdam, in order to easily…
Digital computing has limitations in regards to an important category of calculation called combinatorics, in which the order of data is important to the optimal solution. Computers and software that are predicated on the assumptions of quantum mechanics have the potential to perform combinatorics and other calculations much faster…
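To make the combinatorics point concrete, here is a minimal classical sketch (it is not a quantum algorithm, and it is not taken from the article). It brute-forces the cheapest order in which to visit a handful of stops. The stop names and costs are made-up assumptions for illustration. The point is only that the number of orderings grows factorially, which is why this category of problem strains conventional computing.

```python
from itertools import permutations
from math import factorial

# Hypothetical symmetric travel costs between four stops (illustrative only).
COST = {
    ("A", "B"): 4, ("A", "C"): 2, ("A", "D"): 7,
    ("B", "C"): 3, ("B", "D"): 1, ("C", "D"): 5,
}


def leg_cost(a, b):
    """Look up the cost of one leg, in either direction."""
    return COST[(a, b)] if (a, b) in COST else COST[(b, a)]


def route_cost(route):
    """Total cost of visiting the stops in the given order."""
    return sum(leg_cost(a, b) for a, b in zip(route, route[1:]))


def best_route(stops):
    """Brute force: evaluate every possible ordering and keep the cheapest."""
    return min(permutations(stops), key=route_cost)


if __name__ == "__main__":
    stops = ("A", "B", "C", "D")
    best = best_route(stops)
    print("orderings checked:", factorial(len(stops)))
    print("cheapest order:", best, "cost:", route_cost(best))
    # The search space explodes as stops are added.
    for n in (5, 10, 20):
        print(n, "stops ->", factorial(n), "orderings")
```

Four stops mean only 24 orderings to check, but the same exhaustive approach is hopeless at 20 stops, where the count exceeds two quintillion.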
This week we have articles by Prukalpa Sankar and James G. Kobielus, with news from Acquia, Primer, Monotype, Lighthouse, Zoom and Sensory.
Opinion / Analysis
Metadata is nothing new for most of you, though it’s likely your experience with it is in the context of a specific business function with similar data types and sources. But the value of metadata extends to data processing applications across organizations, and these days cross-application metadata is becoming a requirement. Our two authors this week have very different professional backgrounds, but both are addressing how to manage this complexity. As you’ll see, knowledge graphs have a critical role in each of their recommendations.
The rise of the metadata lake
Architecture for a modern metadata lake. (Image by Atlan.)
Introducing a new way of storing metadata for today’s limitless use cases like data discovery, lineage, observability and fabrics.
This week we have articles by Sachin Gupta, Panos Moutafis, Matthew J. Schneider, and Dan McCreary, and news from Bloomreach, SparkCognition, Neeva, MerlinOne, and Solo.io.
Opinion / Analysis
To protect consumer data, don’t do everything on the cloud
Sachin Gupta, Panos Moutafis, and Matthew J. Schneider team up to describe a high-level approach to employing edge computing to reduce risk and dependence on consumer data use. A good read for senior management teams.
Edge computing, in which data is processed locally on hardware instead of on the cloud, can help them do just that by implementing three critical design choices. The design choices begin with how to think about data collection and extend to the actual data processing. They are: 1) sufficiency, or a focus on only must-have data; 2) aggregation, or lumping data together to produce group insights; and 3) alteration, or making minor changes to the data to hide an individual’s identity while minimally impacting the accuracy of insights.
… But how does this tech actually work, and how can companies who don’t have Apple-sized resources deploy it?
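As a rough illustration of the three design choices quoted above, here is a minimal sketch of what sufficiency, aggregation, and alteration might look like on a local device before anything is sent to the cloud. The record fields, function names, and the uniform-noise approach are all assumptions made for this example. They are not the authors’ implementation, and simple noise like this is not a formal privacy guarantee.

```python
import random
from statistics import mean

# Hypothetical raw records as they might exist on an edge device.
RAW_RECORDS = [
    {"user_id": "u1", "age": 34, "zip": "02139", "minutes_active": 40.0},
    {"user_id": "u2", "age": 51, "zip": "02139", "minutes_active": 60.0},
    {"user_id": "u3", "age": 27, "zip": "94105", "minutes_active": 30.0},
]


def keep_sufficient(record, needed=("zip", "minutes_active")):
    """Sufficiency: retain only the must-have fields and drop the rest."""
    return {field: record[field] for field in needed}


def aggregate(records, key="zip", value="minutes_active"):
    """Aggregation: report group-level averages instead of individual rows."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record[value])
    return {group: mean(values) for group, values in groups.items()}


def alter(record, field="minutes_active", scale=1.0, rng=None):
    """Alteration: return a copy with a small random change to one field."""
    rng = rng or random.Random()
    noisy = dict(record)
    noisy[field] = record[field] + rng.uniform(-scale, scale)
    return noisy


if __name__ == "__main__":
    minimal = [keep_sufficient(r) for r in RAW_RECORDS]
    print("sufficient:", minimal)
    print("aggregated:", aggregate(minimal))
    print("altered:", [alter(r) for r in minimal])
```

In practice the three choices can be combined or used separately, depending on how much individual-level detail an analysis really needs.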
Dan McCreary predicts the arrival of a new discipline based on the availability of data stores of enterprise knowledge graphs to increase data analysis productivity. To get there, he argues we need to go beyond current approaches of data warehouses and feature stores to building…
…a set of tools for analysts to connect directly to a well-formed enterprise-scale knowledge graph to get a subset of data and transform it quickly to structures that are immediately useful for analysis. The results of this analysis can then be used to immediately enrich a knowledge graph. These pure Machine Learning approaches can complement the rich library of turn-key graph algorithms that are accessible to developers.
What the heck is a Data Mesh?! A bit technical. This is a critical look at Zhamak Dehghani’s original two posts (links included) and all are worth a read for the seriously interested.
This week we have articles by Kate Kaye, and Sarah Wang and Martin Casado, and news from Coveo, Quantum, Druid, Kentico, Retresco, and Mirakl.
Opinion / Analysis
Google may have to play nice in Privacy Sandbox, thanks to U.K. antitrust authority’s role as referee
Kate Kaye reports on Google’s commitment to the UK’s Competition and Markets Authority (CMA) to allow them
to take up a role in the design and development of Google’s Privacy Sandbox proposals to ensure they do not distort competition. The CMA is now launching a consultation on whether to accept Google’s commitments. If accepted, the commitments would be legally binding.
This is related to CMA’s enforcement action against Google and the lack of support for FLoC, and is likely responsible for Google’s recent decision to extend the life of cookies for two years. Kaye mentions the W3C, where there are ongoing discussions about privacy and what should be in a Privacy Sandbox, but it is not clear what CMA’s role would be; “referee” seems aspirational. It should be noted that Google is a master at creating industry initiatives and using their resources to influence or control development outcomes. FLoC, AMP, Core Vitals, and schema.org structured data are all examples. (For deep dives see Michael Andrews’ Who benefits from schema.org?, and Scott Gilbertson on Core vitals and AMP).
A16z’s Sarah Wang and Martin Casado provide an eye-opening analysis of the short and long term costs of cloud computing. The near-term time-to-market and cost benefits of cloud are clear and well known, but over time they report:
Across all our conversations with diverse practitioners, the pattern has been remarkably consistent: If you’re operating at scale, the cost of cloud can at least double your infrastructure bill. … the cost of cloud “takes over” at some point, locking up hundreds of billions of market cap that are now stuck in this paradox: You’re crazy if you don’t start in the cloud; you’re crazy if you stay on it.
The infrastructure costs go up if you stay on the cloud, and the cost and difficulty of moving off the cloud increase over time. They do offer some advice — mine is to read their article.
This week we have articles by Tom Warren, Eric Seufert, and Prukalpa Sankar, and news from Expert.ai, Contentsquare, Language I/O, LivePerson and Adobe, DataStax, and DataRobot.
We had a server glitch with two June issues causing some of you not to receive them. Here are links if you need them: June 2 and June 9.
Opinion / Analysis
Microsoft’s new Fluid office documents
Microsoft’s updated Whiteboard app. Image: Microsoft
The Fluid Framework was demonstrated at Build 2019, and at Build 2020 Fluid was previewed and made open source. Starting this summer we’ll see released versions of some Fluid functionality, including in the updated version of Microsoft Whiteboard shown above.
Fluid is interesting for two reasons. First, it’s a partial realization of the ambitious document computing models both Apple and Microsoft were building in the early 90s. Second, it has the potential for improving productivity in collaborative workplace, remote, and hybrid environments. But it is a big change, and how its various capabilities will actually be adopted and integrated into systems and workflows within Microsoft 365, in conjunction with other workplace tools, and with larger enterprise ecosystems, is TBD. The Verge’s Tom Warren has more on the current announcements.
Apple’s privacy controls are mostly a good thing for consumers and for Apple, less so for publishers, advertisers, and competitors. Unfortunately, Apple’s new Private Relay seems to have the effect of treating the open web more as a direct competitor to be weakened than a sometimes inconvenient public good to be supported. For all its problems an open web is a net good for everyone. It is unlikely that Apple wants anyone to think they have to choose between Apple privacy and the open web. Eric Seufert explains why Apple’s new Private Relay, in its current form, is something to be concerned about.
This is not just for beginners. Prukalpa Sankar has put together a really useful curated list of resources for anyone who needs to keep up with current data stack technologies and strategies.
The modern data stack is messy and complicated, and it’s changing every day. There’s tons of news about it, and it’s hard to separate the hype and noise from reality. Here’s how our team keeps in touch with the latest news and trends.
This week I suggest articles by Walid Saba and Joshua Benton and have news from Google, TeamViewer and SAP, Amplitude, Jorsek, Asana, and SpeechLive. (<2 min)
You’ll note that I continue to experiment a bit with the format. After I try out a few more things I’ll follow up with the long-promised survey, especially given Apple’s new Mail Privacy Protection. (See Joshua Benton’s article below.) In the meantime just reply to this email to let me know what you think.
Opinion / Analysis
Walid Saba
Ontology, knowledge graphs and NLU: three pillars of one and the same system
The application of enterprise knowledge graphs and natural language understanding and processing continues to grow, for good reason, but neither is easy and the combination even less so. In this short piece Walid Saba identifies a key problem yet argues that this combination, plus ontology, is, in general, necessary for success. How to accomplish this? Well, he’s not the only one looking at this. The company he works for, Ontologik, is in stealth mode, but their site has links to an accessible presentation, and to more technical research.
Joshua Benton provides a good summary of last week’s announcements publishers big and small will care about. It’s a mixed bag, but newsletter publishers, like us, will not be pleased with the changes to notification controls and the coming end of open rate statistics. The only thing we track is activity in our ad-free newsletter, which provides important customer feedback.
Scott Brinker has a clever article tying together his own thoughts on tech stacks and platforms, with Ben Thompson’s aggregation theory. His piece is not prescriptive, but suggestive. Looking at your own stack and operational workflows through this lens could surface useful insights on ways to improve both customer and employee experiences and operational efficiencies. If you’re a product manager, you (hopefully!) already understand quite a bit about the horizontal and vertical data flows, integrations, and relative control of other products in the ecosystem you live in, but I expect you’ll still find the article fruitful. The concept of aggregating time is especially rich.
With the web opening new frontiers in collaboration, the web’s native language of JavaScript is the best choice for exploring data and communicating insights.
Mike Bostock makes the case that JavaScript’s ubiquity, and related collaboration and communication capabilities, provide a key advantage over Python, R, and Julia, today’s languages of choice for data analysis. Given that most organizations struggle with integrating data analysis into decision-making across business applications, it’s an interesting idea. Bostock is not naive and discusses what JavaScript still lacks, but it’s still his first choice for its “portability and convenience”.
June 9, 2021 – RWS, provider of technology-enabled language, content management, and intellectual property services, announced the return of Language Weaver, a pioneering brand in automatic language translation. Language Weaver, which combines RWS’s linguistic expertise with SDL’s and Iconic’s technologies, will now represent RWS’s machine translation platform.
Founded in 2002, Language Weaver commercialized new approaches to automatic language translation based on machine learning. Acquired by SDL in 2010, the Language Weaver brand was retired in 2015 and renamed SDL Machine Translation. The technology, through continual investment, evolved from statistical machine translation to neural machine translation, capable of instantly translating content across 2,700 language combinations. RWS acquired SDL in 2020, and is now bringing back the Language Weaver brand.
Building machine translation (MT) models has traditionally required specialists. The Language Weaver platform allows anyone to provide real-time feedback on translations and fine-tune generic language models. Behind the scenes the platform also constantly looks for ways to improve the quality of translations. The technology benefits any business or industry dealing with large volumes of multilingual content. Language Weaver can be integrated with any software or platform, from Microsoft Office, to chatbots and eCommerce platforms.
June 8, 2021 – Snowflake unveiled new product innovations for the Data Cloud, including data programmability, global data governance, and platform optimizations.
Data programmability:
Snowpark. With initial support for Java and Scala, Snowflake’s developer experience, Snowpark, allows data engineers, data scientists, and developers to build using their preferred language and execute these within Snowflake.
Java UDFs. With Java user-defined-functions (UDFs), customers can bring their custom code and business logic to Snowflake.
Unstructured data. Snowflake’s unstructured data support enables customers to store, govern, process, and share file data alongside their structured and semi-structured data.
SQL API. The Snowflake SQL API enables applications to call Snowflake directly through a REST API.
Global governance:
Classification. Snowflake’s classification capability automatically detects personally identifiable information (PII) in a given table and leverages the tagging framework to annotate the data.
Anonymized views. Anonymized views can be used to protect privacy and identity in a dataset.
Platform:
Improved Storage Economics. Better compression, and reduced storage costs.
Improved Support for Interactive Experiences. Updates released for high volume and low latency workload requirements improve query throughput on a single compute cluster.
Usage Dashboard. New usage dashboard helps customers better understand usage and costs across the platform.
GraphDB 9.8 brings text mining and Kafka connectivity
June 2, 2021 – Ontotext announced the release of GraphDB 9.8, which offers text mining integration, notifications over Kafka, Helm charts, and performance improvements. The text mining plugin comes with out-of-the-box support for text analytic services such as Ontotext’s Tag API, GATE Cloud, and spaCy server, as well as an expressive mapping language, to register new services without coding. The extracted text annotations can be manipulated with SPARQL and either returned to the caller for further processing or stored directly into the repository where they will enrich the existing knowledge graph. This functionality covers a number of use-cases that rely on both RDF and text analytics.
The Kafka connector provides a means to synchronize changes to the RDF model to any downstream system via the Apache Kafka framework. Each Kafka connector instance will stay automatically up-to-date with the GraphDB repository data. The implementation is built on the same framework as the existing Elasticsearch, Solr and Lucene connectors and allows for precise mapping from RDF to JSON, such as defining fields based on property chains, nested document support as well as advanced filtering by type, literal language or a complex expression. GraphDB 9.8 comes with standard Helm charts and instructions that can help you get started with GraphDB Enterprise Edition on Kubernetes.
The latest version of the cloud-agnostic enterprise file sync, sharing and data governance platform, is faster, with increased efficiency and ability to handle higher loads. The system software applications have been upgraded for optimal performance on Apache, PHP, MongoDB and SOLR. The customer experience has also been enhanced, with a new, streamlined user interface along with the debut of a log-in wizard…
SNS unveils ShareBrowser plugin for DaVinci Resolve
Studio Network Solutions (SNS), provider of workflow storage servers for professional media teams, announced that their ShareBrowser media asset management (MAM) software is now available as a workflow integration plugin for DaVinci Resolve Studio video editing software by Blackmagic Design…
Kofax TotalAgility, the workflow orchestration engine within the company’s Intelligent Automation Platform, has been enhanced with 50 new low-code, document intelligence, process orchestration and connected systems capabilities…