The Gilbane Advisor

Curated for content, computing, and digital experience professionals

Databricks to acquire Tabular

Databricks, a Data and AI company, announced it has agreed to acquire Tabular, a data management company founded by Ryan Blue, Daniel Weeks, and Jason Reid. The acquisition brings together the original creators of Apache Iceberg and Linux Foundation Delta Lake, the two leading open source lakehouse formats, so that organizations are no longer limited by which of these formats their data is in. Databricks intends to work closely with the Delta Lake and Iceberg communities to bring format compatibility to the lakehouse: in the short term, inside Delta Lake UniForm, and in the long term, by evolving toward a single, open, and common standard of interoperability. Databricks and Tabular will work together towards a joint vision of the open lakehouse.

Databricks will work with the Delta Lake and Iceberg communities to bring data interoperability to the formats over time. This is a long journey, one that will likely take several years to achieve in those communities. That is why last year, Databricks introduced Delta Lake UniForm. UniForm tables provide interoperability across Delta Lake, Iceberg, and Hudi, and support the Iceberg RESTful catalog interface so companies can use the analytics engines and tools they are already familiar with.

https://www.databricks.com/company/newsroom/press-releases/databricks-agrees-acquire-tabular-company-founded-original-creators

Ontotext announces Metadata Studio 3.8

Ontotext, a provider of enterprise knowledge graph and semantic database engines, announced the latest version of Ontotext Metadata Studio (OMDS), a tool designed for knowledge graph enrichment through text analytics of unstructured documents. Version 3.8 aids in the creation, evaluation, and quality improvement of text analytics services. With more intuitive and effective search solution capabilities, the enhancements to OMDS remove the difficulties users face when exposing semantic search over their documents, especially when they are working with their own custom reference domain models. Updates include:

  • Enhanced Domain Model Search Interface transforms the reference annotation schema into a user-friendly search interface, allowing exploration and retrieval of content based on the preferred domain data model.
  • Knowledge Graph Enrichment and Extension enables users to reuse their domain models so they can be leveraged for advanced analytics and quality management.
  • Advanced Search Capabilities supports all types of searches. The solution allows users to conduct simple searches such as identifying documents containing specific text as well as complex queries that filter documents based on the presence or absence of certain text and combinations of metadata objects and property values.
  • Improved Usability and Workflow Efficiency enables users to organize content effortlessly by moving documents between corpora or deleting them from the database.
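
The kind of complex query described in the list above (text that must be present, text that must be absent, and metadata property values that must match) can be illustrated with a minimal sketch. This is not the OMDS interface or API; the `Document` class, the `search` function, and its parameter names are hypothetical, chosen only to show the filtering logic.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical document record: free text plus metadata properties.
    text: str
    metadata: dict = field(default_factory=dict)


def search(docs, must_contain=(), must_not_contain=(), properties=None):
    """Return documents matching all text and metadata conditions."""
    properties = properties or {}
    results = []
    for doc in docs:
        body = doc.text.lower()
        # Every required term must appear in the text (case-insensitive).
        if not all(term.lower() in body for term in must_contain):
            continue
        # No excluded term may appear in the text.
        if any(term.lower() in body for term in must_not_contain):
            continue
        # Every requested metadata property must have the requested value.
        if any(doc.metadata.get(key) != value for key, value in properties.items()):
            continue
        results.append(doc)
    return results
```

For example, `search(docs, must_contain=["graph"], must_not_contain=["draft"], properties={"type": "report"})` would keep only reports that mention "graph" and do not mention "draft".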

https://www.ontotext.com/products/ontotext-metadata-studio/

Perplexity introduces Perplexity Pages

Snippets from the Perplexity blog…

You’ve used Perplexity to search for answers, explore new topics, and expand your knowledge. Now, it’s time to share what you learned. Meet Perplexity Pages, your new tool for easily transforming research into visually stunning, comprehensive content. Whether you’re crafting in-depth articles, detailed reports, or informative guides, Pages streamlines the process so you can focus on sharing your knowledge with the world.

Pages lets you effortlessly create, organize, and share information. Search any topic, and instantly receive a well-structured, beautifully formatted article. Publish your work to our growing library of user-generated content and share it directly with your audience with a single click. What sets Perplexity Pages apart?

  • Customizable: Tailor the tone of your Page to resonate with your target audience, whether you’re writing for general readers or subject matter experts.
  • Adaptable: Easily modify the structure of your article—add, rearrange, or remove sections to best suit your material and engage your readers.
  • Visual: Elevate your articles with visuals generated by Pages, uploaded from your personal collection, or sourced online.

Pages is rolling out to users now. Log in to your Perplexity account and select “Create a Page” in the library tab.

https://www.perplexity.ai/page/new

Sinequa releases new generative AI assistants

Sinequa announced the availability of Sinequa Assistants, enterprise generative AI assistants that integrate with enterprise content and applications to augment and transform knowledge work. Sinequa’s Neural Search complements GenAI and provides the foundation for Sinequa’s Assistants. Its capabilities go beyond RAG’s conventional search-and-summarize paradigm to intelligently execute complex, multi-step activities, all grounded in facts to augment the way employees work.

Sinequa’s Assistants leverage all company content and knowledge to generate contextually relevant insights and recommendations. Optimized for scale with three custom-trained small language models (SLMs), Sinequa Assistants help ensure accurate conversational responses on any internal topic, complete with citations and traceability to the original source.

Sinequa Assistants work with any public or private generative LLM, including Cohere, OpenAI, Google Gemini, Microsoft Azure OpenAI, and Mistral. The Sinequa Assistant framework includes ready-to-go Assistants along with tools to define custom Assistant workflows so that customers can use an Assistant out of the box, or tailor and manage multiple Assistants from a single platform. These Assistants can be tailored to fit the needs of specific business scenarios and deployed and updated quickly without code or additional infrastructure. Domain-specific assistants for scientists, engineers, lawyers, financial asset managers, and others are available.

https://www.sinequa.com/company/press/sinequa-augments-companies-with-release-of-new-generative-ai-assistants

Siteimprove launches new product features

Siteimprove, a platform to help brands stand out with accessible, high-performing digital content experiences, launched new capabilities designed to turn large amounts of data into actionable, easy-to-understand insights, increase cross-organizational collaboration, facilitate confident decision making, and deliver tangible outcomes. Today’s launch includes four initiatives:

  • Top Paths – To identify high-performing content
    • With Top Paths, marketers can understand the impact of their content on conversion metrics to focus on what moves the needle.
  • Visitor Engagement Score – the Digital Certainty Index (DCI) score of engagement
    • 95 percent of website visits are non-converting, but that doesn’t mean they fail to deliver value. With Visitor Engagement Score, marketers can now measure visitor engagement with their content beyond conversions to better understand the full impact of their organizations’ content across the customer journey.
  • No-Code Event Tracking – event configuration without the hassle
    • With No-Code Event Tracking, marketers can now set up events quickly, with full transparency without technical expertise.
  • Sites Progress – tell a convincing and accurate story of the progress across your entire website
    • With Sites Progress, digital marketing teams can now understand and communicate the progress the teams are making, across all the sites, by consolidating data in a single, easy-to-understand view.

https://www.siteimprove.com/hello/new-and-siteimproved-2024-q2-product-release

Tonic.ai launches secure unstructured data lakehouse for LLMs

Tonic.ai launched a secure data lakehouse for LLMs, Tonic Textual, to enable AI developers to securely leverage unstructured data for retrieval-augmented generation (RAG) systems and large language model (LLM) fine-tuning. Tonic Textual is a data platform designed to eliminate integration and privacy challenges ahead of RAG ingestion or LLM training bottlenecks. Leveraging its expertise in data management and realistic synthesis, Tonic.ai has developed a solution to tame and protect siloed, messy, and complex unstructured data and transform it into AI-ready formats ahead of embedding, fine-tuning, or vector database ingestion. With Tonic Textual:

  1. Build, schedule, and automate unstructured data pipelines that extract and transform data into a standardized format convenient for embedding, ingesting into a vector database, or pre-training and fine-tuning LLMs. Textual supports TXT, PDF, CSV, TIFF, JPG, PNG, JSON, DOCX and XLSX out-of-the-box.
  2. Detect, classify, and redact sensitive information in unstructured data, and re-seed redactions with synthetic data to maintain the semantic meaning. Textual leverages proprietary named entity recognition (NER) models trained on a diverse data set spanning domains, formats, and contexts to ensure sensitive data is identified and protected.
  3. Enrich your vector database with document metadata and contextual entity tags to improve retrieval speed and context relevance in RAG systems.
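
The detect, redact, and re-seed idea in step 2 above can be illustrated with a minimal sketch. This is not Tonic Textual's API: the announcement describes proprietary NER models, whereas the sketch below substitutes two simple regular expressions as a stand-in, and the names `PATTERNS`, `redact`, and `reseed` are hypothetical.

```python
import re
from itertools import count

# Assumption: regexes stand in for the trained NER models a real system would use.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical synthetic templates, one per entity type.
SYNTHETIC = {
    "EMAIL": "user{n}@example.com",
    "PHONE": "555-010-{n:04d}",
}


def redact(text):
    """Replace each detected entity with a typed placeholder such as [EMAIL_1]."""
    found = {}
    counter = count(1)
    for label, pattern in PATTERNS.items():
        def repl(match, label=label):
            token = f"[{label}_{next(counter)}]"
            found[token] = match.group(0)  # remember the original value
            return token
        text = pattern.sub(repl, text)
    return text, found


def reseed(redacted, found):
    """Swap placeholders for synthetic values of the same entity type."""
    for i, token in enumerate(found, start=1):
        label = token[1:].split("_")[0]
        redacted = redacted.replace(token, SYNTHETIC[label].format(n=i))
    return redacted
```

Keeping the entity type in the placeholder is what lets the re-seeding step insert a value of the same kind, so the sentence still reads naturally for embedding or fine-tuning while the original value is gone.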

https://www.tonic.ai/textual

Gilbane Advisor 5-22-24 — Text + KG embeddings, floppies!

This week we feature articles from Sunila Gollapudi, and Leontien Talboom & Chris Knowles.

Additional reading comes from Heather Hedden, Cassie Kozyrkov, and Jim Clyde Monge.

News comes from Elastic, DataStax, Flatfile, and Foxit & Straker Translations.

Note: We’ll be off next week, back on June 5th.

All previous issues are available at https://gilbane.com/gilbane-advisor-index


Opinion / Analysis

Combine text embeddings and knowledge (graph) embeddings in RAG systems

Sunila Gollapudi provides a good introduction and how-to suitable for technical and not-so-technical readers.

“In this article, I am excited to present my experiments combining Text Embeddings and Knowledge (Graph) Embeddings and observations on RAG performance. I will start by explaining the concept of Text and Knowledge Embeddings independently, using simple open frameworks, then, we will see how to use both in RAG applications.” (15 min)
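
As a rough illustration of what combining the two kinds of embeddings can mean, the sketch below normalizes a text vector and a knowledge graph vector, weights them, and concatenates them before comparing by cosine similarity. This is one common approach and an assumption here, not necessarily the method used in the article; the function names and the `text_weight` parameter are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(text_vec, kg_vec, text_weight=0.5):
    """Concatenate weighted, unit-length text and knowledge graph vectors."""
    text_part = [text_weight * x for x in normalize(text_vec)]
    kg_part = [(1.0 - text_weight) * x for x in normalize(kg_vec)]
    return text_part + kg_part


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Normalizing each part first keeps one embedding from dominating simply because its values are larger; the weight then controls how much the text signal counts relative to the graph signal at retrieval time.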

https://towardsdatascience.com/combine-text-embeddings-and-knowledge-graph-embeddings-in-rag-systems-5e6d7e493925

Raw flux streams and obscure formats: Further work around imaging 5.25-inch floppy disks

I’m sure the subject has some of you shaking your heads for any number of reasons. But for those connected with digital preservation efforts, this case-study/lessons-learned piece from Leontien Talboom & Chris Knowles at Cambridge University could be very helpful. Some of the comments may also be useful. The just-curious may be shocked at the complexity involved. (8 min)

https://digitalpreservation-blog.lib.cam.ac.uk/raw-flux-streams-and-obscure-formats-further-work-around-imaging-5-25-inch-floppy-disks-5a2cf2e5f0d1


Content technology news

DataStax launches new Hyper-Converged Data Platform

Brings OpenSearch and Apache Pulsar to HCD Platform; DataStax Enterprise 6.9 enables self-managed data workloads for GenAI.
https://www.datastax.com/press-release/datastax-launches-new-hyper-converged-data-platform-giving-enterprises-the-complete-modern-data-center-suite-ceeded-for-ai-in-production

Elastic announced Search AI Lake to scale low latency search

Architecture optimized for real-time, low-latency applications including search, retrieval augmented generation (RAG), observability & security.
https://ir.elastic.co/news/news-details/2024/Elastic-Announces-First-of-its-kind-Search-AI-Lake-to-Scale-Low-Latency-Search/default.aspx

Flatfile unveils new AI-powered data transformation features

Data transformation and data migration capabilities for business users, data analysts, systems integration teams, and enterprise developers.
https://flatfile.com/news/flatfile-unveils-ai-powered-data-transformation/

Foxit partners with Straker Translations

The collaboration adds translation capabilities to Foxit’s eSignature services, enabling users to translate and sign documents in multiple languages.
https://www.foxit.com ■ https://www.straker.ai


The Gilbane Advisor is authored by Frank Gilbane and is ad-free, cost-free, and curated for content, computing, web, data, and digital experience technology and information professionals. We publish recommended articles and content technology news most Wednesdays. We do not sell or share personal data.

Elastic announced Search AI Lake to scale low latency search

Elastic, a Search AI company, today announced Search AI Lake, a cloud-native architecture optimized for real-time, low-latency applications including search, retrieval augmented generation (RAG), observability and security. The Search AI Lake also powers the new Elastic Cloud Serverless offering. All operations, from monitoring and backup to configuration and sizing, are managed by Elastic – users just bring their data and choose Elasticsearch, Elastic Observability, or Elastic Security on Serverless. Benefits include:

  • Fully decoupling storage and compute enables scalability and reliability using object storage; dynamic caching supports high throughput, frequent updates, and interactive querying of large data volumes.
  • Multiple enhancements maintain query performance even when the data is safely persisted on object stores.
  • By separating indexing and search at a low level, the platform can automatically scale to meet the needs of a wide range of workloads.
  • Users can leverage a native suite of AI relevance, retrieval, and reranking capabilities, including a native vector database integrated into Lucene, open inference APIs, semantic search, and first- and third-party transformer models, which work with the array of search functionalities.
  • Elasticsearch’s query language, ES|QL, is built in to transform, enrich, and simplify investigations with fast concurrent processing irrespective of data source and structure.

https://ir.elastic.co/news/news-details/2024/Elastic-Announces-First-of-its-kind-Search-AI-Lake-to-Scale-Low-Latency-Search/default.aspx

© 2024 The Gilbane Advisor