Curated for content, computing, and digital experience professionals

Category: Semantic technologies

Our coverage of semantic technologies goes back to the early 90s, when search engines focused on structured data in databases were looking to support searching unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyses the challenge as it stood then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


Dstl releases free Baleen 3 data processing update

The Defence Science and Technology Laboratory (Dstl) has released a new free version of its popular data processing tool. Baleen 3 is a tool for building data processing pipelines using the open source Annot8 framework and succeeds Baleen 2, one of the first open source projects by Dstl, the science inside UK defence and security. It offers users the ability to search, process and collate data, and is suitable for personal and commercial applications. It has been used across government, industry and academia, both in the UK and internationally.

The tool enables the creation of a bespoke chain of “processors” to extract information from unstructured data (e.g. text documents, images). For example, Baleen 3 could process a folder with thousands of Word documents and PDFs in it to extract all e-mail addresses and phone numbers in those documents and store them in a database. As well as text, Baleen 3 can also find and extract images within those documents, perform OCR to find text within those images, translate that text into English, and then run machine learning models to find mentions of people within those images. Baleen 3 supports components developed within the Annot8 framework, and as a result it is easy to extend and develop further to cover new use cases and provide additional functionality. There are already a large number of components available for use within the Annot8 framework, including some previously developed by Dstl.
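The “chain of processors” idea can be sketched in a few lines of Python. Note this is purely illustrative: Baleen 3 itself is built on the Java-based Annot8 framework, and the names here (`email_processor`, `run_pipeline`, the regexes) are invented for the example, not the real API.

```python
import re

# Each "processor" takes the text plus the annotations found so far,
# and returns the annotations with its own findings appended.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def email_processor(text, annotations):
    annotations.extend(("email", m) for m in EMAIL_RE.findall(text))
    return annotations

def phone_processor(text, annotations):
    annotations.extend(("phone", m) for m in PHONE_RE.findall(text))
    return annotations

def run_pipeline(text, processors):
    # Run the document through each processor in turn.
    annotations = []
    for proc in processors:
        annotations = proc(text, annotations)
    return annotations

doc = "Contact jane.doe@example.org or call +44 20 7946 0958."
results = run_pipeline(doc, [email_processor, phone_processor])
```

New extraction steps slot in by adding another function to the list, which is the extensibility point the Annot8 component model provides in Baleen 3.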

Following the release of Baleen 3, support for the existing Baleen 2 project will be withdrawn. Dstl is encouraging all users to move to using Baleen 3 where possible. Baleen 3 is built on top of newer technologies, and will be easier to maintain and deploy as a result of the upgrade. It also extends Baleen 2’s focus on text to support other forms of unstructured data, such as images. Baleen 3 is available to download now.

https://github.com/dstl/baleen

Yext releases “Milky Way” search algorithm with BERT

Yext, Inc. announced “Milky Way,” the latest upgrade to the natural language processing (NLP) algorithm that powers Yext Answers, Yext’s site search product. Headlining this milestone update is the adoption of BERT (Bidirectional Encoder Representations from Transformers). Developed by Google, BERT is an open source machine learning framework for NLP designed to better understand user searches. By leveraging BERT within Named Entity Recognition (a process to locate and classify named entities mentioned in unstructured text into predefined categories), Yext Answers improves its ability to distinguish locations from other types of entities, including people, jobs, and events. The update includes:

  • Improved Named Entity Recognition: By leveraging BERT, Yext Answers can now better understand the contextual relationship between search terms. Answers will return a more relevant result by taking into account the correct classification, whether a location, person or product.
  • Improved Location Detection: The update leaves behind location biasing. Now, Yext Answers will filter through locations stored by a business in its Yext knowledge graph to surface the best match.
  • Updated Healthcare Taxonomy: More than 3,000 new healthcare-related synonyms, conditions, treatments, and procedures have been added to the algorithm’s taxonomy.
  • Improved Stemming and Typo Tolerance.
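The entity classification problem described above, telling a location mention apart from a person or product by context, can be illustrated with a toy gazetteer tagger. To be clear, this sketch is neither BERT nor Yext’s algorithm; the gazetteer, cue words, and function names are all invented to show only the disambiguation task:

```python
# Toy NER sketch: some surface forms are ambiguous between entity types,
# and context decides. (Invented example data, not Yext's taxonomy.)
GAZETTEER = {
    "paris": {"location", "person"},   # city, or a given name
    "austin": {"location", "person"},
    "pixel": {"product"},
}

LOCATION_CUES = {"in", "near", "to", "from", "at"}

def classify(tokens):
    """Return (token, entity_type) pairs, using the preceding token
    as a crude contextual cue to resolve ambiguous entries."""
    results = []
    for i, tok in enumerate(tokens):
        types = GAZETTEER.get(tok.lower())
        if not types:
            continue
        if len(types) == 1:
            results.append((tok, next(iter(types))))
        elif i > 0 and tokens[i - 1].lower() in LOCATION_CUES:
            results.append((tok, "location"))
        else:
            results.append((tok, "person"))
    return results

as_place = classify("Flights to Paris are cheap".split())
as_person = classify("Paris gave a talk".split())
```

A transformer model like BERT replaces the single-token cue with a learned representation of the whole surrounding sentence, which is what makes the contextual classification far more robust.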

https://www.yext.com/resources/about/news-media/2020-08-yext-releases-milky-way/

Google open-sources LIT for evaluating natural language models

Google-affiliated researchers released the Language Interpretability Tool (LIT), an open source, framework-agnostic platform and API for visualizing, understanding, and auditing natural language processing models. It focuses on questions about AI model behavior, like why models made certain predictions and why they’re performing poorly with input corpora. LIT incorporates aggregate analysis into a browser-based interface that’s designed to enable explorations of text generation behavior. The tool set is architected so that users can hop between visualizations and analysis to test hypotheses and validate those hypotheses over a data set. New data points can be added on the fly and their effect on the model visualized immediately, while side-by-side comparison allows for two models or two data points to be visualized simultaneously. And LIT calculates and displays metrics for entire data sets to spotlight patterns in model performance, including the current selection, manually generated subsets, and automatically generated subsets.

LIT works with any model that can run from Python, the Google researchers say, including TensorFlow, PyTorch, and remote models on a server. And it has a low barrier to entry, with only a small amount of code needed to add models and data. The team cautions that LIT doesn’t scale well to large corpora and that it’s not “directly” useful for training-time model monitoring. But they say that in the near future, the tool set will gain features like counterfactual generation plugins, additional metrics and visualizations for sequence and structured output types, and a greater ability to customize the UI for different applications.
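The aggregate, per-subset analysis LIT performs can be sketched as follows. The slicing rule, field names, and data here are illustrative assumptions, not LIT’s actual API:

```python
# Sketch of LIT-style aggregate analysis: compute a metric over the whole
# dataset and over automatically generated subsets (here, sliced by
# example length) to spotlight where a model underperforms.

def accuracy(examples):
    return sum(e["pred"] == e["label"] for e in examples) / len(examples)

def slice_by(examples, key_fn):
    slices = {}
    for e in examples:
        slices.setdefault(key_fn(e), []).append(e)
    return slices

data = [
    {"text": "great film", "label": "pos", "pred": "pos"},
    {"text": "dull and far too long to sit through", "label": "neg", "pred": "pos"},
    {"text": "loved it", "label": "pos", "pred": "pos"},
    {"text": "not my thing", "label": "neg", "pred": "neg"},
]

overall = accuracy(data)
per_slice = {
    k: accuracy(v)
    for k, v in slice_by(
        data, lambda e: "long" if len(e["text"].split()) > 4 else "short"
    ).items()
}
```

A healthy overall score can hide a subset where the model fails badly; surfacing that gap per slice is the pattern-spotting behavior described above.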

H/T VentureBeat: https://venturebeat.com/2020/08/14/google-open-sources-lit-a-toolset-for-evaluating-natural-language-models/

Zignal Labs adds Lexalytics to provide natural language processing to platform

Lexalytics announced that Zignal Labs, creator of the Impact Intelligence platform for measuring the evolution of opinion in real time, has added the Lexalytics Salience engine to extend its platform’s natural language processing (NLP) and text analytics capabilities. The addition helps marketers, communicators, and analysts gain a greater understanding of perceptions across traditional and social media. With Lexalytics, Zignal’s customers across industries can understand what people are saying about products, services or current events, categorize discussions into separate groupings and themes, and evaluate the sentiment of media coverage across multiple languages.

http://www.lexalytics.com, http://www.zignallabs.com

Neofonie announces TXTWerk – text mining for SAP solutions

Neofonie announced that TXTWerk – Text Mining for SAP Solutions, a framework application, is now available for trial and online purchase on SAP App Center, the digital marketplace for SAP partner offerings. TXTWerk is delivered online as a subscription service and integrates with SAP and third-party software through the API management capabilities of SAP Cloud Platform Integration Suite. TXTWerk enables the extraction of metadata from texts, providing structured data from unstructured texts. By applying machine learning techniques in combination with rule-based approaches, TXTWerk can read and understand texts quickly. Whether 1,000 or 10 billion documents need to be processed, TXTWerk recognizes the most important keywords, people, places, organizations, events and key concepts and links them to sources such as knowledge graphs or internal company data. Also part of the framework are artificial intelligence (AI) processes for classification into customer-defined classes, sentiment analysis of texts, phrase and role recognition, and the automatic linking of entities according to specially defined relations. In addition to the AI processes, TXTWerk comes with a knowledge graph with over seven million entries.
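The extract-then-link step described above, recognizing a mention and resolving it to a knowledge-graph entry, can be sketched as below. The mini “knowledge graph”, its URIs, and the matching logic are invented for illustration and are not TXTWerk’s service:

```python
# Toy entity linking: find known mentions in text and attach the
# corresponding knowledge-graph record. (All entries are invented.)
KNOWLEDGE_GRAPH = {
    "Berlin": {"type": "place", "uri": "kg:Q64"},
    "Neofonie": {"type": "organization", "uri": "kg:Q1572"},
}

def extract_and_link(text):
    links = []
    for mention, entry in KNOWLEDGE_GRAPH.items():
        if mention in text:
            links.append({"mention": mention, **entry})
    return links

linked = extract_and_link("Neofonie is a software company based in Berlin.")
```

A production system replaces the substring test with trained recognizers and disambiguation, but the output shape, structured records linked to a knowledge source, is the same idea.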

https://www.neofonie.de/english, https://www.sapappcenter.com/en/product/display-0000059151_live_v1

Luminoso introduces deep learning model for evaluating sentiment at concept level

Luminoso’s new deep learning model understands documents using multiple layers of attention, a mechanism that identifies which words are relevant to get context around a specific concept as expressed by a word or phrase. This model is capable of identifying the author’s sentiment for each individual concept they’ve written about, as opposed to providing an analysis of the overall sentiment of the document.

Using Concept-Level Sentiment, users will be able to:

  • Effectively analyze mixed feedback — Concept-level sentiment analysis is critical for capturing and understanding the voice of the customer (VoC). For example, product reviews rarely contain just one type of feedback, and it’s important to tease apart the good from the bad. Getting a polarity for each of the topics in an open-ended survey response is critical for understanding what works and what doesn’t for your customers.
  • Quickly surface buried feedback — Uncovering negative comments in overwhelmingly positive open-ended survey responses is critical for better understanding customers and employees. For instance, in voice of the employee (VoE) surveys, employee feedback can be overwhelmingly positive and delivered in an upbeat way in an effort to soften criticisms. Concept-Level Sentiment in Luminoso enables users to quickly identify and understand “buried” feedback, such as negative points in an overwhelmingly positive HR survey.
  • Intuitively aggregate concept sentiment across an entire dataset — For instance, after responses to a mobile app market research survey are loaded into Luminoso Daylight, a user can get a distribution of positive, negative, and neutral opinions about every aspect of the mobile experience across all of its mentions in the dataset.
  • Analyze customer and employee feedback across multiple languages — Global organizations often receive customer and employee feedback in multiple languages. With Luminoso, users can analyze the sentiment of concepts, natively in 15 languages.
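The difference between document-level and concept-level sentiment can be shown with a small sketch. The word lists and windowing rule below are invented for illustration; Luminoso’s product uses a deep learning model with attention, not a lexicon:

```python
# Toy concept-level (aspect-level) sentiment: score each concept from
# opinion words near its mention, instead of one score per document.
POS_WORDS = {"great", "love", "fast"}
NEG_WORDS = {"slow", "buggy", "hate"}

def concept_sentiment(text, concepts, window=2):
    tokens = [t.strip(".,").lower() for t in text.split()]
    scores = {}
    for concept in concepts:
        idx = tokens.index(concept)
        nearby = tokens[max(0, idx - window): idx + window + 1]
        score = (sum(t in POS_WORDS for t in nearby)
                 - sum(t in NEG_WORDS for t in nearby))
        scores[concept] = ("positive" if score > 0
                           else "negative" if score < 0
                           else "neutral")
    return scores

review = "The battery is great but the screen is slow and buggy."
result = concept_sentiment(review, ["battery", "screen"])
```

A single document-level score would average this mixed review toward neutral; scoring per concept is what lets the “buried” negative point about the screen surface on its own.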

https://luminoso.com/solutions/concept-level-sentiment

Information extraction

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases this involves processing human-language texts by means of natural language processing (NLP). Recent work in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as information extraction.

Cortical.io introduces contract intelligence solution with semantic search

Cortical.io announced a new release of Cortical.io Contract Intelligence, an AI-based solution for enterprises that need to review and manage a large corpus of contracts and other legal documents. Cortical.io Contract Intelligence utilizes its natural language understanding (NLU) to automatically extract, classify and analyze relevant information in documents. Cortical.io Contract Intelligence 4.0 now incorporates semantic search to enable the search of an entire database or individual documents.
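The difference between keyword search and the semantic search described here is that ranking uses meaning rather than exact token overlap. The toy below fakes that with a hand-made synonym table over a bag-of-words cosine ranking; Cortical.io’s actual product uses learned semantic representations, and every name and document in this sketch is invented:

```python
import math

# Invented synonym table standing in for a real semantic representation.
SYNONYMS = {"agreement": "contract", "termination": "cancellation"}

def vectorize(text):
    vec = {}
    for raw in text.lower().split():
        tok = raw.strip(".,")
        tok = SYNONYMS.get(tok, tok)  # normalize synonyms to one form
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "This agreement covers early termination fees.",
    "Quarterly sales figures by region.",
]

query = "contract cancellation"
ranked = sorted(docs, key=lambda d: cosine(vectorize(query), vectorize(d)),
                reverse=True)
```

A literal keyword match for “contract cancellation” finds neither document; the semantic normalization is what lets the agreement document rank first.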

Other features in Cortical.io Contract Intelligence 4.0 are integration with other Business Intelligence solutions and workflow improvements. These include:

  • A dashboard to enable specialists to manage and track the review progress.
  • Task assignment for specialists to assign documents and annotation or review tasks to individual subject matter experts.
  • Built-in OCR capabilities to detect scanned PDF files and convert them into machine-readable files capable of being annotated.
  • Sophisticated table extraction to parse and extract information from tables regardless of the row/column format in the PDF document.
  • Active assistance, including inline pop-up messages that guide users when creating new annotations.

Cortical.io Contract Intelligence is available immediately as a stand-alone application or can be integrated into a workflow through the use of REST APIs. The product can be delivered on-premises, in a private cloud or a public cloud. Cortical.io also is partnering with integration partners and other product solution vendors. The solution is licensed on an annual basis and includes maintenance and support.

https://www.cortical.io


© 2024 The Gilbane Advisor
