Curated for content, computing, and digital experience professionals

Tag: Natural language processing (NLP)

Lucidworks announces advanced linguistics package

Lucidworks announced the Advanced Linguistics Package for Lucidworks Fusion to power personalized search for users in Asian, European, and Middle Eastern markets. Lucidworks now embeds text analytics from Basis Technology, a provider of AI for natural language processing. According to the companies, building, testing, and maintaining the many algorithms and models required to properly support each language is challenging and expensive. Asian, Middle Eastern, and certain European languages require additional processes to handle unique linguistic phenomena, such as the lack of whitespace, compound words, and multiple forms of the same word. Combining Basis Technology’s text analytics with the AI-powered search platform of Lucidworks Fusion is expected to provide accuracy and performance enhancements in information retrieval for the digital experience. Lucidworks’ Advanced Linguistics Package provides language processing in more than 30 languages and advanced entity extraction in 21 languages. By accurately analyzing text in the language in which it was written, Basis Technology’s Rosette helps the Lucidworks Fusion platform deliver the right answers to every user, regardless of where they work or what language they use.
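
To see why the lack of whitespace matters, here is a minimal sketch in Python (my own illustration, not Lucidworks or Basis Technology code): naive whitespace tokenization works for English but returns a single undivided token for a Japanese sentence, which is why dedicated segmentation is needed for such languages.

# Illustration only: whitespace tokenization breaks down for languages
# written without spaces between words.
english = "nutrition for better health"
japanese = "東京タワーに行きました"  # "I went to Tokyo Tower"

print(english.split())   # ['nutrition', 'for', 'better', 'health']
print(japanese.split())  # ['東京タワーに行きました'] -- one undivided token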

https://lucidworks.com, https://www.basistech.com

Luminoso announces enhancements to open data semantic network

Luminoso, which turns unstructured text data into business-critical insights, announced the newest features of ConceptNet, an open data semantic network whose development is led by Luminoso Chief Science Officer Robyn Speer. ConceptNet originated from MIT Media Lab’s Open Mind Common Sense project more than two decades ago, and the semantic network is now used in AI applications around the world. ConceptNet is cited in more than 700 AI papers in Google Scholar, and its API is queried over 500,000 times per day from more than 1,000 unique IPs. Luminoso has incorporated ConceptNet into its proprietary natural language understanding technology, QuickLearn 2.0. ConceptNet 5.8 features:

Continuous deployment: ConceptNet is now set up with continuous integration using Jenkins and deployment to AWS using Terraform, which will make it faster to deploy new versions of the semantic network and easier for others to set up mirrors of the API.

Additional curation of crowd-sourced data: ConceptNet’s developers have filtered entries from Wiktionary that were introducing hateful terminology to ConceptNet without its context. This is part of their ongoing effort to prevent human biases and prejudices from being built into language models. ConceptNet 5.8 has also updated its Wiktionary parser so that it can handle updated versions of the French- and German-language Wiktionary projects.

HTTPS support: Developers can now reach ConceptNet’s website and API over HTTPS, improving data transfer security for applications using ConceptNet.
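
For readers who want to try it, here is a minimal sketch (my own, not from the announcement) that queries the public ConceptNet API over HTTPS in Python using the requests library; the /c/en/<term> endpoint and the JSON structure follow the documentation at api.conceptnet.io.

import requests

# Look up the English term "handbook" over HTTPS and print a few of
# the edges (relations) ConceptNet knows about it.
response = requests.get("https://api.conceptnet.io/c/en/handbook")
response.raise_for_status()
data = response.json()

for edge in data["edges"][:5]:
    print(edge["rel"]["label"], "->", edge["end"]["label"])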

http://blog.conceptnet.io/posts/2020/conceptnet-58/, https://luminoso.com/how-it-works

Gilbane Advisor 3-4-20 — IKEA, T5, AI

IKEA sets a new privacy standard for marketers

Tim Walters reports on an impressive approach by IKEA to earn consumers’ trust by doing rather than (just) promising. As Tim says, you should really watch the IKEA video, a description and demo with their delightfully down-to-earth Chief Digital Officer, Barbara Martin Coppola. Read More

Transfer Learning with T5: the Text-To-Text Transfer Transformer

Many of you are familiar with natural language processing (NLP), from rule-based machine translation in the 80s to today’s more successful machine learning approaches. This post from the Google AI Blog describes a promising new transfer learning technique and openly available tools. Slightly technical, with a link to the academic paper.

With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). Read More
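
As a rough illustration of that text-to-text framing, here is a sketch using the released T5 checkpoints through the Hugging Face transformers library (an assumption on my part, not code from the paper): translation and summarization go through the same model and interface, distinguished only by a task prefix in the input string.

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Every task is "string in, string out"; only the task prefix changes.
# Uses the small public "t5-small" checkpoint via Hugging Face transformers.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning has emerged as a powerful technique in natural language processing.",
]
for text in prompts:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))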

The new business of AI (and how it’s different from traditional software)

Martin Casado and Matt Bornstein from Andreessen Horowitz wrote a thoughtful piece for AI startups and investors on the differences between the business models of AI companies and software companies. As investors they have a particular interest in the margin potential, and they look at the resources and costs associated with each. My take is that they have identified a difference of degree rather than of kind, at least in the case of enterprise software applications, which have similar scaling, “humans in the loop”, interoperability, custom development, and support requirements. Large scale content management systems and “digital experience platforms” are examples. In any case, this is a good read, and all the authors’ recommendations should also be considered by traditional enterprise software companies :). Read More

Update on technology transformations

McKinsey reports on enterprises’ views of, and appetite for, continued technology transformation. TL;DR: it’s hard but showing benefits, and competitiveness demands its continuation. Read More

Business-side support for IT in top companies

Also…

The Gilbane Advisor curates content for content technology, computing, and digital experience professionals. We focus on strategic technologies. We publish more or less twice a month except for August and December. We do not sell or share personal data.

SDL Tridion Integrates Q-go Natural Language Search into Web Content Management

SDL Tridion announced that it has partnered with Q-go to provide an integrated natural language search engine within SDL Tridion’s web content management platform. The solution provides websites’ online search environments with only targeted and relevant search results. Q-go’s Natural Language Search is now accessible from within the SDL Tridion web content management environment. Content editors are able to create model questions in the Q-go component of the SDL Tridion platform. This means that the most common questions pertaining to products and the website itself can be targeted and answered by web content editors, creating streamlined content and vastly increased relevance of searches. The integration also means that only one interface is needed to update the entire website, which can be done anywhere, anytime. You can find more information on the integration at the eXtensions Community at http://www.sdltridionworld.com

Dewey Decimal Classification, Categorization, and NLP

I am surprised how often various content organizing mechanisms on the Web are compared to the Dewey Decimal System. As a former librarian, I am disheartened to be reminded how often students were lectured on the Dewey Decimal system, apparently to the exclusion of learning about subject categorization schemes. The two complemented each other, but that seems to be a secret to all but librarians.

I’ll try to share a clearer view of the model and explain why new systems of organizing content in enterprise search are quite different from the decimal model.

Classification is a good generic term for defining physical organizing systems. Unique animals and plants are distinguished by a single classification in the biological naming system. So too are books in a library. There are two principal classification systems for arranging books on the shelf in Western libraries: Dewey Decimal and Library of Congress (LC). They each use coding (numeric for Dewey Decimal and alpha-numeric for Library of Congress) to establish where a book belongs logically on a shelf, relative to other books in the collection, according to the book’s most prominent content topic. A book on nutrition for better health might be given a classification number for some aspect of nutrition or one for a health topic, but a human being has to make a judgment about which topic the book is most “about” because the book can only live in one section of the collection. It is probably worth mentioning that the Dewey and LC systems are both hierarchical but with different priorities (e.g., Dewey puts broad topics like Religion, Philosophy, and Psychology at the top levels, while LC groups those topics together and includes more scientific and technical topics, like Agriculture and Military Science, at the top of its list).

So why classify books to reside in topic order, when it requires so much labor to move the collections around to make space for new books? It is for the benefit of the users, to enable “browsing” through the collection, although it may be hard to accept that the term browsing was a staple of library science decades before the internet. Library leaders established eons ago the need for a system of physical organization to help readers peruse the book collection by topic, leading from the general to the specific.

You might ask what kind of help that was for finding the book on nutrition that was classified under “health science.” This is where another system, largely hidden from the public or often made annoyingly inaccessible, comes in. It is a system of categorization in which any content, book or otherwise, can be assigned an unlimited number of categories. Wandering through the stacks, one would never suspect this secret way of finding a nugget in a book about your favorite hobby if that book was classified to live elsewhere. The standard lists of terms for further describing books by multiple headings are called “subject headings,” and you had to use a library catalog to find them. Unfortunately, they contain mysterious conventions called “sub-divisions,” designed to pre-coordinate any topic with other generic topics (e.g., Handbooks, etc., and United States). Today we would call these generic subdivision terms facets. One reflects a kind of book and the other reveals the geographical scope covered by the book.

With the marvel of the Web page, hyperlinking, and “clicking through” hierarchical lists of topics, we can click a mouse to narrow a search for handbooks on nutrition in the United States for better health, beginning at any facet or topic, and still come up with the book that meets all four criteria. We no longer have to be constrained by the Dewey model of browsing the physical location of our favorite topics, probably missing a lot of good stuff. But then we never did. The subject card catalog gave us a tool for finding more than we would by classification code alone. But even that was a lot more tedious than navigating easily through a hierarchy of subject headings, narrowing the results by facets on a browser tab, and further narrowing by yet another topical term until we find just the right piece of content.
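
A small sketch may make the contrast concrete (my own illustration with made-up records, not any particular catalog system): each book gets exactly one shelf classification, but it can carry many subject headings and facets, so a query can intersect all four criteria at once regardless of where the book physically lives.

# Illustration only: one shelf classification per book, but any number
# of subject headings and facets, so a search can intersect all of them.
catalog = [
    {
        "title": "Eating Well",
        "classification": "613.2",            # the single Dewey shelf location
        "subjects": ["Nutrition", "Health"],   # multiple subject headings
        "facets": {"form": "Handbooks", "place": "United States"},
    },
    {
        "title": "A History of Cooking",
        "classification": "641.5",
        "subjects": ["Cooking", "History"],
        "facets": {"form": "Monographs", "place": "France"},
    },
]

def search(subject, form, place):
    return [
        book["title"]
        for book in catalog
        if subject in book["subjects"]
        and book["facets"]["form"] == form
        and book["facets"]["place"] == place
    ]

# Finds the nutrition handbook even though it is shelved under 613.2,
# a spot a reader browsing the "health" shelves might never visit.
print(search("Nutrition", "Handbooks", "United States"))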

Taking the next leap we have natural language processing (NLP) that will answer the question, “Where do I find handbooks on nutrition in the United States for better health?” And that is the Holy Grail for search technology – and a long way from Mr. Dewey’s idea for browsing the collection.
