Curated content for content, computing, and digital experience professionals

Category: Semantic Technologies

Luminoso announces enhancements to open data semantic network

Luminoso, which turns unstructured text data into business-critical insights, announced the newest features of ConceptNet, an open data semantic network whose development is led by Luminoso Chief Science Officer Robyn Speer. ConceptNet originated from MIT Media Lab’s Open Mind Common Sense project more than two decades ago, and the semantic network is now used in AI applications around the world. ConceptNet is cited in more than 700 AI papers in Google Scholar, and its API is queried over 500,000 times per day from more than 1,000 unique IPs. Luminoso has incorporated ConceptNet into its proprietary natural language understanding technology, QuickLearn 2.0. ConceptNet 5.8 features:

Continuous deployment: ConceptNet is now set up with continuous integration using Jenkins and deployment to AWS using Terraform, which will make it faster to deploy new versions of the semantic network and easier for others to set up mirrors of the API.

Additional curation of crowd-sourced data: ConceptNet’s developers have filtered entries from Wiktionary that were introducing hateful terminology to ConceptNet without its context. This is part of their ongoing effort to prevent human biases and prejudices from being built into language models. ConceptNet 5.8 has also updated its Wiktionary parser so that it can handle updated versions of the French and German-language Wiktionary projects.

HTTPS support: Developers can now reach ConceptNet’s website and API over HTTPS, improving data transfer security for applications using ConceptNet.
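For developers, the move to HTTPS changes nothing about the response format. As a rough sketch (not part of the announcement), an application might query the public ConceptNet API over HTTPS like this, using the edge structure of ConceptNet 5 JSON responses:

```python
# Minimal sketch of querying the public ConceptNet API over HTTPS.
# The field names below follow the documented ConceptNet 5 response format.
import requests

def related_concepts(term, lang="en", limit=10):
    """Fetch edges for a term and return (relation, start, end) label triples."""
    url = f"https://api.conceptnet.io/c/{lang}/{term}"
    response = requests.get(url, params={"limit": limit}, timeout=10)
    response.raise_for_status()
    edges = response.json().get("edges", [])
    return [(e["rel"]["label"], e["start"]["label"], e["end"]["label"])
            for e in edges]

if __name__ == "__main__":
    for rel, start, end in related_concepts("semantics"):
        print(f"{start} --{rel}--> {end}")
```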

http://blog.conceptnet.io/posts/2020/conceptnet-58/, https://luminoso.com/how-it-works

Speaker Spotlight: John Felahi – Making content findable

In another installment of Speaker Spotlight, we posed a couple of our frequently asked questions to speaker John Felahi, Chief Strategy Officer at Content Analyst Company, LLC. We’ve included his answers here. Be sure to see additional Speaker Spotlights from our upcoming conference.


Speaker Spotlight: John Felahi

Chief Strategy Officer

Content Analyst Company, LLC


What is the best overall strategy for delivering content to web, multiple mobile, and upcoming digital channels? What is the biggest challenge? Development and maintenance cost? Content control? Brand management? Technology expertise?

One of the biggest challenges to delivering content to the web is making it as findable as possible to potential interested viewers. While traditional, manual tagging and keyword search methods may have gotten us this far, and may be good enough for some use cases, they’re still not without limitations. The good news is, there are far more advanced, sophisticated – and automated – technologies available to remedy the numerous limitations of manually tagging content and keyword-based search. The limitations of manual tagging and keyword-based search include:

  • Term creep – New terms constantly emerge, requiring taxonomies to be constantly updated.
  • Polysemy – Take Apple, for example. Is your user searching for the company, the Beatles’ record label, or the fruit?
  • Acronyms – Texting has introduced an entirely new language of acronyms (LOL, TTYL, WDYT).  Manually tagging content requires the editor to consider possible acronyms the users will be searching for.
  • Abbreviations – Content tagged with long scientific terms, geographic names, etc., requires editors to factor in the abbreviations along with the long terms they represent.
  • Misspellings – Thanks to spellcheck and autocorrect, technology has become much more forgiving for those who never made it past the first round eliminations in their sixth grade spelling bee. Content search, unfortunately, needs to be equally accommodating, if you want your users to find your content – which means tagging it with common misspellings.
  • Language – The web has certainly made the world a much smaller place, but that doesn’t mean everyone speaks English.  Making content findable in any language means it has to also be tagged in multiple languages.

On to the good news – there’s technology that’s been used for years in eDiscovery and the US Intelligence Community to overcome these very challenges, but for different reasons. Because the bad guys aren’t tagging their content to make it more findable, the intel community needs a better way to find what they’re looking for. And in eDiscovery, finding relevant content can make a multi-million dollar difference to the outcome of a particular litigation or other regulatory matter. That’s why tens of thousands of legal reviewers and countless analysts in the intel community use a technology known as concept-aware advanced analytics.

How concept-aware advanced analytics differs from manual tagging and keyword search

As its name implies, concept-aware technology understands the underlying concepts within the content. As such, it can tag content automatically. On the viewer’s side, content can be found by simply saying, “find more like this.” Categories are defined by selecting examples that represent the concepts of a category. The system “learns” what that category is all about and can then identify conceptually similar content and apply the same category. The process is the same on the search side. The user points to a piece of content and says, “find more like this.” Or, as the content publisher, you present the viewer with conceptually similar content, i.e., “you may also be interested in these articles.”
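To make the idea concrete, here is a minimal sketch of one common way to implement example-based “find more like this” – latent semantic analysis over TF-IDF vectors with scikit-learn. It illustrates the general technique only and is not Content Analyst’s proprietary implementation.

```python
# Sketch of example-based "find more like this" using latent semantic
# analysis (TF-IDF + truncated SVD); illustrative only, not the vendor's code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Apple releases a new phone with a faster chip",
    "The orchard's apple harvest was larger than last year",
    "Beatles recordings were issued on the Apple record label",
    "Smartphone makers compete on camera quality and battery life",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(documents)
lsa = TruncatedSVD(n_components=2, random_state=0)  # toy-sized concept space
concept_space = lsa.fit_transform(X)

def find_more_like(doc_index, top_n=2):
    """Return indices of the documents most similar in concept space."""
    sims = cosine_similarity(concept_space[doc_index:doc_index + 1],
                             concept_space)[0]
    ranked = sims.argsort()[::-1]
    return [i for i in ranked if i != doc_index][:top_n]

print(find_more_like(0))  # documents conceptually closest to the first one
```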

While concept-aware advanced analytics doesn’t necessarily replace manual tagging and keyword search – which work very well in certain situations – the technology clearly overcomes many of the limitations of traditional tagging and search methods.

Catch Up with John at Gilbane

Track E: Content, Collaboration, and the Employee Experience

E7: Strategic Imperatives for Enterprise Search to Succeed
Wednesday, December 4: 2:00 p.m. – 3:20 p.m.

Complete Program: http://gilbaneconference.com/program | Conference Schedule: http://gilbaneconference.com/schedule | Register Today: http://gilbaneconference.com/registration

Integrated Dynamic Schema.org Support in Webnodes CMS v3.7

Webnodes has announced that its CMS now has integrated dynamic support for Schema.org. The new feature has an intuitive vocabulary mapping user interface as well as a code API and ASP.NET controls to streamline the work for site developers. The Webnodes CMS ontology management user interface provides a separation between data, data model, and presentation layout. Schema.org, which is all about making search engines understand the meaning of your content, is a natural extension to the semantic core engine. http://www.webnodes.com
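For orientation, this is roughly what Schema.org output looks like once a CMS emits it. The sketch below builds a Schema.org Article description as JSON-LD from Python; the values are placeholders, and this is generic Schema.org vocabulary rather than the Webnodes API, which exposes mapping through its own UI and ASP.NET controls (and may serialize to microdata or RDFa instead).

```python
# Illustration only: a Schema.org "Article" expressed as JSON-LD.
# Generic Schema.org vocabulary with placeholder values; not the Webnodes CMS API.
import json

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Integrated Dynamic Schema.org Support in Webnodes CMS",
    "author": {"@type": "Organization", "name": "Webnodes"},
    "about": "Semantic markup that helps search engines understand content",
}

# A page would embed this inside a <script type="application/ld+json"> element.
print(json.dumps(article, indent=2))
```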

TEMIS Releases Luxid 6

TEMIS, the provider of Text Analytics Solutions for the Enterprise, today announced the launch of the next generation of Luxid, its flagship semantic content enrichment solution. Luxid 6 is a semantic tagging platform which automatically extracts relevant information (entities, topics, events, sentiments), identifies relationships residing in unstructured data and facilitates links between similar and related documents. Luxid 6 optimizes the management of Enterprise content through the capture and structuring of targeted information. The software also enhances the utilization of content within an Enterprise’s workflows such as competitive intelligence, research and innovation, voice of the consumer and reputation management. http://www.temis.com/

How Far Does Semantic Software Really Go?

A discussion about semantic software technologies that began with a graduate scholar at George Washington University in November 2010 prompted him to follow up with some questions for clarification from me. With his permission, I am sharing three questions from Evan Faber and the gist of my comments to him. At the heart of the conversation we all need to keep having is this: how far does this technology go, and does it really bring us any gains in retrieving information?

1. Have AI or semantic software demonstrated any capability to ask new and interesting questions about the relationships among information that they process?

In several recent presentations and the Gilbane Group study on Semantic Software Technologies, I share a simple diagram of the nominal setup for the relationship of content to search and the semantic core, namely a set of terminology rules or terminology with relationships. Semantic search operates best when it focuses on a topical domain of knowledge. The language that defines that domain may range from simple to complex, broad or narrow, deep or shallow. The language may be applied to the task of semantic search from a taxonomy (usually shallow and simple), a set of language rules (numbering thousands to millions) or from an ontology of concepts to a semantic net with millions of terms and relationships among concepts.

The question Evan asks is a good one with a simple answer, “Not without configuration.” The configuration needs human work in two regions:

  • Management of the linguistic rules or ontology
  • Design of search engine indexing and retrieval mechanisms

When a semantic search engine indexes content for natural language retrieval, it looks to the rules or semantic nets to find concepts that match those in the content. When it finds concepts in the content with no equivalent language in the semantic net, it must find a way to understand where the concepts belong in the ontological framework. This discovery process for clarification, disambiguation, contextual relevance, perspective, meaning or tone is best accompanied with an interface making it easy for a human curator or editor to update or expand the ontology. A subject matter expert is required for specialized topics. Through a process of automated indexing that both categorizes and exposes problem areas, the semantic engine becomes a search engine and a questioning engine.

The entire process is highly iterative. In a sense, the software is asking the questions: “What is this?”, “How does it relate to the things we already know about?”, “How is the language being used in this context?” and so on.
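A toy sketch of that index-and-ask loop follows: terms that match a hypothetical, hand-built semantic net are tagged with their concepts, and unmatched candidate terms are queued for a human curator. It is illustrative only and stands in for no particular vendor’s engine.

```python
# Toy sketch of the index-and-ask loop: concepts found in a (hypothetical)
# semantic net are tagged; unmatched candidate terms go to a curation queue.
import re
from collections import Counter

semantic_net = {
    "heart attack": {"is_a": "cardiac event", "synonym": "myocardial infarction"},
    "aspirin": {"is_a": "drug", "treats": "heart attack"},
    "stent": {"is_a": "medical device"},
}

curation_queue = Counter()  # unmatched candidate concepts, by frequency

def index_document(text):
    lowered = text.lower()
    tags = [c for c in semantic_net if c in lowered]       # known concepts
    for token in re.findall(r"[a-z]{5,}", lowered):        # crude candidates
        if not any(token in concept for concept in semantic_net):
            curation_queue[token] += 1                      # ask a human later
    return tags

print(index_document("The patient had a heart attack and received a stent."))
print(curation_queue.most_common(3))  # what the curator reviews next
```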

2. In other words, once they [the software] have established relationships among data, can they use that finding to proceed – without human intervention – to seek new relationships?

Yes, in the manner described for the previous question. It is important to recognize that the original set of rules, ontologies, or semantic nets that are being applied were crafted by human beings with subject matter expertise. It is unrealistic to think that any team of experts would be able to know or anticipate every use of the human language to codify it in advance for total accuracy. The term AI is, for this reason, a misnomer because the algorithms are not thinking; they are only looking up “known-knowns” and applying them. The art of the software is in recognizing when something cannot be discerned or clearly understood; then the concept (in context) is presented for the expert to “teach” the software what to do with the information.

State-of-the-art software will have a back-end process for enabling implementer/administrators to use the results of search (direct commentary from users or indirectly by analyzing search logs) to discover where language has been misunderstood as evidenced by invalid results. Over time, more passes to update linguistic definitions, grammar rules, and concept relationships will continue to refine and improve the accuracy and comprehensiveness of search results.
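As a small illustration of that back-end process, the sketch below mines a hypothetical search log for queries that repeatedly return zero results, the kind of evidence an administrator might use to update vocabulary or rules; the log format is an assumption, not any product’s schema.

```python
# Sketch: surface queries the engine keeps misunderstanding (zero results)
# from a hypothetical search log, as candidates for vocabulary updates.
from collections import Counter

search_log = [
    {"query": "acetylsalicylic acid dosage", "results": 0},
    {"query": "aspirin dosage", "results": 42},
    {"query": "acetylsalicylic acid dosage", "results": 0},
    {"query": "MI treatment", "results": 1},
]

zero_hits = Counter(e["query"] for e in search_log if e["results"] == 0)
for query, count in zero_hits.most_common():
    print(f"{count:3d}x no results: {query!r}  -> candidate vocabulary gap")
```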

3. It occurs to me that the key value added of semantic technologies to decision-making is their capacity to link sources by context and meaning, which increases situational awareness and decision space. But can they probe further on their own?

Good point on the value, and in a sense, yes, they can. Through extensive algorithmic operations, instructions can be embedded (and probably are for high-value situations like intelligence work) that tell the software what to do with newly discovered concepts. Instructions might then place these new discoveries into categories of relevance, importance, or associations. It would not be unreasonable to then pass documents with confounding information off to other semantic tools for further examination. Again, without human analysis along the continuum and at the end point, no certainty about the validity of the software’s decision-making can be asserted.

I can hypothesize a case in which a corpus of content contains random documents in foreign languages. From my research, I know that some of the semantic packages have semantic nets in multiple languages. If the corpus contains material in English, French, German and Arabic, these materials might be sorted and routed off to four different software applications. Each batch would be subject to further linguistic analysis, followed by indexing with some middleware applied to the returned results for normalization, and final consolidation into a unified index. Does this exist in the real world now? Probably there are variants but it would take more research to find the cases, and they may be subject to restrictions that would require the correct clearances.
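A rough sketch of that hypothesized pipeline follows, with stub analyzers and the langdetect package standing in for whatever language identification and per-language semantic analysis a real system would use.

```python
# Sketch of the hypothetical multilingual routing pipeline described above:
# detect language, hand each document to a per-language analyzer (stubs here),
# normalize the results, and consolidate them into a single unified index.
from collections import defaultdict
from langdetect import detect  # pip install langdetect; any detector would do

def analyze(doc, lang):
    """Stand-in for a language-specific semantic analyzer."""
    return {"lang": lang, "terms": doc.lower().split()}

def build_unified_index(corpus):
    index = defaultdict(list)             # term -> list of document ids
    for doc_id, doc in enumerate(corpus):
        lang = detect(doc)                # e.g. 'en', 'fr', 'de', 'ar'
        result = analyze(doc, lang)       # routed to a per-language analyzer
        for term in result["terms"]:      # normalized and consolidated
            index[term].append(doc_id)
    return index

corpus = ["The contract was signed in Berlin.",
          "Le contrat a été signé à Paris."]
print(dict(build_unified_index(corpus)))
```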

Discussions with experts who have actually employed enterprise-specific semantic software underscore the need for subject expertise and some computational linguistics training, coupled with an aptitude for creative inquiry. These scientists informed me that individuals who are highly multidisciplinary and facile with electronic games and tools did the best job of interacting with the software and getting excellent results. Tuning and configuration over time by the right human players is still a fundamental requirement.

Sophia Launches Sophia Search for Intelligent Enterprise Search and Contextual Discovery

Sophia, the provider of contextually aware enterprise search solutions, announced Sophia Search, a new search solution that uses a semiotics-based linguistic model to identify intrinsic terms, phrases and relationships within unstructured content so that it can be recovered, consolidated and leveraged. Use of Sophia Search is designed to minimize compliance risk and reduce the cost of storing and managing enterprise information. Sophia Search is able to deliver a “three-dimensional” solution to discover, consolidate and optimize enterprise data, regardless of its data type or domain. Sophia Search helps organizations manage and analyze critical information by discovering the themes and intrinsic relationships behind their information, without taxonomies or ontologies, so that more relevant information may be discovered. By identifying both duplicates and near duplicates, Sophia Search allows organizations to effectively consolidate information and minimize storage and management costs. Sophia Search features a patented Contextual Discovery Engine (CDE) which is based on the linguistic model of semiotics, the science behind how humans understand the meaning of information in context. Sophia Search is available now to both customers and partners. Pricing starts at $30,000. http://www.sophiasearch.com/
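Sophia’s Contextual Discovery Engine is proprietary, but the duplicate and near-duplicate flagging described above can be illustrated with a generic technique, Jaccard similarity over word shingles; the sketch below shows only that general idea, not Sophia’s method.

```python
# Generic near-duplicate detection via Jaccard similarity over word shingles.
# Illustrative only; not Sophia Search's patented approach.
def shingles(text, n=3):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

doc_a = "The quarterly report was filed with the regulator on Monday."
doc_b = "The quarterly report was filed with the regulator on Tuesday."
doc_c = "Customer sentiment improved after the product relaunch."

print(round(jaccard(doc_a, doc_b), 2))  # high score -> near duplicate
print(round(jaccard(doc_a, doc_c), 2))  # low score  -> unrelated
```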

Leveraging Two Decades of Computational Linguistics for Semantic Search

Over the past three months I have had the pleasure of speaking with Kathleen Dahlgren, founder of Cognition, several times. I first learned about Cognition at the Boston Infonortics Search Engines meeting in 2009. That introduction led me to a closer look several months later when researching auto-categorization software. I was impressed with the comprehensive English language semantic net they had doggedly built over a 20+ year period.

A semantic net is a map of language that explicitly defines the many relationships among words and phrases. It might be very simple, illustrating something as fundamental as a small geographical locale and all named entities within it, or as complex as the entire base language of English, with every concept mapped to show all the ways that any one term is related to other terms. Dr. Dahlgren and her team are among the few that have created a comprehensive semantic net for English.
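As a toy illustration (orders of magnitude smaller and simpler than Cognition’s English net), a semantic net can be thought of as typed relationships among terms, something like this hypothetical fragment:

```python
# Toy stand-in for a semantic net: explicit, typed relationships among terms,
# represented as a simple adjacency structure. Purely illustrative.
semantic_net = {
    "bank": [
        ("sense_of", "financial institution"),
        ("sense_of", "river bank"),
    ],
    "financial institution": [("related_to", "loan"), ("is_a", "organization")],
    "river bank": [("part_of", "river"), ("related_to", "erosion")],
}

def related(term):
    """Walk one hop out from a term and list its typed relationships."""
    for relation, other in semantic_net.get(term, []):
        print(f"{term} --{relation}--> {other}")

related("bank")
```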

In 2003, Dr. Dahlgren established Cognition as a software company to commercialize its semantic net, designing software to apply it to semantic search applications. As the Gilbane Group launched its new research on Semantic Software Technologies, Cognition signed on as a study co-sponsor and we engaged in several discussions with them that rounded out their history in this new marketplace. It was illustrative of pioneering in any new software domain.

Early adopters are key contributors to any software development. It is notable that Cognition has attracted experts in fields as diverse as medical research, legal e-discovery and Web semantic search. This gives the company valuable feedback for their commercial development. In any highly technical discipline, it is challenging and exciting to find subject experts knowledgeable enough to contribute to product evolution, and Cognition is learning from client experts where the best opportunities for growth lie.

Recent interviews with Cognition executives, and those of other sponsors, gave me the opportunity to get their reactions to my conclusions about this industry. These were the more interesting thoughts that came from Cognition after they had reviewed the Gilbane report:

  • Feedback from current clients and attendees at 2010 conferences, where Dr. Dahlgren was a featured speaker, confirms escalating awareness of the field; she feels that “This is the year of Semantics.” It is catching the imagination of IT folks who understand the diverse and important business problems to which semantic technology can be applied.
  • In addition to a significant upswing in semantics applied in life sciences, publishing, law and energy, Cognition sees specific opportunities for growth in risk assessment and risk management. Using semantics to detect signals, content salience, and measures of relevance is critical where the quantity of data and textual content is too voluminous for human filtering. There is not much evidence that financial services, banking and insurance are embracing semantic technologies yet, but semantics could dramatically improve their business intelligence, and Cognition is well positioned to support them with its already tested tools.
  • Enterprise semantic search will begin to overcome the poor reputation that traditional “string search” has suffered. There is growing recognition among IT professionals that in the enterprise 80% of the queries are unique; these cannot be interpreted based on popularity or social commentary. Determining relevance or accuracy of retrieved results depends on the types of software algorithms that apply computational linguistics, not pattern matching or statistical models.

In Dr. Dahlgren’s view, there is no question that a team approach to deploying semantic enterprise search is required. This means that IT professionals will work side-by-side with subject matter experts, search experts and vocabulary specialists to gain the best advantage from semantic search engines.

The unique language aspects of an enterprise content domain are as important as the software a company employs. The Cognition baseline semantic net, out-of-the-box, will always give reliable and better results than traditional string search engines. However, it gives top performance when enhanced with enterprise language, embedding all the ways that subject experts talk about their topical domain, jargon, acronyms, code phrases, etc.

With elements of its software already embedded in some notable commercial applications like Bing, Cognition is positioned for delivering excellent semantic search for an enterprise. They are taking on opportunities in areas like risk management that have been slow to adopt semantic tools. They will deliver software to these customers together with services and expertise to coach their clients through the implementation, deployment and maintenance essential to successful use. The enthusiasm expressed to me by Kathleen Dahlgren about semantics confirms what I also heard from Cognition clients. They are confident that the technology coupled with thoughtful guidance from their support services will be the true value-added for any enterprise semantic search application using Cognition.

The free download of the Gilbane study and deep-dive on Cognition was announced on their Web site.

Federated Media Acquires Technology Suite from TextDigger

Federated Media Publishing, a “next-generation” media company, announced the acquisition of a platform for semantic and linguistic profiling of web-based content from TextDigger, a San Jose-based semantic search startup. FM provides a full suite of media and marketing services for brand advertisers that depends heavily on a proprietary media and marketing technology platform. TextDigger’s technology complements FM’s platform with a set of semantic solutions for content tagging, filtering and clustering, as well as related tools that enhance the user experience, ad targeting, and semantic search engine optimization for a site. TextDigger will continue its search business, and all of TextDigger’s customers will continue to be supported by either FM or TextDigger, depending on the type of project or service. www.federatedmedia.net www.textdigger.com
