Curated for content, computing, and digital experience professionals

Category: Semantic technologies (Page 21 of 72)

Our coverage of semantic technologies goes back to the early 90s, when search engines focused on searching structured data in databases were looking to provide support for searching unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyzed the challenge as it stood then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


Gilbane Advisor 3-19-19 — Federated ML, ephemeral messaging, search for humans

Google releases federated machine learning (TensorFlow summit 2019)

Federated learning is going to be a thing. Health care is just one example… “TensorFlow Federated will provide distributed machine learning for developers to train models across many mobile devices without data ever leaving those devices. Encryption provides an additional layer of privacy, and weights from models trained on mobile devices are shared with a central model for continuous learning.” Read More
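The idea described above can be sketched in a few lines of plain Python. This is a minimal, hypothetical illustration of federated averaging, not the TensorFlow Federated API: each simulated device runs a local training step on its private data, and only the resulting weights (never the raw data) are averaged into the central model.

```python
# Minimal sketch of federated averaging for a toy model y = w * x.
# Only weights leave each device; the raw (x, y) pairs stay local.
# All names here are illustrative.

def local_train(weights, device_data, lr=0.1):
    """One gradient-descent step on a device's private data
    (squared-error loss for the model y = w * x)."""
    w = weights
    grad = sum(2 * x * (w * x - y) for x, y in device_data) / len(device_data)
    return w - lr * grad

def federated_average(global_w, per_device_data, rounds=50):
    """Each round: devices train locally, the server averages the weights."""
    w = global_w
    for _ in range(rounds):
        local_weights = [local_train(w, data) for data in per_device_data]
        w = sum(local_weights) / len(local_weights)  # only weights are shared
    return w

# Three devices, each holding private data consistent with y = 2x
devices = [[(1.0, 2.0)], [(2.0, 4.0)], [(3.0, 6.0)]]
w = federated_average(0.0, devices)
```

The central model converges toward w = 2 even though no device ever reveals its data; production systems add secure aggregation and encryption on top of this basic loop.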

A warning on the dangers of ephemeral messaging

The Information’s Sam Lessin is bullish about Facebook’s move to full encryption, but thinks a reliance on ephemeral messaging is a big mistake. He makes a good case and the issues he raises need broader consideration. (Paywall – but you can get access by providing an email.) Read More

Search engines: a human perspective

Wise words on search applications from Daniel Tunkelang.

The foundation of human-computer information retrieval (HCIR) is that search engines help searchers who help themselves. The best search engines reward searchers’ incremental effort with a higher return on investment. … But searchers have been trained by simple search interfaces, and their laziness is compounded by a skepticism of anything that violates their expectations. In order to earn searcher effort, search engines have to provide simple, incremental, and effective steps that guide searchers — and that teach them through experience that the return justifies the additional effort. Read More

Facebook’s News Feed era is now officially over

It’s anyone’s guess where Facebook will end up after the strategic shift announced last week. The new direction impacts all parts of the company and raises questions about their business model, growth, and of course, organization. Read More


Join us at Gilbane’s Digital Experience Conference

Digital experience strategies, technologies, and practices, for marketing and the workplace

Also…

The Gilbane Advisor curates content for content, computing, and digital experience professionals. We focus on strategic technologies. We publish more or less twice a month except for August and December.

Speaker Spotlight: John Felahi – Making content findable

In another installment of Speaker Spotlight, we posed a couple of our frequently asked questions to speaker John Felahi, Chief Strategy Officer at Content Analyst Company, LLC. We’ve included his answers here. Be sure to see additional Speaker Spotlights from our upcoming conference.


Speaker Spotlight: John Felahi

Chief Strategy Officer

Content Analyst Company, LLC


What is the best overall strategy for delivering content to web, multiple mobile, and upcoming digital channels? What is the biggest challenge? Development and maintenance cost? Content control? Brand management? Technology expertise?

One of the biggest challenges in delivering content to the web is making it as findable as possible to potentially interested viewers. While traditional, manual tagging and keyword search methods may have gotten us this far, and may be good enough for some use cases, they are not without limitations. The good news is that far more advanced, sophisticated – and automated – technologies are available to remedy the numerous limitations of manually tagging content and keyword-based search. Those limitations include:

  • Term creep – New terms constantly emerge, requiring taxonomies to be constantly updated.
  • Polysemy – Take Apple, for example. Is your user searching for the company, the Beatles’ record label, or the fruit?
  • Acronyms – Texting has introduced an entirely new language of acronyms (LOL, TTYL, WDYT).  Manually tagging content requires the editor to consider possible acronyms the users will be searching for.
  • Abbreviations – Tagging content with long scientific terms, geographies, etc. requires editors to factor these in along with the long forms they represent.
  • Misspellings – Thanks to spellcheck and autocorrect, technology has become much more forgiving of those who never made it past the first-round eliminations in their sixth-grade spelling bee. Content search, unfortunately, needs to be equally accommodating if you want your users to find your content – which means tagging it with common misspellings.
  • Language – The web has certainly made the world a much smaller place, but that doesn’t mean everyone speaks English.  Making content findable in any language means it has to also be tagged in multiple languages.
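One common mitigation for several of the limitations above (acronyms, abbreviations, misspellings) is to expand terms at index or query time against a synonym ring. The sketch below is illustrative only – not any particular product's implementation – and the rings are invented examples:

```python
# Hypothetical synonym rings covering the acronym, abbreviation, and
# misspelling cases listed above. In a real system these would come
# from a managed vocabulary, not a hard-coded list.

SYNONYM_RINGS = [
    {"laugh out loud", "lol"},                    # acronym
    {"magnesium", "mg"},                          # abbreviation
    {"accommodate", "acommodate", "accomodate"},  # common misspellings
]

def expand(term):
    """Return the term plus every synonym that shares a ring with it."""
    term = term.lower()
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    return expanded

# A query for "LOL" now also matches content tagged "laugh out loud",
# without an editor having tagged every document with every variant.
variants = expand("LOL")
```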

On to the good news – there’s technology that’s been used for years in eDiscovery and the US Intelligence Community to overcome these very challenges, but for different reasons. Because the bad guys aren’t tagging their content to make it more findable, the intel community needs a better way to find what they’re looking for. And in eDiscovery, finding relevant content can make a multi-million dollar difference to the outcome of a particular litigation or other regulatory matter. That’s why tens of thousands of legal reviewers and countless analysts in the intel community use a technology known as concept-aware advanced analytics.

How concept-aware advanced analytics differs from manual tagging and keyword search

As its name implies, concept-aware technology understands the underlying concepts within content, and as such it can tag content automatically. On the viewer’s side, content can be found by simply saying, “find more like this.” Categories are defined by selecting examples that represent the concepts of a category. The system “learns” what that category is about, and can then identify conceptually similar content and apply the same category. The process is the same on the search side: the user points to a piece of content and says, “find more like this.” Or, as the content publisher, you present the viewer with conceptually similar content, i.e., “you may also be interested in these articles.”
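The "find more like this" retrieval pattern can be illustrated with a toy example: represent each document as a term-frequency vector and rank candidates by cosine similarity to the seed document. Real concept-aware engines use latent semantic techniques that go well beyond raw term overlap; this sketch only shows the shape of the interaction, with an invented corpus:

```python
# Toy "more like this" ranking over term-frequency vectors.
# Cosine similarity here uses raw term counts; concept-aware systems
# would first project documents into a latent concept space.

import math
from collections import Counter

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def more_like_this(seed, corpus):
    """Rank corpus documents by similarity to the seed document."""
    sv = vectorize(seed)
    return sorted(corpus, key=lambda d: cosine(sv, vectorize(d)), reverse=True)

docs = [
    "apple releases new iphone hardware",
    "the beatles founded apple records",
    "growing apple trees from seed",
]
ranked = more_like_this("iphone hardware from apple", docs)
```

Note how this toy version already disambiguates the "Apple" polysemy example from earlier: the seed's other terms pull the technology document to the top.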

While concept-aware advanced analytics doesn’t necessarily replace manual tagging and keyword search – which work very well in certain situations – the technology clearly overcomes many of the limitations of traditional tagging and search methods.

Catch Up with John at Gilbane

Track E: Content, Collaboration, and the Employee Experience

E7: Strategic Imperatives for Enterprise Search to Succeed
Wednesday, December 4: 2:00 p.m. – 3:20 p.m.

Complete Program: http://gilbaneconference.com/program
Conference Schedule: http://gilbaneconference.com/schedule
Register Today: http://gilbaneconference.com/registration

E-discovering Language to Launch Your Taxonomy

New enterprise initiatives, whether for implementing search solutions or beginning a new product development program, demand communication among team leaders and participants. Language matters; defining terminology for common parlance is essential to smooth progress toward initiative objectives.

Glossaries, dictionaries, taxonomies, thesauri and ontologies are all mechanisms we use routinely in education and work to clarify terms we use to engage and communicate understanding of any specialized domain. Electronic social communication added to the traditional mix of shared information (e.g. documents, databases, spreadsheets, drawings, standardized forms) makes business transactional language more complex. Couple this with the use of personal devices for capturing and storing our work content, notes, writings, correspondence, design and diagram materials and we all become content categorizing managers. Some of us are better than others at organizing and curating our piles of information resources.

As recent brain studies reveal, humans, and probably any animal with a brain, have established cognitive areas in our brains with pathways and relationships among categories of grouped concepts. This reinforces our propensity for expending thought and effort to order all aspects of our lives. That we all organize differently across a huge spectrum of concepts and objects makes it wondrous that we can live and work collaboratively at all. Why after 30+ years of marriage do I arrange my kitchen gadget drawer according to use or purpose of devices while my husband attempts to store the same items according to size and shape? Why do icons and graphics placed in strange locations in software applications and web pages rarely impart meaning and use to me, while others “get it” and adapt immediately?

The previous paragraph may seem a pointless digression from the subject of this post, but two points emerge from it. First, we all organize both objects and information to facilitate how we navigate life, including work. Without organization that is somehow rationalized, and established according to our own rules for functioning, our lives descend into dysfunctional chaos; people who don’t organize well, or who struggle to organize consistently, struggle in school, work, and life. Second, diversity of organizing practice is a challenge for working and living with others when we need to share the same spaces and work objectives. This brings me to the very challenging task of organizing information for a website, a discrete business project, or an entire enterprise, especially when a diverse group of participants is engaged as a team.

So, let me make a few bold suggestions about where to begin with your team:

  • Establish categories of inquiry based on the existing culture of your organization and vertical industry. Avoid being inventive, clever, or idiosyncratic. Find category labels that everyone understands similarly.
  • Agree on common behaviors and practices for finding by sharing openly the ways in which members of the team need to find, the kinds of information and answers that need discovering, and the conditions under which information is required. These are the basis for findability use cases. Again, begin with the usual situations and save the unusual for later insertion.
  • Start with what you have in the form of finding aids: places, language and content that are already being actively used; examine how they are organized. Solicit and gather experiences about what is good, helpful and “must have” and note interface elements and navigation aids that are not used. Harvest any existing glossaries, dictionaries, taxonomies, organization charts or other definition entities that can provide feeds to terminology lists.
  • Use every discoverable repository as a resource (including email stores, social sites, and presentations) for establishing terminology and, eventually, for writing rules for applying terms. Research repositories that are heavily used by groups of specialists and treat them as crops of terminology to be harvested for language that is meaningful to experts. Seek or develop linguistic parsing and term-extraction tools and processes to discover words and phrases in common use. Use histograms to determine frequency of use, alphabetize to find similar terms that are conceptually related, and use semantic-net tools to group discovered terms according to conceptual relationships. Segregate initialisms, acronyms, and abbreviations for analysis and insertion into the final lists, as valid terms or as synonyms of valid terms.
  • Talk to the gurus and experts that are the “go-to people” for learning about a topic and use their experience to help determine the most important broad categories for information that needs to be found. Those will become your “top term” groups and facets. Think of top terms as topical in nature (e.g. radar, transportation, weapons systems) and facets as other categories by which people might want to search (e.g. company names, content types, conference titles).
  • Simplify your top terms and facets into the broadest categories for launching your initiative. You can always add more but you won’t really know where to be the most granular until you begin using tags applied to content. Then you will see what topics have the most content and require narrower topical terms to avoid having too much content piling up under a very broad category.
  • Select and authorize one individual to be the ultimate decider. Ambiguity of categorizing principles, purpose and needs is always a given due to variations in cognitive functioning. However, the earlier steps outlined here will have been based on broad agreement. When it comes to the more nuanced areas of terminology and understanding, a subject savvy and organizationally mature person with good communication skills and solid professional respect within the enterprise will be a good authority for making final decisions about language. A trusted professional will also know when a change is needed and will seek guidance when necessary.
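The term-harvesting step described above (frequency histograms, alphabetized candidate lists, segregated acronyms) can be sketched with standard-library Python. The tokenization rule, stopword list, and frequency threshold below are all illustrative assumptions, not a recommended configuration:

```python
# Sketch of harvesting candidate taxonomy terms from repository text:
# tokenize, build a frequency histogram, and segregate acronym-like
# tokens for separate review. Thresholds and stopwords are examples.

import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "is", "were"}

def harvest_terms(text, min_count=2):
    tokens = re.findall(r"[A-Za-z][A-Za-z-]+", text)
    # All-caps tokens go to a separate list for acronym/initialism review.
    acronyms = sorted({t for t in tokens if t.isupper() and len(t) >= 2})
    words = [t.lower() for t in tokens if not t.isupper()]
    histogram = Counter(w for w in words if w not in STOPWORDS)
    # Keep terms frequent enough to matter, alphabetized so related
    # variants sort together for a human reviewer.
    candidates = sorted(t for t, n in histogram.items() if n >= min_count)
    return candidates, acronyms, histogram

text = ("The radar team presented radar findings to NATO. "
        "Radar upgrades and ETL pipelines were discussed; ETL matters.")
candidates, acronyms, hist = harvest_terms(text)
```

The output is raw material for the human steps that follow: a frequency-ranked candidate list for the deciders, and an acronym list to be resolved into valid terms or synonyms.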

Revisit the successes and failures of the applied term store routinely: survey users, review search logs, observe information retrieval bottlenecks and troll for new electronic discourse and content as a source of new terminology. A recent post by taxonomy expert Heather Hedden gives more technical guidance about evaluating and sustaining your taxonomy maintenance.

Enterprise Search Strategies: Cultivating High Value Domains

At the recent Gilbane Boston Conference I was happy to hear the variety of remarks positioning and defining “Big Data.” Like so much in the marketing sphere of high tech, answers begin with technology vendors but get refined and parsed by analysts and consultants, who need to set clear expectations about the actual problem domain. It’s a good thing we have humans to do that defining, because even the most advanced semantics would be hard pressed to give you a single useful answer.

I heard Sue Feldman of IDC give a pretty good “working definition” of big data at the Enterprise Search Summit in May 2012. To paraphrase, it was:

  • > 100 TB up to petabytes, OR
  • > 60% growth a year of unstructured and unpredictable content, OR
  • Ultra high streaming content

But then we get into debates about differentiating data from unstructured content when a phrase like “big data” is applied to unstructured content, which knowledge strategists like me tend to put into a category of packaged information. But never mind; technology solution providers will continue to come up with catchy buzz phrases to codify the problem they are solving, whether the phrases make semantic sense or not.

What does this have to do with enterprise search? In short, “findability” is an increasingly heavy lift due to the size and number of content repositories. We want to define quality findability as optimal relevance and recall.

A search technology era ago, publishers, libraries, and content management solution providers focused on human curation of non-database content, applying controlled-vocabulary categories derived from decades of human-managed terminology lists. Automated search provided highly structured access interfaces to what we now call unstructured content. Once this model was supplanted by full-text retrieval, and new content originated in electronic formats, the proportion of un-categorized to human-categorized content ballooned.

Hundreds of models for automatic categorization have been rolled out to try to stay ahead of the electronic onslaught. The ones that succeed do so mostly because of continued human intervention at some point in the process of making content available to be searched. From human invented search algorithms, to terminology structuring and mapping (taxonomies, thesauri, ontologies, grammar rule bases, etc.), to hybrid machine-human indexing processes, institutions seek ways to find, extract, and deliver value from mountains of content.

This brings me to a pervasive theme from the conferences I have attended this year: the synergies among text mining, text analytics, extract/transform/load (ETL), and search technologies. These are being sought, employed, and applied to specific findability issues in select content domains. It appears that the best results are delivered only when these criteria are first met:

  • The business need is well defined, refined and narrowed to a manageable scope. Narrowing scope of information initiatives is the only way to understand results, and gain real insights into what technologies work and don’t work.
  • The domain of high-value content is carefully selected. I have long maintained that a significant issue is the amount of redundant information we pile up across every repository. By demanding that our search tools crawl and index all of it, we place an unrealistic burden on search technologies to rank relevance and importance.
  • Apply pre-processing solutions such as text-mining and text analytics to ferret out primary source content and eliminate re-packaged variations that lack added value.
  • Apply pre-processing solutions such as ETL with text mining to assist with content enhancement, by applying consistent metadata that does not have a high semantic threshold but will suffice to answer a large percentage of non-topical inquiries. An example would be to find the “paper” that “Jerry Howe” presented to the “AMA” last year.
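The "Jerry Howe / AMA" example above is a non-topical inquiry: once ETL and text mining have applied consistent metadata (author, venue, year, content type), it reduces to exact field matching rather than relevance ranking. A minimal sketch, with invented records:

```python
# Hypothetical metadata lookup for a non-topical inquiry. The records
# stand in for documents already enhanced with consistent metadata by
# an ETL / text-mining pipeline; fields and values are invented.

documents = [
    {"type": "paper",  "author": "Jerry Howe", "venue": "AMA", "year": 2011},
    {"type": "slides", "author": "Jerry Howe", "venue": "AMA", "year": 2011},
    {"type": "paper",  "author": "A. Smith",   "venue": "AMA", "year": 2011},
]

def metadata_search(docs, **criteria):
    """Return documents whose metadata matches every given field exactly."""
    return [d for d in docs
            if all(d.get(field) == value for field, value in criteria.items())]

# "The paper Jerry Howe presented to the AMA last year" becomes:
hits = metadata_search(documents, type="paper", author="Jerry Howe",
                       venue="AMA", year=2011)
```

This is why consistent metadata with a low semantic threshold pays off: no linguistic analysis is needed at query time for this whole class of inquiries.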

Business managers together with IT need to focus on eliminating redundancy by utilizing automation tools to enhance unique and high-value content with consistent metadata, thus creating solutions for special audiences needing information to solve specific business problems. By doing this we save the searcher the most time, while delivering the best answers to make the right business decisions and innovative advances. We need to stop thinking of enterprise search as a “big data,” single engine effort and instead parse it into “right data” solutions for each need.

Gilbane Conference workshops

In case you missed it last week while on vacation, the Gilbane Conference workshop schedule and descriptions have been posted. The half-day workshops take place at the Intercontinental Boston Waterfront Hotel on Tuesday, November 27, 9:00 am to 4:00 pm:

  • Insider’s Guide to Selecting WCM Technology – Tony Byrne & Irina Guseva, Real Story Group
  • Implementing Systems of Engagement: Making it Work with the Team That Will Make it Work – Scott Liewehr & Rob Rose, Digital Clarity Group
  • So You Want to Build a Mobile Content App? – Jonny Kaldor, Kaldor Group (creators of Pugpig)
  • Content Migrations: A Field Guide – Deane Barker, Blend Interactive & David Hobbs, David Hobbs Consulting
  • Social Media: Creating a Voice & Personality for Your Brand – AJ Gerritson, 451 Marketing
  • Text Analytics for Semantic Applications – Tom Reamy, KAPS Group

Save the date and check http://gilbaneboston.com for further information about the main conference schedule & conference program as they become available.

Gilbane Boston workshop details posted

The best way to start the Gilbane conference is by attending one or two of the pre-conference workshops offered on Tuesday, November 27, 9:00 am to 4:00 pm:

  • Insider’s Guide to Selecting WCM Technology – Tony Byrne & Irina Guseva, Real Story Group
  • Implementing Systems of Engagement: Making it Work with the Team That Will Make it Work – Scott Liewehr & Rob Rose, Digital Clarity Group
  • So You Want to Build a Mobile Content App? – Jonny Kaldor, Kaldor Group (creators of Pugpig)
  • Content Migrations: A Field Guide – Deane Barker, Blend Interactive & David Hobbs, David Hobbs Consulting
  • Social Media: Creating a Voice & Personality for Your Brand – AJ Gerritson, 451 Marketing
  • Text Analytics for Semantic Applications – Tom Reamy, KAPS Group

See the schedule and full descriptions of the in-depth pre-conference workshops.

Please save the date and check http://gilbaneboston.com for further information about the main conference schedule & conference program as they become available.

New Location!
Intercontinental Boston Waterfront Hotel
510 Atlantic Avenue
Boston, Massachusetts 02210

W3C Launches Linked Data Platform Working Group

W3C launched the new Linked Data Platform (LDP) Working Group to promote the use of linked data on the Web. Per its charter, the group will explain how to use a core set of services and technologies to build powerful applications capable of integrating public data, secured enterprise data, and personal data. The platform will be based on proven Web technologies including HTTP for transport, and RDF and other Semantic Web standards for data integration and reuse. The group will produce supporting materials, such as a description of use cases, a list of requirements, and a test suite and/or validation tools to help ensure interoperability and correct implementation.

A rarity these days – an announcement that used ‘data’ instead of ‘big data’! And the co-chairs are even from IBM and EMC.

Why is it so Hard to “Get” Semantics Inside the Enterprise?

Semantic Software Technologies: Landscape of High Value Applications for the Enterprise was published just over a year ago. Since then the marketplace has been increasingly active; new products emerge and discussion about what semantics might mean for the enterprise is constant. One thing that continues to strike me is the difficulty of explaining the meaning of, applications for, and context of semantic technologies.

Browsing through the topics on this excellent blog site, http://semanticweb.com , I was struck by the proverbial case of the blind men describing an elephant. A blog, any blog, is linear. While there are tools to give a blog dimension by clustering topics or presenting related information, it is difficult to understand the full relationships of any one blog post to another. Without a photographic memory, an individual does not easily connect ideas across a multi-year domain of blog entries. Semantic technologies can facilitate that process.

Those who embrace some concept of semantics are believers that search will benefit from “semantic technologies.” What is less clear is how evangelists, developers, searchers and the average technology user can coalesce around the applications that will semantically enable enterprise search.

On the Internet content that successfully drives interest, sales, opinion and individual promotion does so through a combination of expert crafting of metadata, search engine technology that “understands” the language of the inquirer and the content that can satisfy the inquiry. Good answers are reached when questions are understood first and then the right content is selected to meet expectations.

In the enterprise, the same care must be given to metadata, search engine “meaning” analysis tools and query interpretation for successful outcomes. Magic does not happen without people behind the scenes to meet these three criteria executing linguistic curation, content enhancement and computational linguistic programming.

Three recent meeting events illustrate various states of semantic development and adoption, even as the next conference, Semantic Tech & Business Conference – Washington, D.C. on November 29 – is upon us:

Event 1 – A relatively new group, the IKS Community, funded by the EU, has been supporting open source software developers since 2009. In July they held a workshop in Paris, just past the mid-point of their life cycle. Attendees were primarily entrepreneurs and independent open source developers seeking pathways for their semantically “tuned” content management solutions. I was asked to suggest where opportunities and needs exist in US markets. They were an enthusiastic audience, and they are poised to meet the tough market realities of packaging highly sophisticated software for audiences that will rarely understand how complex the stuff “under the hood” really is. My principal charge to them was to create tools that “make it really easy” to work with vocabulary management and with content metadata capture, updates, and enhancements.

Event 2 – On this side of the pond, UK firm Linguamatics hosted its user group meeting in Boston in October. Having interviewed a number of their customers last year to better understand the I2E product line, I was happy to meet people I had spoken with and to see the enthusiasm of a user community vested in such complex technology. Most impressive is the respectful tone and thoughtful sharing between Linguamatics principals and their customers. They share the knowledge of how hard it is to continually improve search technology that delivers answers to semantically complex questions using highly specialized language. Content contributors and inquirers are all highly educated specialists seeking answers to questions that have never been asked before. Think about it: a search engine designed to deliver results for frequently asked questions, or to find content on popular topics, is hard enough to build; finding the answer to a brand-new question is a quantum leap in difficulty.

To make matters even more complicated, answers to semantic (natural language) questions may be found in internal content, in published licensed content or some combination of both. In the latter case, only the seeker may be able to put the two together to derive or infer an answer.

Publishers of content for licensing play a convoluted game of how they will license their content to enterprises for semantic indexing in combination with internal content. The Linguamatics user community is primarily in life sciences; this is one more hurdle for them to overcome to effectively leverage the vast published repositories of biological and medical literature. Rigorous pricing may be good business strategy, but research using semantic search could make more headway with more reasonable royalties that reflect the need for collaborative use across teams and partners.

Content wants to be found and knowledge requires outlets to enable innovation to flourish. In too many cases technology is impaired by lack of business resources by buyers or arcane pricing models of sellers that hold vital information captive for a well-funded few. Semantically excellent retrieval depends on an engine’s indexing access to all contextually relevant content.

Event 3 – At the Fall 2011 Enterprise Search Summit, Leslie Owens of Forrester Research conducted a very interesting interactive session that further affirmed the elephant-and-blind-men metaphor. Leslie is a champion of metadata best practices and writes about the competencies and expertise needed to make valuable content accessible. She engaged the audience with a series of questions about its wants, needs, beliefs and plans for semantic technologies. As described in an earlier paragraph about how well semantics serves us on the Web, most of the audience puts its faith in that model but is doubtful of how or when similar benefits will accrue to enterprise search. Leslie and a couple of others made the point that a lot more work has to be done on the back-end on enterprise content to get these high-value outcomes.

We’ll keep making the point until more adopters of semantic technologies get serious and pay attention to content, content enhancement, expert vocabulary management and metadata. If it is automatic understanding of your content that you are seeking, the vocabulary you need is one that you build out and enhance for your enterprise’s relevance. Semantic tools need to know the special language you use to give the answers you need.


© 2024 The Gilbane Advisor
