Curated for content, computing, and digital experience professionals

Category: Semantic technologies

Our coverage of semantic technologies goes back to the early 90s, when search engines built for querying structured data in databases were looking to add support for searching unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyses the challenge as it stood then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


Where and How Can You Look for Good Enterprise Search Interface Design?

Designing an enterprise search interface that employees will use on their intranet is challenging in any circumstance. But starting from nothing more than verbal comments or even a written specification is really hard. However, conversations about what is needed and wanted are informative because they can be aggregated to form the basis for the overarching design.

Frequently, enterprise stakeholders will reference commercial web sites they like or even search tools within social sites. These are a great starting point for a designer to explore. It makes a lot of sense to visit scores of sites that are publicly accessible, or sites where you have an account, and navigate around to see how they handle various design elements.

To start, look at:

  • How easy is it to find a search box?
  • Is there an option to do advanced searches (Boolean or parametric searching)?
  • Is there a navigation option to traverse a taxonomy of terms?
  • Is there a “help” option with relevant examples for doing different kinds of searches?
  • What happens when you search for a word that has several spellings or synonyms, a phrase (with or without quotes), a phrase with the word “and” in it, a numeral, or a date?
  • How are results displayed: what information is included, what is the order of the results and can you change them? Can you manipulate results or search within the set?
  • Is the interface uncluttered and easily understood?

The point of this list of questions is that you can use it to build a set of criteria for designing an interface your enterprise will use and adopt enthusiastically. But this is only a beginning. By actually visiting many sites outside your enterprise, you will find features that you never thought to include or aggravations that you will surely want to avoid. From these experiences on external sites, you can build up a good list of what is important to include in, or banish from, your design.

When you find sites that you think are exemplary, ask key stakeholders to visit them and give you their feedback, preferences and dislikes. In particular, note what confuses them and what excites them enough to prompt enthusiastic comments.

This post originated because several press notices in the past month brought to my attention Web applications with sophisticated, highly specialized search capabilities. I think they can provide terrific ideas for the enterprise search design team and can also be used to demonstrate to your internal users just what is possible.

Check out these applications and articles: KNovel, particularly this KNovel page; ThomasNet; and EBSCOHost, mentioned in this article about the “deep Web.” All these applications reveal superior search capabilities, have long track records, and are already used by enterprises every day. Because they are already successful in the enterprise, some by subscription, they are worth a second look as examples of how to approach your enterprise’s search interface design.

Meta Tags and Trusted Resources in the Enterprise

A recent article about how Google Internet search does not use meta tags to find relevant content got me thinking about a couple of things.

First, it explains why none of the articles I write for this blog about enterprise search appear in Google alerts for “enterprise search.” Besides being a personal annoyance, easily resolved if I invested in some Internet search optimization, it may explain why meta tagging is a hard sell behind the firewall.

I do know something about getting relevant content to show up in enterprise search systems, and it depends on a layer of what I call “value-added metadata,” applied by someone who knows the subject matter in the target content and the audience. Working in the language of the enterprise audience that relies on finding critical content to do their jobs, a meta tagger will bring out the topical language known to be the lingua franca of the dominant searchers, as well as the language that will be used by novice employee searchers. The key is to recognize that a specific piece of content’s “aboutness” may never be explicitly spelled out in the author’s own terminology.

In one example, let’s consider some fundamental HR information about “holiday pay” or “compensation for holidays” or “compensation for time-off.” The strings in quotes were used throughout documents on the intranet of one organization where I consulted. When some complained about not being able to find this information using the company search system, my review of search logs showed a very large number of searches for “vacation pay” and almost no searches for “compensation” or “holidays” or “time off.” Thus, employees using the search engine had no way to stumble upon the useful information they were seeking – unless meta tags made “vacation pay” a retrievable index pointer to these documents. The tagger would have analyzed the search logs, seen the high number of searches for that phrase, and realized that it was needed as a meta tag.
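As a concrete illustration of that kind of log review (not the tooling actually used at the time), here is a minimal sketch that scans a search log for frequent queries matching nothing in the existing index vocabulary; the file name, log format, and term list are all hypothetical.

```python
from collections import Counter
import csv

# Hypothetical set of terms already present in the document index / meta tags.
indexed_terms = {"holiday pay", "compensation for holidays", "compensation for time-off"}

def candidate_meta_tags(log_path, min_searches=25):
    """Return frequently searched phrases that match nothing already indexed."""
    queries = Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):  # assumed log format: one row per search, column "query"
            queries[row["query"].strip().lower()] += 1
    return [(q, n) for q, n in queries.most_common()
            if n >= min_searches and q not in indexed_terms]

# e.g. [("vacation pay", 412), ...] -- strong candidates for value-added meta tags
print(candidate_meta_tags("search_log.csv"))
```

A list like this is only a starting point; the tagger still has to judge which candidate phrases genuinely describe the target content.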

Now, back to Google’s position on ignoring meta tags because writers and marketing managers were “gaming the system”: they were adding tags they thought would be popular in order to draw people to unrelated content for which they wanted a huge audience.

I have heard the concern that people within enterprises might also hijack the usefulness of content they were posting in blogs or wikis to get more “eyeballs” in the organization. This is a foolish concern, in my opinion. First, I have never seen evidence that this happens, and I don’t believe that any productive enterprise has people engaging in such obvious foolishness.

More importantly, professional growth and success depend on the perceptions of others, their belief in you and your work, and the value of your ideas. If an employee is so foolish as to misdirect fellow employees to useless or irrelevant content, he is not likely to gain or keep the respect of his peers and superiors. In the long run, persistent, misleading, or mischievous meta tagging will have just the opposite effect, creating a pathway to the door.

Conversely, the super meta tagger with astute insights into what people are looking for, and how they are most likely to look for it, will be the valued expert we all need to care for, the one who spoon-feeds us our daily content. Trusted resources rise to the top when they are appropriately tagged and become bedrock content when revealed through enterprise search on well-managed intranets.

Ecordia Releases Content Analysis Tool for Search Engine Optimization

Ecordia has announced the availability of its new predictive content analysis application, the Ecordia Content Optimizer. Designed for copywriters, journalists, and SEO practitioners, the application provides automated intelligence and recommendations for improving the structure of content prior to publishing. Available for free, this turn-key web application provides a number of features to aid writers in the creation and validation of content, including:

  • advanced keyword research during authoring;
  • detailed scoring of your content based on 15 proven SEO techniques;
  • automated recommendations on how you can improve your content for search engines;
  • intelligent keyword extraction that compares your content to popular search terms;
  • sophisticated keyword analysis that scores your keyword usage based on 5 statistical formulas.

The Ecordia Content Optimizer has been in beta development for over a year and is currently in use by a number of SEO practitioners. Its content analysis capabilities are ideally suited to web publishers who wish to improve their quality score for landing pages used in PPC campaigns; SEO professionals who want to validate and review content prior to publishing; blog sites that wish to improve the quality of their ads from contextual ad networks; and PR practitioners who want to optimize their press releases prior to publishing. The Ecordia Content Optimizer is licensed on a per-user monthly subscription. http://www.ecordia.com/
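The announcement does not describe Ecordia’s scoring formulas; purely as an illustration of one such check, a keyword-usage report that compares a draft against a target term list might look like the sketch below (the function, names, and output are assumptions, not Ecordia’s algorithm).

```python
import re

def keyword_report(text, target_keywords):
    """Illustrative check: how often each target phrase appears, and its rough density."""
    body = text.lower()
    total_words = len(re.findall(r"[a-z0-9']+", body))
    report = {}
    for kw in target_keywords:
        occurrences = body.count(kw.lower())
        density = occurrences * len(kw.split()) / total_words if total_words else 0.0
        report[kw] = {"occurrences": occurrences, "density": round(density, 4)}
    return report

draft = "Enterprise search interfaces should make enterprise content easy to find."
print(keyword_report(draft, ["enterprise search", "content"]))
# {'enterprise search': {'occurrences': 1, ...}, 'content': {'occurrences': 1, ...}}
```

A real optimizer would combine several such signals (placement, headings, link text) rather than raw density alone.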

Atex Releases Polopoly v9.13 Web Content Management System

Atex released an update to their Web content management system, Polopoly 9.13, which integrates with their Text Mining engine to automatically tag and categorize content. A new Polopoly widget also allows content to be “batch categorized”, which enhances search results for end users while providing internal users with a discovery and knowledge management tool. Instead of editors applying relevant categories manually, the text mining engine will now do it automatically. Editors can instruct the engine to analyze a piece of content and suggest relevant categories based on the text, and receive suggestions based on the metadata and IPTC categorization. With Polopoly 9.13, classified content is automatically placed in dynamic lists based on metadata selections in the repository. These lists can automatically serve up older stories with links for related content, placed in context alongside the current articles. Interested users can be encouraged to “read more” or “find similar” stories based on information from the articles they are viewing. Publishers can even create new pages based entirely on archived content that has been categorized by metadata. http://atex.com
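Atex does not detail how the dynamic lists are assembled; as a rough sketch of the general “find similar” idea, related content can be ranked by how many metadata categories it shares with the article being viewed (illustrative only, not Polopoly’s implementation).

```python
def find_similar(current, archive, max_items=5):
    """Rank archived articles by the number of categories shared with the current article."""
    current_cats = set(current["categories"])
    scored = [(len(current_cats & set(a["categories"])), a)
              for a in archive if a["id"] != current["id"]]
    scored = [(overlap, a) for overlap, a in scored if overlap > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored[:max_items]]

article = {"id": 1, "categories": ["elections", "local-politics"]}
archive = [{"id": 2, "categories": ["elections", "economy"]},
           {"id": 3, "categories": ["sports"]}]
print(find_similar(article, archive))   # -> only the article sharing "elections"
```

The same overlap score can feed “read more” links or populate a page built entirely from categorized archive content.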

Kentico CMS for ASP.NET Gets New Enterprise Search Capabilities

Kentico Software released a new version 4.1 of Kentico CMS for ASP.NET. The new version comes with an enterprise-class search engine as well as user productivity enhancements. The search engine makes web content searchable to help visitors find information, and provides search results with ranking, previews, thumbnail images and customizable filters. Site owners can dictate which parts of the site, which content types and which content fields are searchable. The search engine uses the Lucene search framework. The new version also enhances productivity by changing the way images are inserted into text. Uploaded images can be part of the page life cycle: when a page is removed from the site, the related images and attachments are also removed, which helps organizations avoid invalid or expired content on their servers. Other improvements were made to the management of multi-lingual web sites. Kentico CMS for ASP.NET now supports workflow configuration based on the content language, and it allows administrators to grant editors permissions for chosen language versions. Content editors can see which documents are not translated or whose translations are not up to date. http://www.kentico.com/
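Kentico exposes these settings through its administration interface; the sketch below is only a generic illustration of the underlying idea, a tiny inverted index that honours a hypothetical per-content-type list of searchable fields, and is not Kentico code.

```python
from collections import defaultdict
import re

# Hypothetical configuration: which fields of each content type are searchable.
SEARCHABLE_FIELDS = {"article": ["title", "summary"], "product": ["name"]}

def build_index(documents):
    """Build a minimal inverted index over only the configured fields."""
    index = defaultdict(set)
    for doc in documents:
        for field in SEARCHABLE_FIELDS.get(doc["type"], []):
            for token in re.findall(r"\w+", doc.get(field, "").lower()):
                index[token].add(doc["id"])
    return index

docs = [{"id": 1, "type": "article", "title": "Enterprise search tips",
         "summary": "Ranking and filters", "body": "not indexed"},
        {"id": 2, "type": "product", "name": "Search widget", "price": "9.99"}]
print(build_index(docs)["search"])   # {1, 2} -- the excluded "body" field never matches
```

In a production system the same principle is applied inside the search framework (Lucene, in Kentico’s case) rather than hand-rolled.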

Convergence of Enterprise Search and Text Analytics is Not New

The news item about IBM’s bid for SPSS, and similar acquisitions by Oracle, SAP and Microsoft, made me think about the predictions of more business intelligence (BI) capabilities being conjoined with enterprise search. But why now, and what is new about pairing search and BI? They have always been complementary, not only for numeric applications but also for text analysis. Another article, by John Harney in KMWorld, referred to the “relatively new technology of text analytics” for analyzing unstructured text. The article is a good summary of some newer tools, but the technology itself has had a long shelf life – too long, for reasons I’ll explore later.

Like other topics in this blog, this one requires a readjustment in thinking by technology users. One of the great things about digitizing text was the promise of the ways in which it could be parsed, sorted and analyzed. With the heavy adoption in the 1960s and 70s of databases that specialized in textual, as well as numeric and date, data fields for business applications, it became much easier for non-technical workers to look at all kinds of data in new ways. Early database applications leveraged their data stores using command languages; the better ones featured statistical analysis and publication-quality report builders. Three that I was familiar with were DRS from ADM, Inc., BASIS from Battelle Columbus Labs and INQUIRE from IBM.

The tools that accompanied these database back-ends could extract, slice and dice database content, including very large text fields, to report word counts, phrase counts (breaking on any delimiter), transaction counts, and relationships among data elements across associated record types; they could also create relationships on the fly, report expert activity and working documents, and describe the distribution of resources. These are just a few examples of how new content assets could be created for export in minutes. In particular, a sort command in DRS had histogram controls that were invaluable to my clients managing corporate document and records collections, news clippings files, photographs, patents, etc. They could evaluate their collections by topic, date range, distribution, source, and so on, at any time.
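Those command languages are long gone, but the kind of slicing described above translates readily into a few lines of modern code. The sketch below computes word counts and simple distribution histograms over a small, hypothetical records collection; it is an analogy, not a reconstruction of DRS.

```python
from collections import Counter
import re

records = [  # hypothetical records collection
    {"date": "1998-03-14", "topic": "patents", "text": "Patent filing for sensor array"},
    {"date": "1998-07-02", "topic": "news", "text": "Clipping on sensor market growth"},
    {"date": "1999-01-20", "topic": "patents", "text": "Sensor patent granted"},
]

# Word counts across a large text field
word_counts = Counter(w for r in records
                      for w in re.findall(r"[a-z]+", r["text"].lower()))

# Distribution of the collection by year and by topic -- the kind of
# histogram report used to evaluate a collection at any time
by_year = Counter(r["date"][:4] for r in records)
by_topic = Counter(r["topic"] for r in records)

print(word_counts.most_common(3), by_year, by_topic)
```

Swapping in date ranges, sources, or any other field is a one-line change, which is roughly the flexibility the old report builders offered.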

So, there existed years ago the ability to connect data structures and use a command language to formulate new data models that informed and elucidated how information was being used in the organization, or to illustrate where there were holes in topics related to business initiatives. What were the barriers to widespread adoption? Upon reflection, I came to realize that extracting meaningful content from a database in new and innovative formats requires a level of abstract thinking for which most employees are not well trained. Putting descriptive data into a database via a screen form, then performing a transaction on the object of that data on another form, and then adding more data about another similar but different object are isolated steps in the database user’s experience and memory. The typical user is not trained to think about how the pieces of data might be connected in the database, and is therefore not likely to form new ideas about how it can all be extracted in a report that carries new information about the content. There is a level of abstraction that eludes most workers whose jobs consist of a lot of compartmentalized tasks.

It was exciting to encounter prospects who really grasped the power of these tools and were eager to push the limits of the command language and reporting applications, but they were scarce. It turned out that our greatest use came in applying text analytics to the extraction of valuable information from our customer support database. A rigorously disciplined staff populated it after every support call not only with demographic information about the nature of the call, linked to a customer record that had been created back at the first contact during the sales process (with appropriate updates along the way in the procurement process), but also with a textual description of the entire transaction. Over time, this database was linked to a “wish list” database and a “fixes” database, and the entire networked structure provided extremely valuable reports that guided both development work and documentation production. We also issued weekly summary reports to the entire staff so everyone was kept informed about product conditions and customer relationships. The reporting tools provided transparency to all staff about company activity and enabled an early version of “social search collaboration.”
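The databases in that example were proprietary, but the weekly roll-up can be sketched schematically: group support calls by product and attach any logged fixes. All record structures and field names below are hypothetical.

```python
from collections import defaultdict

support_calls = [  # hypothetical linked records
    {"week": "2009-W31", "product": "indexer", "issue": "crash on large PDF"},
    {"week": "2009-W31", "product": "indexer", "issue": "slow crawl"},
    {"week": "2009-W31", "product": "ui", "issue": "filter label unclear"},
]
fixes = [{"product": "indexer", "fix": "PDF parser patch", "status": "shipped"}]

def weekly_summary(week):
    """Roll up call volume per product and attach related fixes for the weekly report."""
    calls = defaultdict(list)
    for call in support_calls:
        if call["week"] == week:
            calls[call["product"]].append(call["issue"])
    return {product: {"calls": issues,
                      "fixes": [f for f in fixes if f["product"] == product]}
            for product, issues in calls.items()}

print(weekly_summary("2009-W31"))
```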

Current text analytics products have significantly more algorithmic horsepower than the old command languages. But making the most of their potential, and transforming them into utilities that any knowledge worker can leverage, will remain a challenge for vendors in the face of poor abstract reasoning among much of the work force. The tools have improved, but maybe not in all the ways they need to for widespread adoption. Workers should not have to depend on IT folks to create that unique analysis report that reveals a pattern or uncovers product flaws described by multiple customers. We expect workers to multitask, have many aptitudes and skills, and be self-servicing in so many aspects of their work, but the tools they need to flourish fall short too often. I’m putting in a big plug for text analytics for the masses, soon, so that enterprise search begins to deliver more than personalized lists of results for one person at a time. Give more reporting power to the user.

Semantic Search has Its Best Chance for Successes in the Enterprise

I am expecting significant growth in the semantic search market over the next five years with most of it focused on enterprise search. The reasons are pretty straightforward:

  • Semantic search is very hard, and scaling it to the Web compounds the complexity.
  • Because the semantic Web is so elusive, and results so far have been spotty with little traction, it will be some time before it can be easily monetized.
  • As with many highly complex things, a good model is to break the challenge of semantic search into smaller, targeted business problems, each focused on a particular audience seeking content from a narrower domain.

I base this prediction on my observation of the on-going struggle for organizations to get a strong framework in place to manage content effectively. By effectively, I mean establishing solid metadata, governance and publishing protocols that ensure that the best information knowledge workers produce is placed in range for indexing and retrieval. Sustained discipline, and the people to exercise it, just aren’t being employed in many enterprises to make this happen in a cohesive and comprehensive fashion. I have been discouraged by the number of well-intentioned projects I have seen flounder because organizations just can’t commit long-term or permanent human resources to the activity of content governance. Sometimes it is just on-again, off-again. What enterprises need are people with deep knowledge about the organization and how its content fits together in a logical framework for all types of knowledge workers. Instead, organizations tend to assign this job to external consultants or low-level staffers who are not well grounded in the work of the particular enterprise. The results are predictably disappointing.

Enter semantic search technologies, which offer multiple algorithmic tools to index and retrieve content for complex, multi-faceted queries. Specialized semantic technologies are often well suited to shorter-term projects for which domain-specific vocabularies can be built more quickly with good results. Maintaining targeted vocabulary ontologies for a focused topic can be done with fewer human resources, and a carefully bounded ontology can become an intelligent feed to a semantic search engine, helping it index with better precision and relevance.
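In sketch form, a bounded ontology feeding a search engine can be as simple as a controlled mapping from preferred concepts to synonyms and narrower terms, used to expand queries (or index entries) so that equivalent vocabulary matches. The ontology and function below are illustrative assumptions, not any vendor’s implementation.

```python
# Hypothetical bounded ontology for a narrow HR domain:
# preferred concept -> synonyms and narrower terms
ONTOLOGY = {
    "compensation for holidays": ["holiday pay", "vacation pay", "paid time off"],
    "parental leave": ["maternity leave", "paternity leave"],
}

def expand_query(query):
    """Expand a user query with ontology terms so documents written in
    different but equivalent vocabulary can still be matched."""
    q = query.lower().strip()
    expanded = {q}
    for concept, variants in ONTOLOGY.items():
        if q == concept or q in variants:
            expanded.add(concept)
            expanded.update(variants)
    return expanded

print(expand_query("vacation pay"))
# -> {'vacation pay', 'compensation for holidays', 'holiday pay', 'paid time off'}
```

Keeping the mapping small and domain-specific is what makes it maintainable by one or two people with enterprise expertise.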

This scenario comes with one caveat: enterprises must commit to having very smart people with enterprise expertise build the ontology. Having a consultant coach the subject matter expert in method, process and maintenance guidelines for doing so is not a bad idea, but the consultant has to prepare the enterprise for sustainability after exiting the scene.

The wager here is that enterprises can ramp up semantic search with a series of short, targeted projects, each of which aims to solve one business problem at a time and commits to efficient and accurate content retrieval as part of the solution. By learning what works well in each situation, intranet web retrieval will improve systematically and thoughtfully. The ramp to a better semantic Web will be paved with these interlocking pieces.

Keep an eye on these companies to provide technologies for point solutions in business critical applications: Basis Technology, Cognition Technology, Connotate, Expert Systems, Lexalytics, Linguamatics, Metatomix, Semantra, Sinequa and Temis.

Ontopia 5.0.0 Released

The first open source version of Ontopia has been released, which you can download from Google Code. This is the same product as the old commercial Ontopia Knowledge Suite, but with an open source license and with the old license key restrictions removed. The new version has been created not just by the Bouvet employees who have always worked on the product, but also by open source volunteers. In addition to bug fixes and minor changes, the main new features in this version are:

  • support for TMAPI 2.0;
  • the new tolog optimizer;
  • the new TologSpy tolog query profiler;
  • the net.ontopia.topicmaps.utils.QNameRegistry and QNameLookup classes, which provide lookup of topics using qnames;
  • use of the Simple Logging Facade for Java (SLF4J), which makes it easier to switch logging engines if desired.

http://www.ontopia.net/

