Curated for content, computing, and digital experience professionals

Category: Semantic technologies

Our coverage of semantic technologies goes back to the early 90s, when search engines built for querying structured data in databases were looking to add support for searching unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyses the challenge as it stood then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


Where and How Can You Look for Good Enterprise Search Interface Design?

Designing an enterprise search interface that employees will use on their intranet is challenging in any circumstance. But starting from nothing more than verbal comments or even a written specification is really hard. However, conversations about what is needed and wanted are informative because they can be aggregated to form the basis for the overarching design.

Frequently, enterprise stakeholders will reference commercial web sites they like or even search tools within social sites. These are a great starting point for a designer to explore. It makes a lot of sense to visit scores of sites that are publicly accessible, or sites where you have an account, and navigate around to see how they handle various design elements.

To start, look at:

  • How easy is it to find a search box?
  • Is there an option to do advanced searches (Boolean or parametric searching)?
  • Is there a navigation option to traverse a taxonomy of terms?
  • Is there a “help” option with relevant examples for doing different kinds of searches?
  • What happens when you search for a word that has several spellings or synonyms, a phrase (with or without quotes), a phrase with the word “and” in it, a numeral, or a date?
  • How are results displayed: what information is included, what is the order of the results and can you change them? Can you manipulate results or search within the set?
  • Is the interface uncluttered and easily understood?

The point of this list of questions is that you can use it to build a set of criteria for designing an interface your enterprise will use and adopt enthusiastically. But this is only a beginning. By actually visiting many sites outside your enterprise, you will find features that you never thought to include or aggravations that you will surely want to avoid. From these experiences on external sites, you can build up a good list of what is important to include in, or banish from, your design.

When you find sites that you think are exemplary, ask key stakeholders to visit them and give you their feedback, preferences and dislikes. In particular, note what confuses them and what excites them enough to prompt enthusiastic comments.

This post originated because several press notices in the past month brought to my attention Web applications with sophisticated, highly specialized search capabilities. I think they can provide terrific ideas for the enterprise search design team and can also be used to demonstrate to your internal users just what is possible.

Check out these applications and articles: KNovel, particularly this KNovel page; ThomasNet; and EBSCOHost, mentioned in this article about the “deep Web.” All these applications reveal superior search capabilities, have long track records, and are already used by enterprises every day. Because they are already successful in the enterprise, some by subscription, they are worth a second look as examples of how to approach your enterprise’s search interface design.

Meta Tags and Trusted Resources in the Enterprise

A recent article about how Google Internet search does not use meta tags to find relevant content got me thinking about a couple of things.

First, it explains why none of the articles I write for this blog about enterprise search appear in Google alerts for “enterprise search.” Besides being a personal annoyance, easily resolved if I invested in some Internet search optimization, it may explain why meta tagging is a hard sell behind the firewall.

I do know something about getting relevant content to show up in enterprise search systems, and it depends on a layer of what I call “value-added metadata,” applied by someone who knows the subject matter in the target content and the audience. Working in the language of the enterprise audience that relies on finding critical content to do their jobs, a meta tagger will bring out the topical language known to be the lingua franca of the dominant searchers, as well as the language that will be used by novice employee searchers. The key is to recognize that a specific piece of content’s “aboutness” may never be explicitly spelled out in the author’s own terminology.

In one example, let’s consider some fundamental HR information about “holiday pay” or “compensation for holidays” or “compensation for time-off.” The strings in quotes were used throughout documents on the intranet of one organization where I consulted. When some complained about not being able to find this information using the company search system, my review of search logs showed a very large number of searches for “vacation pay” and almost no searches for “compensation” or “holidays” or “time off.” Thus, employees using the search engine had no way to stumble upon the useful information they were seeking – unless meta tags made “vacation pay” a retrievable index pointer to these documents. The tagger would have analyzed the search logs, seen the high number of searches for that phrase, and realized that it was needed as a meta tag.
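As a concrete illustration of that kind of log review (not the tooling actually used at the time), here is a minimal sketch that scans a search log for frequent queries matching nothing in the existing index vocabulary; the file name, log format, and term list are all hypothetical.

```python
from collections import Counter
import csv

# Hypothetical set of terms already present in the document index / meta tags.
indexed_terms = {"holiday pay", "compensation for holidays", "compensation for time-off"}

def candidate_meta_tags(log_path, min_searches=25):
    """Return frequently searched phrases that match nothing already indexed."""
    queries = Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):  # assumed log format: one row per search, column "query"
            queries[row["query"].strip().lower()] += 1
    return [(q, n) for q, n in queries.most_common()
            if n >= min_searches and q not in indexed_terms]

# e.g. [("vacation pay", 412), ...] -- strong candidates for value-added meta tags
print(candidate_meta_tags("search_log.csv"))
```

A list like this is only a starting point; the tagger still has to judge which candidate phrases genuinely describe the target content.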

Now, back to Google’s position on ignoring meta tags because writers and marketing managers were “gaming the system”: they were adding tags they thought would be popular in order to draw people to unrelated content for which they wanted a huge audience.

I have heard the concern that people within enterprises might also hijack the usefulness of content they were posting in blogs or wikis to get more “eyeballs” in the organization. This is a foolish concern, in my opinion. First, I have never seen evidence that this happens, and I don’t believe that any productive enterprise has people engaging in such obvious foolishness.

More importantly, professional growth and success depend on the perceptions of others, their belief in you and your work, and the value of your ideas. If an employee is so foolish as to misdirect fellow employees to useless or irrelevant content, he is not likely to gain or keep the respect of his peers and superiors. In the long run, persistent, misleading, or mischievous meta tagging will have just the opposite effect, creating a pathway to the door.

Conversely, the super meta tagger with astute insights into what people are looking for, and how they are most likely to look for it, will be the valued expert we all need to care for, the one who spoon-feeds us our daily content. Trusted resources rise to the top when they are appropriately tagged and become bedrock content when revealed through enterprise search on well-managed intranets.

Ecordia Releases Content Analysis Tool for Search Engine Optimization

Ecordia has announced the availability of its new predictive content analysis application, the Ecordia Content Optimizer. Designed for copywriters, journalists, and SEO practitioners, the application provides automated intelligence and recommendations for improving the structure of content prior to publishing. Available for free, this turn-key web application provides a number of features to aid writers in the creation and validation of content, including:

  • advanced keyword research during authoring;
  • detailed scoring of your content based on 15 proven SEO techniques;
  • automated recommendations on how you can improve your content for search engines;
  • intelligent keyword extraction that compares your content to popular search terms;
  • sophisticated keyword analysis that scores your keyword usage based on 5 statistical formulas.

The Ecordia Content Optimizer has been in beta development for over a year and is currently in use by a number of SEO practitioners. Its content analysis capabilities are ideally suited to web publishers who wish to improve their quality score for landing pages used in PPC campaigns; SEO professionals who want to validate and review content prior to publishing; blog sites that wish to improve the quality of their ads from contextual ad networks; and PR practitioners who want to optimize their press releases prior to publishing. The Ecordia Content Optimizer is licensed on a per-user monthly subscription. http://www.ecordia.com/
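The announcement does not describe Ecordia’s scoring formulas; purely as an illustration of one such check, a keyword-usage report that compares a draft against a target term list might look like the sketch below (the function, names, and output are assumptions, not Ecordia’s algorithm).

```python
import re

def keyword_report(text, target_keywords):
    """Illustrative check: how often each target phrase appears, and its rough density."""
    body = text.lower()
    total_words = len(re.findall(r"[a-z0-9']+", body))
    report = {}
    for kw in target_keywords:
        occurrences = body.count(kw.lower())
        density = occurrences * len(kw.split()) / total_words if total_words else 0.0
        report[kw] = {"occurrences": occurrences, "density": round(density, 4)}
    return report

draft = "Enterprise search interfaces should make enterprise content easy to find."
print(keyword_report(draft, ["enterprise search", "content"]))
# {'enterprise search': {'occurrences': 1, ...}, 'content': {'occurrences': 1, ...}}
```

A real optimizer would combine several such signals (placement, headings, link text) rather than raw density alone.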

Atex Releases Polopoly v9.13 Web Content Management System

Atex released an update to their Web content management system, Polopoly 9.13, which integrates with their Text Mining engine to automatically tag and categorize content. A new Polopoly widget also allows content to be “batch categorized”, which enhances search results for end users while providing internal users with a discovery and knowledge management tool. Instead of editors applying relevant categories manually, the text mining engine will now do it automatically. Editors can instruct the engine to analyze a piece of content and suggest relevant categories based on the text, and receive suggestions based on the metadata and IPTC categorization. With Polopoly 9.13, classified content is automatically placed in dynamic lists based on metadata selections in the repository. These lists can automatically serve up older stories with links for related content, placed in context alongside the current articles. Interested users can be encouraged to “read more” or “find similar” stories based on information from the articles they are viewing. Publishers can even create new pages based entirely on archived content that has been categorized by metadata. http://atex.com
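Atex does not detail how the dynamic lists are assembled; as a rough sketch of the general “find similar” idea, related content can be ranked by how many metadata categories it shares with the article being viewed (illustrative only, not Polopoly’s implementation).

```python
def find_similar(current, archive, max_items=5):
    """Rank archived articles by the number of categories shared with the current article."""
    current_cats = set(current["categories"])
    scored = [(len(current_cats & set(a["categories"])), a)
              for a in archive if a["id"] != current["id"]]
    scored = [(overlap, a) for overlap, a in scored if overlap > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored[:max_items]]

article = {"id": 1, "categories": ["elections", "local-politics"]}
archive = [{"id": 2, "categories": ["elections", "economy"]},
           {"id": 3, "categories": ["sports"]}]
print(find_similar(article, archive))   # -> only the article sharing "elections"
```

The same overlap score can feed “read more” links or populate a page built entirely from categorized archive content.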

Kentico CMS for ASP.NET Gets New Enterprise Search Capabilities

Kentico Software released a new version 4.1 of Kentico CMS for ASP.NET. The new version comes with an enterprise-class search engine as well as user productivity enhancements. The search engine makes web content searchable to help visitors find information, and provides search results with ranking, previews, thumbnail images and customizable filters. Site owners can dictate which parts of the site, which content types and which content fields are searchable. The search engine uses the Lucene search framework. The new version also enhances productivity by changing the way images are inserted into text. Uploaded images can be part of the page life cycle: when a page is removed from the site, the related images and attachments are also removed, which helps organizations avoid invalid or expired content on their servers. Other improvements were made to the management of multi-lingual web sites. Kentico CMS for ASP.NET now supports workflow configuration based on the content language, and it allows administrators to grant editors permissions for chosen language versions. Content editors can see which documents are not translated or whose translations are not up to date. http://www.kentico.com/
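Kentico exposes these settings through its administration interface; the sketch below is only a generic illustration of the underlying idea, a tiny inverted index that honours a hypothetical per-content-type list of searchable fields, and is not Kentico code.

```python
from collections import defaultdict
import re

# Hypothetical configuration: which fields of each content type are searchable.
SEARCHABLE_FIELDS = {"article": ["title", "summary"], "product": ["name"]}

def build_index(documents):
    """Build a minimal inverted index over only the configured fields."""
    index = defaultdict(set)
    for doc in documents:
        for field in SEARCHABLE_FIELDS.get(doc["type"], []):
            for token in re.findall(r"\w+", doc.get(field, "").lower()):
                index[token].add(doc["id"])
    return index

docs = [{"id": 1, "type": "article", "title": "Enterprise search tips",
         "summary": "Ranking and filters", "body": "not indexed"},
        {"id": 2, "type": "product", "name": "Search widget", "price": "9.99"}]
print(build_index(docs)["search"])   # {1, 2} -- the excluded "body" field never matches
```

In a production system the same principle is applied inside the search framework (Lucene, in Kentico’s case) rather than hand-rolled.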

Convergence of Enterprise Search and Text Analytics is Not New

The news item about IBM’s bid for SPSS, and similar acquisitions by Oracle, SAP and Microsoft, made me think about the predictions of more business intelligence (BI) capabilities being conjoined with enterprise search. But why now, and what is new about pairing search and BI? They have always been complementary, not only for numeric applications but also for text analysis. Another article, by John Harney in KMWorld, referred to the “relatively new technology of text analytics” for analyzing unstructured text. The article is a good summary of some newer tools, but the technology itself has had a long shelf life – too long, for reasons I’ll explore later.

Like other topics in this blog, this one requires a readjustment in thinking by technology users. One of the great things about digitizing text was the promise of the ways in which it could be parsed, sorted and analyzed. With the heavy adoption in the 1960s and 70s of databases that specialized in textual, as well as numeric and date, data fields for business applications, it became much easier for non-technical workers to look at all kinds of data in new ways. Early database applications leveraged their data stores using command languages; the better ones featured statistical analysis and publication-quality report builders. Three that I was familiar with were DRS from ADM, Inc., BASIS from Battelle Columbus Labs and INQUIRE from IBM.

The tools that accompanied these database back-ends could extract, slice and dice database content, including very large text fields, to report word counts, phrase counts (breaking on any delimiter), transaction counts, and relationships among data elements across associated record types; they could also create relationships on the fly, report expert activity and working documents, and describe the distribution of resources. These are just a few examples of how new content assets could be created for export in minutes. In particular, a sort command in DRS had histogram controls that were invaluable to my clients managing corporate document and records collections, news clippings files, photographs, patents, etc. They could evaluate their collections by topic, date range, distribution, source, and so on, at any time.
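Those command languages are long gone, but the kind of slicing described above translates readily into a few lines of modern code. The sketch below computes word counts and simple distribution histograms over a small, hypothetical records collection; it is an analogy, not a reconstruction of DRS.

```python
from collections import Counter
import re

records = [  # hypothetical records collection
    {"date": "1998-03-14", "topic": "patents", "text": "Patent filing for sensor array"},
    {"date": "1998-07-02", "topic": "news", "text": "Clipping on sensor market growth"},
    {"date": "1999-01-20", "topic": "patents", "text": "Sensor patent granted"},
]

# Word counts across a large text field
word_counts = Counter(w for r in records
                      for w in re.findall(r"[a-z]+", r["text"].lower()))

# Distribution of the collection by year and by topic -- the kind of
# histogram report used to evaluate a collection at any time
by_year = Counter(r["date"][:4] for r in records)
by_topic = Counter(r["topic"] for r in records)

print(word_counts.most_common(3), by_year, by_topic)
```

Swapping in date ranges, sources, or any other field is a one-line change, which is roughly the flexibility the old report builders offered.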

So, there existed years ago the ability to connect data structures and use a command language to formulate new data models that informed and elucidated how information was being used in the organization, or to illustrate where there were holes in topics related to business initiatives. What were the barriers to widespread adoption? Upon reflection, I came to realize that extracting meaningful content from a database in new and innovative formats requires a level of abstract thinking for which most employees are not well trained. Putting descriptive data into a database via a screen form, then performing a transaction on the object of that data on another form, and then adding more data about another similar but different object are isolated steps in the database user’s experience and memory. The typical user is not trained to think about how the pieces of data might be connected in the database, and is therefore not likely to form new ideas about how it can all be extracted in a report that carries new information about the content. There is a level of abstraction that eludes most workers whose jobs consist of a lot of compartmentalized tasks.

It was exciting to encounter prospects who really grasped the power of these tools and were eager to push the limits of the command language and reporting applications, but they were scarce. It turned out that our greatest use came in applying text analytics to the extraction of valuable information from our customer support database. A rigorously disciplined staff populated it after every support call not only with demographic information about the nature of the call, linked to a customer record that had been created back at the first contact during the sales process (with appropriate updates along the way in the procurement process), but also with a textual description of the entire transaction. Over time, this database was linked to a “wish list” database and a “fixes” database, and the entire networked structure provided extremely valuable reports that guided both development work and documentation production. We also issued weekly summary reports to the entire staff so everyone was kept informed about product conditions and customer relationships. The reporting tools provided transparency to all staff about company activity and enabled an early version of “social search collaboration.”
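The databases in that example were proprietary, but the weekly roll-up can be sketched schematically: group support calls by product and attach any logged fixes. All record structures and field names below are hypothetical.

```python
from collections import defaultdict

support_calls = [  # hypothetical linked records
    {"week": "2009-W31", "product": "indexer", "issue": "crash on large PDF"},
    {"week": "2009-W31", "product": "indexer", "issue": "slow crawl"},
    {"week": "2009-W31", "product": "ui", "issue": "filter label unclear"},
]
fixes = [{"product": "indexer", "fix": "PDF parser patch", "status": "shipped"}]

def weekly_summary(week):
    """Roll up call volume per product and attach related fixes for the weekly report."""
    calls = defaultdict(list)
    for call in support_calls:
        if call["week"] == week:
            calls[call["product"]].append(call["issue"])
    return {product: {"calls": issues,
                      "fixes": [f for f in fixes if f["product"] == product]}
            for product, issues in calls.items()}

print(weekly_summary("2009-W31"))
```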

Current text analytics products have significantly more algorithmic horsepower than the old command languages. But making the most of their potential, and transforming them into utilities that any knowledge worker can leverage, will remain a challenge for vendors in the face of poor abstract reasoning among much of the work force. The tools have improved, but maybe not in all the ways they need to for widespread adoption. Workers should not have to depend on IT folks to create that unique analysis report that reveals a pattern or uncovers product flaws described by multiple customers. We expect workers to multitask, have many aptitudes and skills, and be self-servicing in so many aspects of their work, but the tools they need to flourish fall short too often. I’m putting in a big plug for text analytics for the masses, soon, so that enterprise search begins to deliver more than personalized lists of results for one person at a time. Give more reporting power to the user.

Semantic Search has Its Best Chance for Successes in the Enterprise

I am expecting significant growth in the semantic search market over the next five years with most of it focused on enterprise search. The reasons are pretty straightforward:

  • Semantic search is very hard, and scaling it to the Web compounds the complexity.
  • Because the semantic Web is so elusive, and results so far have been spotty with little traction, it will be some time before it can be easily monetized.
  • As with many highly complex things, a good model is to break the challenge of semantic search into smaller, targeted business problems, each focused on a particular audience seeking content from a narrower domain.

I base this prediction on my observation of the on-going struggle for organizations to get a strong framework in place to manage content effectively. By effectively, I mean establishing solid metadata, governance and publishing protocols that ensure that the best information knowledge workers produce is placed in range for indexing and retrieval. Sustained discipline, and the people to exercise it, just aren’t being employed in many enterprises to make this happen in a cohesive and comprehensive fashion. I have been discouraged by the number of well-intentioned projects I have seen flounder because organizations just can’t commit long-term or permanent human resources to the activity of content governance. Sometimes it is just on-again, off-again. What enterprises need are people with deep knowledge about the organization and how its content fits together in a logical framework for all types of knowledge workers. Instead, organizations tend to assign this job to external consultants or low-level staffers who are not well grounded in the work of the particular enterprise. The results are predictably disappointing.

Enter semantic search technologies, which offer multiple algorithmic tools to index and retrieve content for complex, multi-faceted queries. Specialized semantic technologies are often well suited to shorter-term projects for which domain-specific vocabularies can be built more quickly with good results. Maintaining targeted vocabulary ontologies for a focused topic can be done with fewer human resources, and a carefully bounded ontology can become an intelligent feed to a semantic search engine, helping it index with better precision and relevance.
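In sketch form, a bounded ontology feeding a search engine can be as simple as a controlled mapping from preferred concepts to synonyms and narrower terms, used to expand queries (or index entries) so that equivalent vocabulary matches. The ontology and function below are illustrative assumptions, not any vendor’s implementation.

```python
# Hypothetical bounded ontology for a narrow HR domain:
# preferred concept -> synonyms and narrower terms
ONTOLOGY = {
    "compensation for holidays": ["holiday pay", "vacation pay", "paid time off"],
    "parental leave": ["maternity leave", "paternity leave"],
}

def expand_query(query):
    """Expand a user query with ontology terms so documents written in
    different but equivalent vocabulary can still be matched."""
    q = query.lower().strip()
    expanded = {q}
    for concept, variants in ONTOLOGY.items():
        if q == concept or q in variants:
            expanded.add(concept)
            expanded.update(variants)
    return expanded

print(expand_query("vacation pay"))
# -> {'vacation pay', 'compensation for holidays', 'holiday pay', 'paid time off'}
```

Keeping the mapping small and domain-specific is what makes it maintainable by one or two people with enterprise expertise.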

This scenario comes with one caveat: enterprises must commit to having very smart people with enterprise expertise build the ontology. Having a consultant coach the subject matter expert in method, process and maintenance guidelines for doing so is not a bad idea, but the consultant has to prepare the enterprise for sustainability after exiting the scene.

The wager here is that enterprises can ramp up semantic search with a series of short, targeted projects, each of which aims to solve one business problem at a time and commits to efficient and accurate content retrieval as part of the solution. By learning what works well in each situation, intranet web retrieval will improve systematically and thoughtfully. The ramp to a better semantic Web will be paved with these interlocking pieces.

Keep an eye on these companies to provide technologies for point solutions in business critical applications: Basis Technology, Cognition Technology, Connotate, Expert Systems, Lexalytics, Linguamatics, Metatomix, Semantra, Sinequa and Temis.

Ontopia 5.0.0 Released

The first open source version of Ontopia has been released, which you can download from Google Code. This is the same product as the old commercial Ontopia Knowledge Suite, but with an open source license and with the old license key restrictions removed. The new version has been created not just by the Bouvet employees who have always worked on the product, but also by open source volunteers. In addition to bug fixes and minor changes, the main new features in this version are:

  • support for TMAPI 2.0;
  • the new tolog optimizer;
  • the new TologSpy tolog query profiler;
  • the net.ontopia.topicmaps.utils.QNameRegistry and QNameLookup classes, which provide lookup of topics using qnames;
  • use of the Simple Logging Facade for Java (SLF4J), which makes it easier to switch logging engines if desired.

http://www.ontopia.net/

