Curated for content, computing, and digital experience professionals

Category: Semantic technologies

Our coverage of semantic technologies goes back to the early 1990s, when search engines built for querying structured data in databases were looking to support search over unstructured and semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyzes the challenge as it stood then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


Weighing In On The Search Industry With The Enterprise In Mind

Two excellent postings by executives in the search industry give depth to the importance of Dassault Systèmes’ acquisition of Exalead. If this were simply a ho-hum failure in a very crowded marketplace, Dave Kellogg of Mark Logic Corporation and Jean Ferré of Sinequa would not care.

Instead, they are picking up important signals. Industry segments as important as search keep evolving, and their appropriate applications in enterprises are still being discovered and proven. Search may change, as may the label, but whatever it is called, it is still something that will be done in enterprises.

This analyst has praise for the industry players who continue to persevere, working to get the packaging, usability, usefulness and business purposes positioned effectively. Jean Ferré is absolutely correct; the nature of the deal underscores the importance of the industry and the vision of the acquirers.

As we segue from a number of conferences featuring search (Search Engines, Enterprise Search Summit, Gilbane) to broader enterprise technologies (Enterprise 2.0) and semantic technologies (SemTech), it is important for enterprises to examine the interplay among product offerings. Getting the mix of software tools just right is probably more important than any one industry-labeled class of software, or any one product. Everybody’s software has to play nice in the sandbox to get us to the next level of adoption and productivity.

Here is one analyst cheering the champions of search and looking for continued growth in the industry…but not so big it fails.

Search Engines – Architecture Meets Adoption

Trying to summarize a technology space as varied as that covered in two days at the Search Engines Meeting in Boston, April 26-27, is both a challenge and an opportunity. Avoiding the challenge of trying to represent the full spectrum, I’ll stick with the opportunity: telling you that search is everywhere, in every technology we use, with a multitude of cousins and affiliated companion technologies.

The Gilbane Group focuses on content technologies. In its early history this meant Web content management, document management, and content management systems for publishers and enterprises. We have since expanded to track related areas, including standards like DITA and XML, the adoption of social tools, and the rapidly growing drive to localize and globalize content.

My area, search, and more specifically “enterprise search” or search “behind the firewall,” was added just over three years ago. It seemed logical to give attention to the principal reason for creating, managing, and manipulating content: finding it. When I pay attention to search engines, I am also thinking about adjoining content technologies. My recent interest is in helping readers understand how technologies on both the search side and the content management side need better context, which means relating the two.

If one theme ran consistently through all the talks at the Search Engines Meeting, it was the need to define search in relation to so many other content technologies. The speakers, for the most part, did a fine job of making these connections.

Here are just some snippets:

Bipin Patel, CIO of ProQuest, shared the technology challenges of maintaining a 24/7 service while driving improvements to the search usability interface. The goal is to deliver command-line search precision to users who do not have the expertise (or patience) to construct elaborate queries. Balancing the tension between expert searchers (usually librarians) and everyone else who seeks content underscores the importance of human factors. My take-away: underlying algorithms and architecture are worth little if usability is neglected.
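
Patel’s point about query precision is easy to picture. Here is a minimal sketch of the kind of translation a search interface might perform behind the scenes, turning casual keyword entry plus a couple of dropdown filters into the elaborate fielded query an expert would write by hand. The field names and syntax are hypothetical, not ProQuest’s.

```python
# A minimal sketch of assisted query construction: turning casual user
# input into a precise fielded boolean query. The fields and syntax are
# hypothetical, not any particular vendor's query language.

def build_query(keywords, author=None, year_range=None):
    """Combine free-text keywords with optional fielded constraints."""
    clauses = [f'text:"{kw}"' for kw in keywords]
    query = " AND ".join(clauses)
    if author:
        query += f' AND author:"{author}"'
    if year_range:
        start, end = year_range
        query += f" AND year:[{start} TO {end}]"
    return query

# A user types two words and picks two filters from dropdowns; the UI
# emits the elaborate query an expert searcher would have written.
print(build_query(["protein", "folding"], author="Smith", year_range=(2005, 2010)))
# text:"protein" AND text:"folding" AND author:"Smith" AND year:[2005 TO 2010]
```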

Martin Baumgartel spoke on the Theseus project for the semantic search marketplace, a European collaborative initiative. An interesting point for me was their use of SMILA (SeMantic Information Logistics Architecture) from Eclipse. Following some links on the Eclipse site, I found an interesting presentation from the International Theseus Convention in 2009. The application of this framework underscores the interdependency of many semantically related technologies in improving search.

Tamas Doszkocs of the National Library of Medicine told a well-annotated story of the decades of search and content enhancement technologies that are evolving to contribute to semantically richer search experiences. His metaphors in the evolutionary process were fun and spot-on at a very practical level: Libraries as knowledge bases > Librarians as search engines > the Web as the knowledge base > Search engines as librarians > moving toward understanding, content, context, and people to bring us semantic search. A similar presentation is posted on the Web.

David Evans noted that there is currently no rigorous evaluation methodology for mobile search, yet it is very different from what we do with desktop search. One slide I found most interesting listed the Human Language Technologies (HLT) that contribute to a richer mobile search experience, essentially numerous semantic tools. Again, this underscores that the challenges of integrating sophisticated hardware, networking, and search engine architectures for mobile search are just a piece of the solution. Adoption will depend on tools that enhance content findability and usability.

Jeff Fried of Microsoft/Fast talked about “social search” and put forth this important theme: that people like to connect to content through other people. He made me recognize how social tools are teaching us that the richness of this experience is a self-reinforcing mechanism toward “the best way to search.” It has lessons for enterprises as they struggle to adopt social tools in mindful ways in tandem with improving search experiences.

Shekhar Pradhan of Docunexus shared a relevant thought about a failure of interface architecture (to paraphrase): the ubiquitous search box fails because it does not demand context or offer mechanisms for resolving ambiguity. Obviously, this undermines adoption of enterprise search when it is the only option offered.
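
Pradhan’s critique suggests an obvious remedy: make the search box ask for context when a term is ambiguous instead of silently returning a mixed result set. A minimal sketch follows; the ambiguity table is hypothetical, since a real system would derive senses from a taxonomy or from query-log analysis.

```python
# A minimal sketch of ambiguity resolution at the search box. The sense
# table below is hypothetical, standing in for a taxonomy or query-log
# derived list of meanings.

SENSES = {
    "java": ["programming language", "island", "coffee"],
    "cell": ["biology", "telephony", "prison"],
}

def resolve(term):
    """Return the term unchanged if unambiguous; otherwise ask for context."""
    senses = SENSES.get(term.lower())
    if not senses:
        return term
    print(f'"{term}" is ambiguous. Did you mean:')
    for i, sense in enumerate(senses, 1):
        print(f"  {i}. {term} ({sense})")
    choice = int(input("Choose a sense: ")) - 1
    return f'{term} AND "{senses[choice]}"'

# resolve("java") prompts the user and qualifies the query with the
# chosen sense, rather than guessing on the user's behalf.
```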

Many more talks from this meeting will get rolled up in future reports and blogs.

I want to learn about your experiences and observations on semantic search and semantic technologies as well. Please note that we have posted a brief survey for a short time at: Semantic Technology Survey. If you have any involvement with semantic technologies, please take it.

Open Text to Acquire Nstein

Open Text Corporation (NASDAQ: OTEX) (TSX: OTC) and Nstein Technologies Inc. (TSX-V: EIN) announced that they have entered into a definitive agreement by which Open Text will acquire all of the issued and outstanding common shares of Nstein through an Nstein shareholder-approved amalgamation with a subsidiary of Open Text under the Companies Act (Québec).

Under the terms of the agreement, Nstein shareholders will receive CDN $0.65 in cash for each Nstein common share, unless certain eligible shareholders elect instead to receive a fraction of an Open Text TSX-traded common share having a value of CDN $0.65, based on the volume weighted average trading price of Open Text TSX-traded common shares over the 10 trading days immediately preceding the closing date of the acquisition. This purchase price represents a premium of approximately 100 percent over the 30 trading day average closing price of Nstein’s common shares. The transaction is valued at approximately CDN $35 million.

Based in Montreal, Nstein sells its solutions across market segments such as media and information services, life sciences, and government. The transaction is expected to close in the second calendar quarter and is subject to customary closing conditions, including approval of two-thirds of the votes cast by Nstein’s shareholders and applicable regulatory and stock exchange approvals. A special meeting of Nstein’s shareholders to consider the amalgamation is expected to be held in early April 2010. http://www.opentext.com, http://www.nstein.com
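
For readers parsing the share-election mechanics, the arithmetic is straightforward. A worked illustration with an assumed trading price (the VWAP figure below is made up for illustration only):

```python
# Worked illustration of the share election. The CDN $27.50 VWAP is an
# assumed figure; the actual exchange ratio depends on the 10-day volume
# weighted average price preceding the closing date.

offer_per_share = 0.65   # CDN $ per Nstein common share
assumed_vwap = 27.50     # hypothetical 10-day VWAP of Open Text shares

fraction = offer_per_share / assumed_vwap
print(f"Each Nstein share converts to {fraction:.4f} Open Text shares")
# -> roughly 0.0236 Open Text shares at the assumed VWAP
```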

In the end, good search may depend on good source.

As the world of search becomes more and more sophisticated (and that process has been underway for decades), we may be approaching the limits of software’s ability to find what a searcher wants. If that is true, and I suspect it is, we will finally be forced to follow the trail of crumbs up the content life cycle… to its source.

Indeed, most of the challenges inherent in today’s search strategies and products appear to grow from the fact that while we continually increase our demands for intelligence on the back end, we have done little if anything to address the chaos that exists on the front end. Different word processing formats, spreadsheets, HTML-tagged text, database-delimited files, and so on are all dumped into what we think of as a coherent, easily searchable body of intellectual property. It isn’t, and it isn’t likely to become so any time soon unless we address the source.

Having spent some time in the library automation world, I can remember the sometimes bitter controversies over having just two major foundations for cataloging source material (Dewey and LC; add a third if you include the NICEM A/V scheme). Had we known back then that the process of finding intellectual property would devolve into the chaos we now confront, with every search engine and database product essentially rolling its own approach to rational search, we would have considered ourselves blessed. In the end, it seems, we must begin to see the source material, its physical formats, its logical organization, and its inclusion of rational cataloging and taxonomy elements as the conceptual raw material for its own location.

As long as the word processing world teaches that anyone creating anything can make it look like it should in a dozen different ways, while ignoring any semblance of finding-aid inclusion, we probably won’t have a truly workable ability to find what we want without reworking the content or wading through a haystack of misses to find our desired hits.

Unfortunately, the solutions of yesteryear, including after-creation cataloging by a professional cataloger, probably won’t work now either, for cost reasons if no others. We will be forced to approach the creators of valuable content, asking them for a minimum of preparation for searching their product, and providing the necessary software tools to make that possible.
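
What would “a minimum of preparation” look like in practice? One hedged sketch: require a small block of finding-aid metadata at creation time and validate it before the document enters the repository. The required field names here are hypothetical, not a published standard.

```python
# A minimal sketch of author-side preparation: a required metadata block
# validated at save time, before content enters the searchable corpus.
# The required fields are hypothetical, not a published standard.

REQUIRED_FIELDS = {"title", "author", "subject_terms", "doc_type"}

def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the document may be filed."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - metadata.keys()]
    if not metadata.get("subject_terms"):
        problems.append("at least one subject term is required")
    return problems

doc = {"title": "Q3 Market Survey", "author": "B. Schrand", "subject_terms": []}
for problem in validate_metadata(doc):
    print(problem)
# missing field: doc_type
# at least one subject term is required
```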

We can’t act too soon because, despite the growth of software elegance and raw computing power, this situation will likely get worse as the sheer volume of valuable content grows.

Regards, Barry

Read more: Enterprise Search Practice Blog: https://gilbane.com/search_blog/

TEMIS Unveils Luxid Content Pipeline

TEMIS announced the launch of Luxid Content Pipeline, a new content collection module integrated within the latest version of its content discovery solution, Luxid 5.1. The platform collects content from a range of information sources and feeds it into Luxid. After annotating content with relevant metadata, Luxid applies search, discovery, and sharing tools to the enriched content and provides users with content analytics and knowledge discovery.

Luxid Content Pipeline accesses content by three different methods. Structured Access connects to and automates the collection of documents from structured content sources such as Dialog, DataStar, ISI Web of Knowledge, Ovid, STN, Questel, EBSCOhost, Factiva, LexisNexis, MicroPatent, Scopus, ScienceDirect, Minesoft, Esp@cenet, and PubMed. Enterprise Content Management Access connects to corporate knowledge repositories such as EMC Documentum, EMC Documentum CenterStage, and Microsoft Office SharePoint Server. Finally, to be compatible with as wide a variety of document sources as possible, Luxid Content Pipeline supports the integration of UIMA (Unstructured Information Management Architecture) collection readers, enabling connection to those sources using the UIMA standard protocol and format conversion. http://www.temis.com/
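
For readers unfamiliar with the collection-reader pattern that UIMA standardizes, its shape is simple: a reader iterates over a source and hands each document, as text plus metadata, to the annotation pipeline. The sketch below shows the pattern only; the actual UIMA interfaces are Java, and everything named here is a hypothetical stand-in.

```python
# The collection-reader pattern in miniature: iterate a source, emit
# normalized documents for downstream annotation. This is a schematic of
# what UIMA standardizes, not the UIMA Java API itself.

import pathlib
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)

class DirectoryReader:
    """A trivial reader over a folder of .txt files."""
    def __init__(self, directory: str):
        self.paths = sorted(pathlib.Path(directory).glob("*.txt"))

    def __iter__(self):
        for path in self.paths:
            yield Document(text=path.read_text(encoding="utf-8"),
                           metadata={"source": str(path)})

def annotate(doc: Document) -> Document:
    """Stand-in for the enrichment step; a real pipeline adds entities, topics, etc."""
    doc.metadata["length"] = len(doc.text)
    return doc

# for doc in DirectoryReader("corpus/"):
#     enriched = annotate(doc)
```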

Layering Technologies to Support the Enterprise with Semantic Search

Semantic search is a composite beast like many enterprise software applications. Most packages are made up of multiple technology components and often from multiple vendors. This raises some interesting thoughts as we prepare for Gilbane Boston 2009 to be held this week.

As part of a panel on semantic search moderated by Hadley Reynolds of IDC, with Jeff Fried of Microsoft and Chris Lamb of the OpenCalais Initiative at Thomson Reuters, I wanted to give a high-level view of semantic technologies currently in the marketplace. I contacted about a dozen vendors and selected six that illustrate the variety of semantic search offerings and business models.

One case study involves three vendors, each with a piece of the ultimate customer-facing product. My research took me to one company that I had reviewed a couple of years ago, and they sent me to their “customer” and to the customer’s customer. It took a couple of conversations and emails to sort out the connections; in the end, the relationships made perfect sense.

On one hand, we have conglomerate software companies offering “solutions” to every imaginable enterprise business need. On the other, we see highly specialized point solutions to universal business problems with multiple dimensions and twists. Teaming by vendors, each with a solution to one dimension of a need, creates compound product offerings that add up to a very large semantic search marketplace.

Consider an example of data gathering by a professional services firm. Let’s assume that my company has tens of thousands of documents collected in the course of research for many clients over many years. Researchers may move on to greater responsibility or other firms, leaving content unorganized except around confidential work for individual clients. We now want to exploit this corpus of content to create new products or services for various vertical markets. To understand what we have, we need to mine the content for themes and concepts.

The product of the mining exercise may have multiple uses: helping us create a taxonomy of controlled terms, preparing a navigation scheme for a content portal, or providing a feed to business or text analytics tools that will help us create visual objects reflecting various configurations of content. A text mining vendor may be great at the mining aspect, while other firms have better tools for analyzing, organizing, and re-shaping the output.
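
To make the mining step concrete, here is a minimal sketch of a first pass over such a corpus: score terms so that the distinctive ones surface as taxonomy candidates. It uses bare-bones TF-IDF on a toy corpus; commercial engines do far more (phrase detection, entities, concepts).

```python
# A bare-bones TF-IDF pass over a small corpus: terms that score high in
# one document but are rare across the corpus become taxonomy candidates.
# Real text-mining engines add phrase detection, entities, and concepts.

import math
import re
from collections import Counter

corpus = [
    "clinical trial enrollment in oncology studies",
    "oncology biomarkers and trial endpoints",
    "quarterly revenue forecasts for retail clients",
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

doc_tokens = [tokenize(doc) for doc in corpus]
doc_freq = Counter(term for tokens in doc_tokens for term in set(tokens))
n_docs = len(corpus)

for i, tokens in enumerate(doc_tokens):
    tf = Counter(tokens)
    scores = {t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf}
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(f"doc {i}: {top}")  # highest-scoring terms per document
```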

Doing business with two or three vendors, experts in their own niches, may help us reach a conclusion about what to do with our information-rich pile of documents much faster. A multi-faceted approach can be a good way to bring a product or service to market more quickly than if we struggle with generic products from just one company.

When partners each have something of value to contribute, together they offer the benefits of the best of all options. This results in a new problem for businesses looking for the best in each area, namely, vendor relationship management. But it also saves organizations from dealing with huge firms offering many acquired products that have to be managed through a single point of contact, a generalist in everything and a specialist in nothing. Either way, you have to manage the players and how the components are going to work for you.

I really like what I see, semantic technology companies partnering with each other to give good-to-great solutions for all kinds of innovative applications. By the way, at the conference I am doing a quick snapshot on each: Cogito, Connotate (with Cormine and WorldTech), Lexalytics, Linguamatics, Sinequa and TEMIS.

Nstein Technologies Launches Semantic Site Search

Nstein Technologies Inc. announced the release of a new product, Semantic Site Search (3S). 3S leverages Nstein’s text-mining technology to power a faceted site search that returns results organized categorically. 3S can ingest content from many different indices across many different web publishing platforms, meaning it indexes material across multiple properties. 3S’s embedded Text Mining Engine (TME) identifies concepts, categories, proper names, places, organizations, sentiment, and topics in particular content pieces and then annotates those documents, creating a semantic fingerprint that exposes underlying nuances and meaning in content. 3S also boasts a visual interface designed to allow administrators to tweak search sensitivity algorithms without having to modify hard code. 3S comes bundled with front-end widgets that can be used to point users to “similar content”, “most recent content”, or other identifying characteristics of content that one wants to promote. http://www.nstein.com
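
Faceted results of the kind 3S returns are easy to picture: each document carries annotations, and the engine counts annotation values across the result set to build the facets. A toy sketch follows; the annotation fields and documents are hypothetical.

```python
# A toy sketch of faceted search results: count annotation values across
# a result set to build facets. The annotation fields are hypothetical.

from collections import Counter, defaultdict

results = [
    {"title": "Fed raises rates", "category": "finance", "places": ["Washington"]},
    {"title": "Startup funding dips", "category": "finance", "places": ["San Francisco"]},
    {"title": "New stadium approved", "category": "sports", "places": ["Washington"]},
]

def build_facets(docs, fields=("category", "places")):
    facets = defaultdict(Counter)
    for doc in docs:
        for f in fields:
            values = doc[f] if isinstance(doc[f], list) else [doc[f]]
            facets[f].update(values)
    return facets

for field_name, counts in build_facets(results).items():
    print(field_name, dict(counts))
# category {'finance': 2, 'sports': 1}
# places {'Washington': 2, 'San Francisco': 1}
```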

Clarabridge Releases Clarabridge Enterprise 4

Clarabridge announced the general availability of Clarabridge Enterprise 4. The release adds an Ad-Hoc Uploader, upgrades to the Natural Language Processing (NLP) and Sentiment Engines, new collaboration tools in the Classification Suite, and built-in Early Warnings and Alerts. Sentiment Engine upgrades include clause-based sentiment and classification, a multitude of core engine enhancements, and added support for classifying data in foreign languages. Classification Templates provide quick-start templates for analysts developing category models. Collaboration changes include locking of models to prevent changes, rule history and rollback functionality, color-coding as a visual aid for maintaining models, and a preview feature. Early Warnings & Alerts are statistical warning and alert engines aimed at helping users proactively address customer experience issues by alerting them to anything that exceeds defined thresholds. The Ad-Hoc Uploader is designed to upload feedback sources for analysis directly from browsers. http://www.clarabridge.com/
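
The alerting idea is the simplest piece to sketch: watch a rolling statistic on incoming feedback and fire when it crosses a configured threshold. A hedged illustration, with a made-up window size and threshold rather than Clarabridge’s:

```python
# A minimal sketch of threshold-based alerting on feedback sentiment:
# fire when the rolling mean of recent scores falls below a configured
# floor. The window size and threshold are illustrative only.

from collections import deque

class SentimentAlert:
    def __init__(self, window=5, floor=-0.3):
        self.scores = deque(maxlen=window)
        self.floor = floor

    def observe(self, score):
        """Add a sentiment score in [-1, 1]; return an alert string or None."""
        self.scores.append(score)
        mean = sum(self.scores) / len(self.scores)
        if len(self.scores) == self.scores.maxlen and mean < self.floor:
            return f"ALERT: rolling sentiment {mean:.2f} below {self.floor}"
        return None

monitor = SentimentAlert()
for s in [0.2, -0.5, -0.6, -0.4, -0.7]:
    alert = monitor.observe(s)
    if alert:
        print(alert)
# fires on the fifth score: rolling mean -0.40 is below the -0.30 floor
```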
