Curated for content, computing, and digital experience professionals

Category: Semantic technologies

Our coverage of semantic technologies goes back to the early 90s when search engines focused on searching structured data in databases were looking to provide support for searching unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyses the challenge back then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.


About Analysts, Other Naysayers and Search

You have probably noticed a fair degree of skepticism among technology bloggers about search products and search add-ons to other products. There have been quite a few articles lately that generalize “search” into one monolithic group of technologies. You need to really read between the lines to find the “kind of” search that is meaningful to your “kind of” enterprise. If you go back to one of my first blog entries you’ll see I noted that “the market is, frankly, a real mess.” Sue Feldman has been clear that she, too, believes the field to be a “muddle.” Steve Arnold routinely lets us all know how Google’s patent filings suggest a path toward future disruptions in the search marketplace.

Why then would anyone invest in search today? Do it because there are certainly enough really good and appropriate solutions for most enterprises. These may not be in the price range you would like, and may require more overhead support to implement than you think you should need, but you cannot be paralyzed by what the “next big thing might be” because it might not happen for a really long time or at all. You may find a solution today that solves a lot of immediate searching problems for your enterprise and continues to evolve with the needs of your organization.

That said, you need to keep educating yourself and your peers. This article appeared in the paper version of Information Week Aug. 6, 2007 as The Ultimate Answer Machine, but in the online version as The Ultimate Search Engine. It conflates all kinds of search in a single message that the average non-expert buyer wouldn’t be able to sort out. In its product box you have 13 products including Web search and enterprise search mixed together with no differentiation. I’m tracking over 80 products that solve some kind of enterprise search problem and new ones come to my attention every week. Take a look at the online article and be sure to look at reader comments to see more diverse views than just mine. Think about whether what you read makes good business sense.

I could load you up with two dozen interesting articles from just the past week but recently these have caught my attention for more self-education. Check out: Fight Against Infoglut, in Information Week on April 7, 2007, Search technologies for the Internet by Henzinger, Monika. Science, pp. 468-471, 07/27/2007, and Enterprise Search – More than just Google, Analyst Perspectives Consensus Report (these last two are paid content).

Going forward, you will find some interesting perspectives on Oracle’s position in the enterprise search marketplace if you sit in on their Webinar, Oct. 10th. Finally, I hope you can attend the upcoming Gilbane Boston Conference, Nov. 27th – 29th. The case studies and panel discussions will have something for every type of solution. I’ll keep you posted on selected upcoming happenings as they appear; keep your ear to the ground and eyes focused.

Using Expertise and Enterprise Search to Drive New Knowledge

Drew Robb has written an excellent article that reflects new thinking about the convergence of older and emerging technologies. In Search Converging with Business Intelligence for CRM Daily.com on August 28, 2007, I see similarities to my own way of viewing what is possible with newer search tools. Be sure to read it.

In addition to enterprise search, I actively follow enterprise knowledge management. It has been much debated because of confusion about its links to inappropriate technologies and well-intentioned but costly and failed initiatives in many organizations. But, in spite of rumors about its death, KM will remain a boundless frontier of opportunity. At its best, it leverages collaborative and sharing practices to maximize the value of organizations’ discoveries, developments and learning using innovative and often simple practices that work because they suit a particular culture’s way of operating.

Popular writings about business and technology innovation, plus tools and techniques for collaboration and sharing abound. 2007 is surely the year when search has come to dominate the technology landscape as vendors in BI, text mining and text analytics, data management, and countless semantic and Web 2.0 entrants vie to add refinements, and conversely search integrates features from those technologies.

In August I commented on search offerings that have made a point of highlighting their “Sharepoint” connectivity. Similarly, many products are adding claims for exploiting emails. I have long assumed that email should be part of the search engine crawling and indexing mix for any intranet. Given email structure, it seems to have more useful and usable metadata than a lot of other content. Social network analysis tools have been terrific at revealing fascinating relationships and internal communications within organizations, especially in the discovery area as emails have been a source for exploring questionable business practices in legal proceedings.

More sophisticated analytic and semantic techniques for exploiting concepts in content give hints about how technologies can integrate content by mapping experts to the expertise they contribute, even when it is scattered throughout their work, including emails where so many nuggets may reside. An area for development would correlate nuggets of knowledge in emails to reveal hidden and latent expertise, pointing to other content an individual has produced using search with BI and analytics algorithms.
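To make the idea of mapping experts to expertise a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sample messages, the field names (sender, subject, body), the small concept list, and the function names are all hypothetical, and simple term counting stands in for the far richer semantic and analytic techniques a real product would use.

```python
from collections import Counter, defaultdict

# Hypothetical sample messages; the field names are assumptions for illustration.
EMAILS = [
    {"sender": "ana@example.com", "subject": "Taxonomy draft",
     "body": "Updated the taxonomy terms for patents."},
    {"sender": "ana@example.com", "subject": "Re: indexing",
     "body": "The taxonomy needs new patents categories."},
    {"sender": "raj@example.com", "subject": "Crawler schedule",
     "body": "The crawler indexing job runs nightly."},
]

# Hypothetical controlled list of concepts worth tracking.
CONCEPTS = {"taxonomy", "patents", "crawler", "indexing"}


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [word.strip(".,:;!?()").lower() for word in text.split()]


def expertise_profile(emails, concepts):
    """Count how often each sender mentions each tracked concept."""
    profile = defaultdict(Counter)
    for message in emails:
        for word in tokenize(message["subject"] + " " + message["body"]):
            if word in concepts:
                profile[message["sender"]][word] += 1
    return profile


def experts_for(profile, concept):
    """Rank senders by mentions of a concept, most mentions first."""
    ranked = [(sender, counts[concept])
              for sender, counts in profile.items()
              if counts[concept] > 0]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


profile = expertise_profile(EMAILS, CONCEPTS)
print(experts_for(profile, "taxonomy"))
```

Even a crude count like this shows how small, scattered mentions can be aggregated into a pointer toward the person who keeps writing about a topic. The hard parts, of course, are the concept recognition and the privacy and security rules around email, which this sketch does not attempt.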

Maybe I’m overreaching but I suspect that a lot of experts may not be sufficiently motivated, disciplined or expected to aggregate their small but useful contributions into more valuable knowledge. Regardless of the reasons for that failure, much could be revealed with the right blending of search, indexing, analytics and business intelligence technologies. The components already exist, but implementations that deliver the desired results are not necessarily easy to deploy. A truly innovative expertise exploitation engine would be a knowledge engine of note, able to synthesize new knowledge in unique and interesting ways. Historically, much has been made about the role of serendipity in the “search for truth” and “quest for knowledge.” With the aid of enhanced search technologies to blend any or many expert nuggets, a lot more serendipity might happen.

Mobile Search, Etc. and Early Adoption

It was inevitable that the search market would rapidly expand into the mobile device market and the number of new and established vendors with options for your cell phone or PDA is a daily feast of reading. Combined with other new search options promising voice-enabled search, semantic search, search federating internal with Web and deep Web content, the possibilities for having search served up to suit anyone seem endless.

Tracking over 70 vendors with enterprise search offerings is plenty for me to focus on at the moment. However, as I read the publicity releases and descriptions inviting me to be briefed, in the back of my mind I know that business travelers need access to all kinds of enterprise content regardless of their locale. I worry that my skepticism about mobile search for enterprise content says more about my age than the technology. I might be thinking about struggling with tiny screens and buttons in less than optimal viewing conditions. But I am open to the possibilities and know it will become pervasive, as will all the other flavors of search.

Caution: before jumping into hot new offerings for enterprise search or any other technology, take a deep breath and think about some consequences of being that early adopter.

Security for enterprise search in which content is aggregated or federated from numerous structured and unstructured content repositories is a very big issue; it takes a lot of planning, mapping, and time to deploy and test. Add the considerations of a start-up vendor dealing with your precious knowledge assets in a wireless world and you might think twice.

Recent CIO Research, CIO Vendor Report Card results for 52 top IT vendors broke out statistics in several areas. The responding audience ranked as their highest priorities two that would be very hard to judge for a new offering:

  • (Vendor) Delivers on Promises
  • Ongoing Support after the Sale and Implementation

Another statistic reported in the CIO Vendor Report Card was a measure of the likelihood that the respondent would be willing to recommend the vendor. This reminds us of a critical issue that early adopters of new products and technologies can’t easily resolve: finding recommenders or product case studies.

I don’t want to pick on mobile search especially; there is so much hot stuff coming down the pipeline that it is easy to get carried away. But a sign to take it slowly is when your own enterprise is struggling to keep up with new releases of products from known companies, and lags behind in fully exploiting technology you already have. The hottest “must haves” won’t even make it through initial deployment in this environment. Making certain that enterprise search is working well within the organization is a necessary and critical first step. Having a vendor with a track record and happy customers is a close second.

Sharepoint and Search

Sharepoint repositories are a prime content target for most search engines in the enterprise search arena, judging from the number of announcements I’ve previewed from search vendors in the last six months. This list is long and growing (Names link to press releases or product pages for Sharepoint search enabling):

Almost a year ago I began using a pre-MOSS version of Sharepoint to collect documents for a team activity. Ironically, the project was the selection, acquisition, and implementation of a (non-Sharepoint) content management system to manage a corporate intranet, extranet, and hosted public Web site. The version of Sharepoint that was “set up” for me was strictly out of the box. Not being a developer, I was still able to muddle my way through setting up the site, establishing users, and posting announcements and categories of content to which I uploaded about fifty or sixty documents.

The most annoying discovery was the lack of a default search option. Later updating to MOSS solved the problem but at the time it was a huge aggravation. Because I could not guarantee a search option would appear soon enough, I had to painstakingly create titles with dates in order to give team members a contextual description as they would browse the site. Some of the documents I wanted to share were published papers and reviews of products. Dates were not too relevant for those, so I “enhanced” the titles with my own notations to help the finders select what they needed.

These silly “homemade” solutions are not uncommon when a tool does not anticipate how we would want to be able to use it. They persist as ways to handle our information storage and retrieval challenges. Since the beginning of time humans have devised ways to store things that they might want to re-use at some point in the future. Organizing for findability is an art as much as it is a science. Information science only takes one so far in establishing the organizing criteria and assigning those criteria to content. Search engines that rely strictly on the author’s language will leave a lot of relevant content on the shelf for the same reasons as using Dewey Decimal classification without the complementary card catalog of subject topics. The better search engines exploit every structured piece of data or tagged content associated with a document, and that includes all the surrounding metadata assigned by “categorizers.” Categorizers might be artful human indexers or automated processes. Search engines with highly refined, intelligent categorizers to enable semantically rich finding experiences bring even more sophistication to the search experience.
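The difference between indexing only the author’s language and also indexing categorizer metadata can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two sample documents, the "text" and "tags" field names, and the build_index and search helpers are hypothetical, and a real engine would do far more than split on whitespace.

```python
from collections import defaultdict

# Hypothetical documents: "text" is the author's language, "tags" are terms
# assigned by a categorizer (a human indexer or an automated process).
DOCS = {
    "d1": {"text": "Quarterly results for the cardiology device line.",
           "tags": ["heart", "medical devices"]},
    "d2": {"text": "Notes on heart monitor usability testing.",
           "tags": ["medical devices"]},
}


def build_index(docs, use_metadata):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        terms = [word.strip(".,").lower() for word in doc["text"].split()]
        if use_metadata:
            # Each assigned tag is indexed as one term, even if multi-word.
            terms += [tag.lower() for tag in doc["tags"]]
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents indexed under the term."""
    return sorted(index.get(term.lower(), set()))


text_only = build_index(DOCS, use_metadata=False)
with_tags = build_index(DOCS, use_metadata=True)

print(search(text_only, "heart"))   # only the document whose author wrote "heart"
print(search(with_tags, "heart"))   # also the document a categorizer tagged "heart"
```

The first document never uses the word "heart," so a text-only index leaves it on the shelf; the categorizer’s tag is what makes it findable.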

But back to Sharepoint, which does have an embedded search option now, I’ve heard more than one expert comment on the likelihood that it will not be the “search” of choice for Sharepoint. That is why we have so many search options scrambling to promote their own Sharepoint search. This is probably because the organizing framework around contributing content to Sharepoint is so loosey goosey that an aggregation of many Sharepoint sites across the organization will be just what we’ve experienced with all these other homegrown systems – a dump full of idiosyncratic organizing tricks.

What you want to do, thoughtfully, is assess whether the search engine you need will search only Sharepoint repositories OR both structured and unstructured repositories across a much larger domain of types of content and applications. It will be interesting to evaluate the options that are out there for searching Sharepoint gold mines. Key questions: Is a product targeting only Sharepoint sites or diverse content? How will content across many types of repositories be aggregated and reflected in organized results displays? How will the security models of the various repositories interact with the search engine? Answering these three questions first will quickly narrow your list of candidates for enterprise search.

The Marginal Influence of E-commerce Search and Taxonomies on Enterprise Search Technologies

As we gear up for Gilbane Boston 2007, the number of possible topics to include in the tracks related to search seems boundless. The search business is in a transitional state but in spite of disarray is still pivotal in its impact on business and current culture. The sessions will reflect the diversity in the market.

One trend is quite clear; the amount of money and effort being expended for Web search or site search on commercial Web sites is a winner in the “search technology” revenues war with annual revenues measuring well into the $billions. On the other hand, a recent Gartner study described the 2006 revenues for enterprise search as below $400M. This figure comes from reading an excellent article, Enterprise Search: Seek and Maybe You’ll Find, by Ben DuPont in Intelligent Enterprise. Check it out.

The distinctions between search on the Web and search within the enterprise are numerous but here are two. First, Internet Web search revenue is all about marketing. Yes, we use it to discover, learn, find facts, and become more informed. But when companies supply search technology to expose you to their content on the Internet, they do so to facilitate commerce. If it falls into the hands of organizations that have other intent, libraries or government agencies, so be it.

As we all know, when we are at work, seeking to discover, learn or find facts to do our jobs better, we need a different kind of search. Thus, we seek a clear search winner built just for our enterprise with all of its idiosyncrasies. The problem is that what is inside does not look like the rest of the world’s content as it is aggregated for commercial views. Enterprises are unique and operate sometimes chaotically, or, at best, with nuanced views of what information is most important.

The second distinction relates to taxonomies, and the increase in their development and use. I’ve seen a dramatic increase in job postings for “taxonomists” and have managed several projects for enterprises over the years to build these controlled lists of terms for categorizing content. What is noteworthy about recent job opportunities is that most seem to be for customer facing Web sites. Historically, organizations with substantial internal content (e.g. research reports, patents, laboratory findings, business documents) hired professionals to categorize materials for a narrowly defined audience of specialists. The terminology was often highly unique, could number in the hundreds or thousands of terms, even for a relatively small enterprise. This is no longer a common practice.

Slow financial growth in enterprise search markets is no surprise. Like many tools designed and marketed for departments not directly tied to revenue generation, search goes begging for solid vertical markets. Search’s companion technologies are also struggling to find a lucrative toehold for use within the organization. Content management systems integrated with rich and efficient taxonomy building and maintenance functions are hard to find.

I am confident that tools in CMS products for building and maintaining complex taxonomies will not improve until enterprises find a solid business reason to put professional human resources into doing content management, taxonomy development, search, and text analytics on their most important knowledge assets. This is a tough business proposition compared to the revenues being driven on the Internet. What businesses need to keep in mind is that without the ability to leverage their internal knowledge content assets better, smarter and faster, there won’t be innovative products in the pipeline to generate commerce. Losing track of your valuable intellectual resources is not a good long term strategy. Once you begin committing to solid content resource management strategies, enterprise technology products will improve to meet your needs.

Search and Need

Since an attempt to parse, in the simplest terms, the “enterprise search” market in January, I have been exposed to no fewer than 77 products and vendors. Add to that another 20 or 30 peripheral offerings in the text mining and text analytics sphere and you’ll understand the need for a focused view when considering products.

Selling and marketing at its best sells to a need. Need expresses something about users, user behaviors, user requirements, and problems to be solved. Need also implies emotions and that may present a problem when it comes to making business decisions.

Nothing plays into emotional business decisions like money, as illustrated by one IT manager’s reaction to this week’s Yahoo News story about Google offering its search appliance for small Web sites for $100 for up to 5,000 pages. Noting that $500/year would support up to 50,000 Web pages, he thought it could be a solution for the company’s intranet. In a tough budget situation it seemed to make sense because the maintenance fee for current search software far exceeds $500.

Let’s be clear: Google is offering site search for a Web site on the World Wide Web, not internal enterprise sites. There is a huge difference in the number of variables to be considered, not the least of which are:

  1. Who is authoring and maintaining the target content, and what do they expect to have the search engine do with the tags and content?
  2. Who are the users, what are they looking for, and how do they expect it to be displayed?
  3. What is the software providing in the way of managing and supporting metadata?
  4. Where is the software going to run and be maintained?
  5. What are the security and authorization considerations?
  6. What about all the internal content that is not “Web pages” (e.g. PDFs, spreadsheets, slide shows, images) with their associated metadata that may not be supported in this license but are fundamental to an enterprise search solution?
  7. What do page ranking and ad management have to do with internal search requirements?

Just to be clear, there are other solutions that may come with levels of Web site search support that are more suited to many small organizations, internal and external. This week I learned more about one such offering, PicoSearch that has options from free to very reasonable monthly charges bundled with service for hosting search for an organization’s content. It can also provide some levels of password protection and security controls. This may not be an optimal choice for organizations with complex and multi-faceted search interfaces but could be perfect for associations, educational institutions, and small businesses with straightforward product lines.

Keep in mind, inexpensive does not mean “cheap” and it is also not the first qualifying criterion for what is “appropriate.”

FAST Connector for Microsoft Office SharePoint Server 2007 Available

Microsoft Corp. (NASDAQ:MSFT) and Fast Search & Transfer (FAST) announced the availability of new technologies in the FAST Enterprise Search Platform (FAST ESP) that integrate with and extend the enterprise search capabilities of Microsoft Office SharePoint Server 2007. In addition to supporting 80 languages, FAST ESP will enable Office SharePoint Server’s users to employ advanced navigation capabilities to refine results and find facts, using FAST’s Contextual Insight technology. It also uses Office SharePoint Server as a platform, extending Web Parts capabilities to expose customers and industry partners to more relevant information. http://www.microsoft.com/, http://www.fastsearch.com/

Respect for Complexity and Security are Winners

I participated in one search vendor’s user conference this week, and a webinar sponsored by another. Both impressed me because they expressed values that I respect in the content software industry and they provided solid evidence that they have the technology and business delivery infrastructure to back up the rhetoric.

You have probably noted that my blog is slim on comments about specific products and this trend will continue to be the norm. However, in addition to the general feeling of good will from Endeca customers that I experienced at Endeca Discover 2007, I heard clear messages from sessions I attended that reinforced the company’s focus on helping clients solve complex content retrieval problems. Complexity is inherent in enterprises because of diversity among employees, methods of operating, technologies deployed and varied approaches to meeting business demands at every level.

In presentations by Paul Sonderegger and Jason Purcell, care was taken to explain Endeca’s approach to building their search technology solutions and why. At the core is a fundamental truth about how organizations and people function: you never know how a huge amount of unpredictably interconnected stuff will be approached. Endeca wants its users to be able to use information as levers to discover, through personalized methods, relationships among content pieces that will pry open new possibilities for understanding the content.

Years ago I was actively involved with a database model called an associative structural model. It was developed explicitly to store and manipulate huge amounts of database text and embodied features of hierarchical, networked and relational data structures.

It worked well for complex, integrated databases because it allowed users to manipulate and mingle data groups in unlimited ways. Unfortunately, it required a user to be able to visualize the possibilities for combining and expressing data from hundreds of fields in numerous tables by using keys. This structural complexity could not easily be taught or learned, and tools for simple visualization were not available in the early 1980s. As I listened to Jason Purcell describe Endeca’s optimized record store, and concept of “intra-query” to provide solutions for the problems posed by double uncertainty I thought, “They get it.” They have acknowledged the challenge of making it simple to update, use and exploit vast knowledge stores; they are working hard to meet the challenge. Good for them! We all want flexibility to work the way we want but if it is not easy we will not adopt.

In a KMWorld webinar, Vivisimo’s Jerome Pesenti and customer Arnold Verstraten of NV Organon co-presented with Matt Brown of Forrester Research. The theme was search security models. Besides the reasons for considering security up front when accessing and retrieving from enterprise repositories, three basic control models were described. All three were based on access control lists (ACLs), and the speakers explained how and why Vivisimo uses them.

Having worked with defense agencies, defense contractors and corporations with very serious security requirements on who can access what, I am very familiar with the types of data structures and indexing methods that can be used. I was pleased to hear the speakers address trade-offs that include performance and deployment issues. It served to remind me that organizations do need to be thinking about this early in the selection process; inability to handle the most sensitive content appropriately should eliminate any enterprise search vendor that tries to equivocate on security. Also, do as Organon did: nothing demonstrates the quality of a solution like a “bake-off” against a corpus of content large enough to show whether every document (and its metadata) that must not be viewed by certain audiences is in fact always excluded from search results for everyone in those audiences. Test it. Really test it!
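For readers who want a feel for what “really test it” could look like, here is a minimal Python sketch of an ACL check applied to search results. It does not reproduce any of the three control models from the webinar, and it is not any vendor’s API; the tiny corpus, the group names, and the functions secure_search, naive_search and leaked_documents are all hypothetical stand-ins. In a real bake-off the search function would be a call into the candidate engine.

```python
# Hypothetical corpus: each document carries an access control list (ACL),
# modeled here as the set of groups allowed to see it.
CORPUS = {
    "memo-1": {"text": "Merger plan draft", "acl": {"executives"}},
    "faq-1": {"text": "Travel plan policy", "acl": {"executives", "staff"}},
}


def secure_search(corpus, query, user_groups):
    """Return matching document ids, keeping only those the user may see."""
    return sorted(
        doc_id for doc_id, doc in corpus.items()
        if query.lower() in doc["text"].lower() and doc["acl"] & user_groups
    )


def naive_search(corpus, query, user_groups):
    """A deliberately flawed engine that ignores the ACL entirely."""
    return sorted(
        doc_id for doc_id, doc in corpus.items()
        if query.lower() in doc["text"].lower()
    )


def leaked_documents(search_fn, corpus, queries, user_groups):
    """Bake-off check: ids returned to an audience whose groups are not on the ACL."""
    leaks = set()
    for query in queries:
        for doc_id in search_fn(corpus, query, user_groups):
            if not (corpus[doc_id]["acl"] & user_groups):
                leaks.add(doc_id)
    return sorted(leaks)


staff = {"staff"}
queries = ["plan", "merger"]
print(leaked_documents(secure_search, CORPUS, queries, staff))
print(leaked_documents(naive_search, CORPUS, queries, staff))
```

The point of passing the search function in as a parameter is that the same leak check can be run against each candidate with the same queries and the same restricted audiences. Any engine that returns a non-empty list of leaks has told you what you need to know.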


© 2024 The Gilbane Advisor
