Category: Semantic technologies (Page 28 of 72)

Our coverage of semantic technologies goes back to the early 90s when search engines focused on searching structured data in databases were looking to provide support for searching unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyses the challenge back then.

Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.

From the FastForward Blogger: A Microsoft User Group Meeting

February 19, 2009 / Lynda Moulton / 0 Comments

I was at FastForward last week, invited to be a participant in a panel of bloggers on the last day, tasked to react to three days of executive, partner and customer presentations to the FAST search user community. Four of us had more ideas than we could share in a 30 minute panel session. The other three fellows on the panel are regular bloggers on FastForward. Along with them, I had the pleasure of listening to and speaking with numerous other industry analysts and commentators over the three-day period in the “blogger/analyst lounge” where we gathered between sessions.

Before making some observations of my own, I will introduce you to a few of the folks who have had and will continue to have a continuing presence in the content and search arena, particularly as it relates to social tools and knowledge management, two tightly connected areas of interest.

Each of us was interviewed for a kind of video blog session during the meeting. Although you can’t view the panel from the final keynote session, I can share these links that will give you an idea of what my cohorts were thinking about the meeting and the state of FastForward in 2009. They are:

Jon Husband, social computing thought leader and architect. He has coined the term “wirearchy,” which aptly describes a flow of connectedness over the wires (and wireless) air waves. I really liked his observations about how social technologies encourage self-organizing around issues and make group action so much easier. His interview is a good listen and his blog is fun, too.
Jevon MacDonald, founder of Firestoker and FASTforward blog contributor, had some helpful comments in his interview about the usefulness of social media in aiding companies to be more responsive to their customers.
Euan Semple, independent advisor on social computing, elevated the discussion in favor of social tools improving the flow of knowledge, which is really the point of all this content and search related technology, as far as I am concerned. You’ll enjoy the interview with Euan in which he also comments on the ratio of men to women and the IT-centric audience at the meeting, something I observed, as well.
I was also interviewed by Josh-Michéle Ross and my thoughts dovetailed with the others in keeping with the social them of “engage your user,” the conference tag line. My mantra throughout the conference and after is that there was just not enough emphasis on how teams work together to build highly functional and easy-flowing search experiences for users. The process of creating a social platform in which search is present in subtle ways that assist connectedness among experts and their content requires human design; this is an art that can’t be left to “out-of-the-box” installed technologies. It is a task for those with an aptitude for what users really want, need and will use without being force-fed or artificially manipulated. Here are my comments in the interview.

Other interviews of interest can be found at the FastForward Bloggers page where a lot of thought leaders including Rob Paterson, Bill Ives, Clay Shirky, Charlene Li and Jim McGee among many others put forth some thoughtful comments about the state of technology.

Our panel moderator was, Perry Solomon, VP Business Development and General Manager, Worldwide Media Solutions – FAST. While on the big stage we did not get to speak on all the ideas he asked about in our preparatory session, I can bring them to light in the following. Solomon asked these questions followed by my thoughts after a few days to digest the meeting:

Q: How was the meeting balance in terms of search technology versus use?
LWM: The use cases were compelling and well presented. They were highly evocative of the best applications we can achieve with technology using all the social tools and content management options now available. This is appropriate in keynote/big theatre presentations but what I did not find in the few breakout sessions was more about the “nuts and bolts” of the human design and understanding needed to integrate components.

Among the attendees that I met during meals (system integrator partners from small firms and IT people who were struggling to build applications their internal customers wanted), there was a sense that not enough substantive information was being shared. They had hoped for more “how to” and concrete case studies that described the process of getting from purchasing licenses to deploying solutions. When I suggested to some of these Microsoft customers that it might be helpful to have more of their content managers and search administrators in the audience, they all agreed. None carried an attitude that they were going to design and implement these highly sophisticated content/search solutions with just members of the IT department. Business users were also notably absent from the meeting.

Q: What was the impact of the announcement about product news, FastSearch for Internet Business and FastSearch for SharePoint?
LWM: My own reaction was that it was a logical way to begin to roll out the FAST product with existing and evolving Microsoft products. It was not surprising, revolutionary or exciting. MS is clearly committed to making something of its huge investment in FAST; to align it with the rapidly evolving and highly popular SharePoint is smart business. The sentiment of others I spoke with was pretty much the same, sprinkled with a fair amount of skepticism about schedules for delivery and how well the products will be supported with services and documentation. Cost of ownership is always a big worry; what it will take to get the sizzle and super search results from this technology without a huge amount of human investment and skill on the part of customers or third-party integrators stimulates a deep “wait-and-see” attitude among most.

Q: What was missing or not addressed in the sessions?
LWM: The lack of presentations and involvement of non-IT people. While MS is highly responsive to the IT person’s desire for standardizing on a full-function platform and set of tools from a single supplier, this is not the reality in the marketplace. Content is created, manipulated and re-purposed with hundreds of applications that are used by business owners and content managers who bring a deep understanding of what needs to be applied to get the “social” workflow operational and productive in any given culture. My own bias is that the subtleties of organizational culture are often lost on many in IT but are more understood by those deeply immersed in engagement with both experts and their content. A “user-group” meeting must include these “others” and have sessions that support their professional interests so they come away learning substantive stuff from those others in similar situations.

Although “search” was the nominal reason for the meeting, there was no discussion about what it takes to get to the ultimate “user-engagement.” Search remains, “smoke and mirrors.” Search behind the firewall was still pretty thin as a concept and the emphasis was on e-commerce and monetization. There was a lot of talk about business & customer experiences engaging with search but not much substance as to how to actually create rich search experiences.

Q: What are we going to be talking about a year from now?
LWM: I hope the engagement will provide less visionary “hype,” which is not real high-value for the audience in heavy doses. If the meeting becomes more about getting customers to a successful outcome through the engagement of teams with IT, developers, content and business owners coming to a problem using a thoughtful design approach, attendees will leave with a higher commitment to embrace the technology.

Finally, I believe that, as FastSearch solutions are implemented and tested, customers will come to these meetings with higher expectations for helpful case studies that talk about “how the sausage is made,” the role of connectors and the actual tuning for higher relevancy. Much reference to search federation will give way to what federation really is and its many tiers of sophistication. Presentation of search results in ways that are compelling and trustworthy for users will need to be explained in more substantive sessions. I hope that we will be talking about social team interaction for implementing compelling search technology experiences for users.

Native Database Search vs. Commercial Search Engines

February 3, 2009 / Lynda Moulton / 2 Comments

This topic is random and a short response to a question that popped up recently from a reader seeking technical research on the subject. Since none was available in the Gilbane library of studies, I decided to think about how to answer the subject with some practical suggestions.

The focus is on an enterprise with a substantive amount of content aggregated from a diverse universe of industry specific information, and what to do about searching it. If the information has been parsed and stored in an RDBMS database, is it not better to leverage the SQL query engine native to the RDBMS? Typical database engines might be: DB2, MS Access, MS SQL, MySQL, Oracle or Progress Software.

To be clear, I am not a developer but worked closely with software engineers for 20 years when I owned a software company. We worked with several DBMS products, three of them supported SQL queries and the application we invented and supported was a forerunner of today’s content management systems with a variety of retrieval (search) interfaces. The retrievable content our product supported was limited to metadata plus abstracts up to two or three pages in length; the typical database sizes of our customers ranged from 250,000 to a couple of million records.

This is small potatoes compared to what search engines typically traverse and index today but scale was always an issue and we were well aware of the limitations of the SQL engines to support contextual searching, phrase searching and complex Boolean queries. It was essential that indexes be built in real time, when records were added whether manually through screen forms, or through batch loads. The engine needed to support explicit adjacency (phrase) searching as well as key words anywhere in a field, in a record, or in a set. Saving and re-purposing results, storing search strategies, narrowing large sets incrementally, and browsing indexes of terminology (taxonomy navigation) to select unique terms that would enable a Boolean “and” or “or” query were part of the application. When our original text-based DBMS vendor went belly-up, we spent a couple of years test driving numerous RDBMS products to find one that would support the types of searches our customers expected. We settled on Progress Software primarily because of its support for search and experience as an OEM to application software vendors, like us. Development time was minimized because of good application building tools and index building utilities.

So, what does that have to do with the original question, native RDBMS search vs. standalone enterprise search? Based on discussions and observations with developers trying to optimize search for special applications, using generic search tools for database retrieval, I would make the following observations. Search is very hard and advanced search, including concept searching, Boolean operations, and text analytics, is harder still. Developers of enterprise search solutions have grappled with and solved search problems that need to be supported in environments where content is dynamically changing and growing, different user interfaces for diverse audiences and types of queries are needed, and query results require varieties of display formats. Also, in e-commerce applications, interfaces require routine screen face lifts that are best supported by specialized tools for that purpose.

Then you need to consider all these development requirements; they do not come out-of-the-box with SQL search:

Full text indexes and database field or metadata indexes require independent development efforts for each database application that needs to be queried.
Security databases must be developed to match each application where individual access to specific database elements (records or rows) is required.
Natural language queries require integration with taxonomies, thesauri, or ontologies; this means software development independent of the native search tools.
Interfaces must be developed for search engine administrators to make routine updates to taxonomies and thesauri, retrieval and results ranking algorithms, adjustments to include/exclude target content in the databases. These content management tasks require substantive content knowledge but should not require programming expertise and must be very efficient to execute.
Social features that support interaction among users and personalization options must be built.
Connectors need to be built to federate search across other content repositories that are non-native and may even be outside the enterprise.

Any one of these efforts is a multi-person and perpetual activity. The sheer scale of the development tasks mitigate against trying to sustain state-of-the-art search in-house with the relatively minimalist tools provided in most RDBMS suites. The job is never done and in-depth search expertise is hard to come by. Software companies that specialize in search for enterprises are also diverse in what they offer and the vertical markets they support well. Bottom line: identify your business needs and find the search vendor that matches your problem with a solution they will continue to support with regular updates and services. Finally, the issue of search performance and speed of processing are another huge factor to consider. For this you need some serious technical assessment. If the target application is going to be a big revenue generator with heavy loads and huge processing, do not overlook. Do benchmarks to prove the performance and scalability.

XML in Everyday Things

January 30, 2009 / dwaldt / 0 Comments

If you didn’t follow the link below to Bob DuCharme’s response to my January 13 posting on Why it is Difficult to Include Semantics in Web Content, you should read it. Bob does a great job describing tools in use to include semantics in Web content. Bob is a very smart guy. I like to think the complexity of his answer is a good illustration of my point that adding semantics is not easy. Anyway, his response is clearly worth reading and can be found at http://www.snee.com/bobdc.blog/2009/01/publishers-and-semantic-web-te.html.

Also, I have known Bob for some time. I am reminded that a while back he wrote an interesting article about XML data produced by his TiVo device (see http://www.xml.com/pub/a/2006/02/15/hacking-the-xml-in-your-tivo.html). I was intrigued how XML had begun to pop up in everyday things.

Ever since that TiVo article, I think of Bob every time XML pops up in unexpected everyday places (it’s better than associating him with a trauma). Once in a while I get a glimpse of XML data in a printer control file, in Web page source code, or as an export format for some software, but that sort of thing is to be expected. We all have seen examples at work or in commercial settings, but to find XML data at home in everyday devices and applications has always warmed my biased heart.

Recently I was playing a game of Sid Meier’s Civilization IV (all work and no play and so on….) and I noticed while it was booting up a game that one of the messages said “Reading XML FIles”. My first thought was “Bob would like to see this!” Then I was curious to see how XML was being used in game software. A quick Google search and the first entry, from Wikipedia (http://en.wikipedia.org/wiki/Civilization_IV#cite_note-10), says “More game attributes are stored in XML files, which must be edited with an external text editor or application.” Apparently you can “tweak simple game rules and change or add content. For instance, they can add new unit or building types, change the cost of wonders, or add new civilizations. Players can also change the sounds played at certain times or edit the play list for your soundtrack.”

I poked around in the directories and found schemas describing game units, events, etc. and configuration data instances describing artifacts and activities used in the game. A user could, if they wanted to, make buying a specific building very cheap for instance, or have the game play their favorite music instead of what comes with the game. That is if they know how to edit XML data. I think I just found a way to add many hours of enjoyment to an already great game.

I wonder how much everyday XML is out there just waiting for someone to tweak it and optimize it to make something work better. A thermostat, a refrigerator, or a television perhaps.

Churning in the Search Sector – Two BIG Events in One Week

January 26, 2009 / Lynda Moulton / 0 Comments

Analysts having been projecting major consolidation in the enterprise search marketplace for a couple of years. What is interesting to me is how slowly this is evolving. For every merger or acquisition, whether small or large (acquisition of Mondosoft by SurfRay or FAST by Microsoft), other companies emerge or evolve with diverse and potentially competitive technologies (e.g. Attivio, Connotate, Expert System, EyeAlike, Truevert, Temis).

We have seen companies like Exalead, ISYS, and Vivisimo gain on former leaders. Microsoft is often listed as an industry leader because it acquired former leader FAST while companies with solid products for verticals, like Recommind in law and financial services, are often overlooked because they lack the total company revenues of a Microsoft that sells a lot more software than enterprise search.

This past week two industry news items caused me to reflect on the potential impact of announcements that, while not surprising, can upset the plans of buyers of search technology. The first was the announcement that Autonomy is planning to procure Interwoven. That Interwoven is being acquired is no surprise, since the company was being groomed for acquisition. However, this appears to be the first instance of a “search” company acquiring a “content management/document management” company. The norm has been that search companies get bought to fill a need by ECM or CMS vendors. For anyone planning to procure Interwoven because of its embedded Vivisimo Velocity for Universal search in its Worksite product, this does put a wrinkle in the fabric. What a shame because it is going to be a while before the actual impact is really known and could slow sales. The cost to buyers having to accept Autonomy’s IDOL instead of Velocity could be significant. The effect could be on both licensing and deployment because Velocity has been an efficient install for most enterprises. Autonomy has got a big ramp up to shift from being a search company to becoming an ECM supplier and some will take a wait and see attitude, regardless of the Idol reputation.

The second big announcement, of course, is the departure from Microsoft of John Marcus Lervik, a co-founder of FAST and recently named Executive VP in a newly created position for Enterprise Search at Microsoft. I’m sure you’ll be seeing plenty about the reasons elsewhere. However, the difficulty for those buyers who are depending on FAST’s search technology to be integrated sooner rather than later in Microsoft’s offerings has just been made more complicated as one of the original leaders of FAST is leaving the team.

Two years ago I commented to FAST executives about the need for vendors on a rapid growth path to make the buying, business and support experience for customers a priority, beyond technology enhancements; so, I take little consolation in seeing this turmoil. If you are a buyer, take a good hard look behind the technology to see what else you will be dealing with as you make plans to acquire software.

Taxonomy and Glossaries for Enterprise Search Terminology

January 21, 2009 / Lynda Moulton / 0 Comments

Two years ago when I began blogging for the Gilbane Group on enterprise search, the extent of my vision was reflected in the blog categories I defined and expected to populate with content over time. They represented my personal “top terms” that were expected to each have meaningful entries to educate and illuminate what readers might want to know about search behind the firewall of enterprises.

A recent examination of those early decisions showed me where there are gaps in content, perhaps reflecting that some of those topics were:

Not so important
Not currently in my thinking about the industry
OR Not well defined

I also know that on several occasions I couldn’t find a good category in my list for a blog I had just written. Being a former indexer and heavy user of controlled vocabularies, on most occasions I resisted the urge to create a new category and found instead the “best fit” for my entry. I know that when the corpus of content or domain is small, too many categories are useless for the reader. But now, as I approach 100 entries, it is time to reconsider where I want to go with blogging about enterprise search.

In the short term, I am going to try to provide entries for scantily covered topics because I still think they are all relevant. I’ll probably add a few more along the way or perhaps make some topics a little more granular.

Taxonomies are never static, and require periodic review, even when the amount of content is small. Taxonomists need to keep pace with current use of terminology and target audience interests. New jargon creeps in although I prefer to use generic and terms broadly understood in the technology and business world.

That gives you an idea of some of my own taxonomy process. To add to the entries on terminology (definitions) and taxonomies, I am posting a glossary I wrote for last year’s report on the enterprise search market and recently updated for the Gilbane Workshop on taxonomies. While the definitions were all crafted by me, they are validated through the heavy use of the Google “define” feature. If you aren’t already a user, you will find it highly useful when trying to pin down a definition. At the Google search box, simply type define: xxx xxx (where xxx represents a word or phrase for which you seek a definition). Google returns all the public definition entries it finds on the Internet. My definitions are then refined based on what I learn from a variety of sources I discover using this technique. It’s a great way to build your knowledge-base and discover new meanings.

Glossary Taxonomy and Search-012009

Open Source Search & Search Appliances Need Expert Attention

January 15, 2009 / Lynda Moulton / 0 Comments

Search in the enterprise suffers from lack of expert attention to tuning, care and feeding, governance and fundamental understanding of what functionality comes with any one of the 100+ products now on the market. This is just as true for search appliances, and open source search tools (Lucene) and applications (Solr). But while companies licensing search out-of-the-box solutions or heavily customized search engines have service, support and upgrades built-in into their deliverables, the same level of support cannot be assumed for getting started with open source search or even appliances.

Search appliances are sold with licenses that imply some high level of performance without a lot of support, while open source search tools are downloadable for free. As speakers about both open source and appliances made perfectly clear at our recent Gilbane Conference, both come with requirements for human support. When any enterprise search product or tool is selected and procured, there is a presumed business case for acquisition. What acquirers need to understand above all else is the cost of ownership to achieve the expected value. This means people and people with expertise on an ongoing basis.

Particularly when budgets are tight and organizations lay off workers, we discover that those with specialized skills and expertise are often the first to go. The jack-of-all-trades, or those with competencies in maintaining ubiquitous applications are retained to be “plugged in” wherever needed. So, where does this leave you for support of the search appliance that was presumed to be 100% self-maintaining, or the open source code that still needs bug fixes, API development and interface design-work?

This is the time to look to system integrators and service companies with specialists in tools you use. They are immersed in the working innards of these products and will give you better support through service contracts, subscriptions or labor-based hourly or project charges than you would have received from your in-house generalists, anyway.

You may not see specialized system houses or service companies listed by financial publications as a growth business, but I am going to put my confidence in the industry to spawn a whole new category of search service organizations in the short term. Just-in-time development for you and lower overhead for your enterprise will be a growing swell in 2009. This is how outsourcing can really bring benefits to your organization.

Post-post note – Here is a related review on the state-of-open source in the enterprise: The Open Source Enterprise; its time has come, by Charles Babcock in Information Week, Nov. 17, 2008. Be sure to read the comments, too.

Dale Waldt on semantic markup

January 9, 2009 / Frank Gilbane / 0 Comments

As I mentioned in November, Dale Waldt has joined us as a senior consultant, and yesterday he started posting on the XML blog. Check out his first post on “Why Adding Semantics to Web Data is Difficult“.

Why Adding Semantics to Web Data is Difficult

January 8, 2009 / dwaldt / 2 Comments

If you are grappling with Web 2.0 applications as part of your corporate strategy, keep in mind that Web 3.0 may be just around the corner. Some folks say a key feature of Web 3.0 is the emergence of the Semantic Web where information on Web pages includes markup that tells you what the data is, not just how to format it using HTML (HyperText Markup Language). What is the Semantic Web? According to Wikipedia:

“Humans are capable of using the Web to carry out tasks such as finding the Finnish word for “monkey”, reserving a library book, and searching for a low price on a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web.” (http://en.wikipedia.org/wiki/Semantic_Web).

To make this work, the W3C (World Wide Web Consortium) has developed standards such as RDF (Resource Description Framework, a schema for describing properties of data objects) and SPARQL (SPARQL Protocol and RDF Query Language, http://www.w3.org/TR/rdf-sparql-query/) extend the semantics that can be applied to Web delivered content.

We have been doing semantic data since the beginning of SGML, and later with XML, just not always exposing these semantics to the Web. So, if we know how to apply semantic markup to content, how come we don’t see a lot of semantic markup on the Web today? I think what is needed is a method for expressing and understanding the semantics intended to be expressed beyond what current standards capabilities allow

A W3C XML schema is a set of rules that describe the relationships between content elements. It can be written in a way that is very generic or format oriented (e.g., HTML) or very structure oriented (e.g., Docbook, DITA). Maybe we should explore how to go even further and make our markup languages very semantically oriented by defining elements, for instance, like <weight> and <postal_code>.

Consider though, that the schema in use can tell us the names of semantically defined elements, but not necessarily their meaning. I can tell you something about a piece of data by using the <income> tag, but how, in a schema can I tell you it is a net <income> calculated using the guidelines of US Internal Revenue Service, and therefore suitable for eFiling my tax return? For that matter, one system might use the element type name <net_income> while another might use <inc>. Obviously a industry standard like XBRL (eXtensible Business Reporting Language) can help standardize vocabularies for element type names, but this cannot be the whole solution or XBRL use would be more widespread. (Note: no criticism of XBRL is intended, just using it as an example of how difficult the problem is).

Also, consider the tools in use to consume Web content. Browsers only in recent years added XML processing support in the form of the ability to read DTDs and transform content using XSLT. Even so, this merely allows you to read, validate and format non-HTML tag markup, not truly understand the content’s meaning. And if everyone uses their own schemas to define the data they publish on the Web, we could end up with a veritable “Tower of Babel” with many similar, but not fully interoperable data models.

The Semantic Web may someday provide seamless integration and interpretation of heterogeneous data. Tools such as RDF /SPARQL, as well as microformats (embedding small, specialized, predefined element fragments in a standard format such as HTML), metadata, syndication tools and formats, industry vocabularies, powerful processing tools like XQuery, and other specifications can improve our ability to treat heterogeneous markup as if it were more homogeneous. But even these approaches are addressing only part of the bigger problem. How will we know that elements labeled with <net_income> and <inc> are the same and should be handled as such. How do we express these semantic definitions in a processable form? How do we know they are identical or at least close enough to be treated as essentially the same thing?

This, defining semantics effectively and broadly, is a conundrum faced by many industry standard schema developers and system integrators working with XML content. I think the Semantic Web will require more than schemas and XML-aware search tools to reach its full potential in intelligent data and applications that process them. What is probably needed is a concerted effort to build semantic data and tools that can process these included browsing, data storage, search, and classification tools. There is some interesting work being done in Technical Architecture Group (TAG) at the W3C to address these issues as part of Tim Berners-Lee’s vision of the semantic Web (see for a recent paper on the subject).
Meanwhile, we have Web 2.0 social networking tools to keep us busy and amused while we wait. </>

Category: Semantic technologies (Page 28 of 72)

From the FastForward Blogger: A Microsoft User Group Meeting

Native Database Search vs. Commercial Search Engines

XML in Everyday Things

Churning in the Search Sector – Two BIG Events in One Week

Taxonomy and Glossaries for Enterprise Search Terminology

Open Source Search & Search Appliances Need Expert Attention

Dale Waldt on semantic markup

Why Adding Semantics to Web Data is Difficult

Subscribe to the Gilbane Advisor

Choose Language

Topics we cover

Policies

Contact