Webnodes has announced that its CMS now has dynamic support for Schema.org. The new feature has an intuitive vocabulary mapping user interface as well as a code API and ASP.NET controls to streamline the work for site developers. The Webnodes CMS ontology management user interface provides a separation between data, data model and presentation layout. Schema.org, which is all about making search engines understand the meaning of your content, is a natural extension to the semantic core engine. http://www.webnodes.com
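To make the idea of vocabulary mapping concrete, here is a minimal sketch in Python of mapping content fields to Schema.org properties. It is not the Webnodes API: the field names, the `FIELD_TO_SCHEMA` table, the `to_json_ld` helper and the sample record are all hypothetical, and the choice of JSON-LD as the output syntax is an assumption, since the announcement does not say which markup the product emits.

```python
import json

# Hypothetical mapping from CMS field names to Schema.org property names.
# "headline", "datePublished" and "author" are real Schema.org properties;
# the CMS field names on the left are invented for illustration.
FIELD_TO_SCHEMA = {
    "title": "headline",
    "published": "datePublished",
    "writer": "author",
}


def to_json_ld(record, schema_type="Article"):
    """Build a Schema.org JSON-LD dict from a CMS record, skipping unmapped fields."""
    doc = {"@context": "https://schema.org", "@type": schema_type}
    for field, value in record.items():
        prop = FIELD_TO_SCHEMA.get(field)
        if prop is not None:
            doc[prop] = value
    return doc


# Made-up sample record; "internal_id" has no mapping and is left out of the output.
record = {"title": "Sample page", "writer": "Sample Author", "internal_id": 42}
print(json.dumps(to_json_ld(record), indent=2))
```

Keeping the mapping in a table separate from the content mirrors the separation between data, data model and presentation that the announcement describes.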
Category: Semantic technologies
Our coverage of semantic technologies goes back to the early 90s, when search engines built for structured data in databases were looking to support searching of unstructured or semi-structured data. This early Gilbane Report, Document Query Languages – Why is it so Hard to Ask a Simple Question?, analyses the challenge back then.
Semantic technology is a broad topic that includes all natural language processing, as well as the semantic web, linked data processing, and knowledge graphs.
TEMIS, the provider of Text Analytics Solutions for the Enterprise, today announced the launch of the next generation of Luxid, its flagship semantic content enrichment solution. Luxid 6 is a semantic tagging platform which automatically extracts relevant information (entities, topics, events, sentiments), identifies relationships residing in unstructured data and facilitates links between similar and related documents. Luxid 6 optimizes the management of Enterprise content through the capture and structuring of targeted information. The software also enhances the utilization of content within an Enterprise’s workflows such as competitive intelligence, research and innovation, voice of the consumer and reputation management. http://www.temis.com/
Today we highlight Workshop C: Justifying Enterprise Search: Mitigating Risk and Getting the Right Fit taking place at Gilbane Boston, November 29, 9:00am – 12:00pm at the Westin Waterfront.
While enterprise search has been debated, maligned, and challenged as a high value infrastructure application over the past decade, it has a place in every enterprise with valuable content. This presentation highlights how to make the right decisions about enterprise search applications. From embedded search to high-end semantic applications, the options are numerous and the technologies solid. However, the right choice is imperative and basing selection on business priorities requires artful analysis and justification. Illustrating the risks of continuing to operate with a faulty search solution is a good way to focus thinking about the search environment in any organization.
Instructor:
Lynda Moulton, Senior Analyst & Consultant, Outsell Gilbane Services
HP and Autonomy Corporation announced the terms of a recommended transaction under which HP will acquire all of the outstanding shares of Autonomy for £25.50 ($42.11) per share in cash. The transaction was unanimously approved by the boards of directors of both HP and Autonomy. The Autonomy board of directors also has unanimously recommended its shareholders accept the Offer. Based on the closing stock price of Autonomy on August 17, 2011, the consideration represents a one day premium to Autonomy shareholders of approximately 64 percent and a premium of approximately 58 percent to Autonomy’s prior one month average closing price. The transaction will be implemented by way of a takeover offer extended to all shareholders of Autonomy. A document containing the full details of the Offer will be dispatched as soon as practicable after the date of this release. The acquisition of Autonomy is expected to be completed by the end of calendar 2011. Founded in 1996, Autonomy is a provider of infrastructure software for the enterprise with a customer base of more than 25,000 global companies. The deal positions HP as a leader in a large and growing space: Autonomy has a strong position in the $20 billion enterprise information management space, which is growing at 8 percent annually, and is uniquely positioned to continue growth within this space. Furthermore, key Autonomy assets would provide HP with the ability to reinvent the $55 billion business analytics software and services space, which is growing at 8 percent annually. Reasons for the acquisition were cited as follows: it complements HP’s existing technology portfolio; provides differentiated IP for services and extensive vertical capabilities in key industries; provides IPG a base for content management platforms; enhances HP’s financial profile; and should be accretive to HP’s earnings. http://www.hp.com/ http://www.autonomy.com/
Endeca Technologies, Inc., an agile information management software company, announced native integration of Endeca Latitude with Apache Hadoop. Endeca Latitude, based on the Endeca MDEX hybrid search-analytical database, is uniquely suited to unlock the power of Apache Hadoop. Apache Hadoop is strong at manipulating semi-structured data, which is a challenge for traditional relational databases. This combination provides flexibility and agility in combining diverse and changing data, and performance in analyzing that data. Enabling Agile BI requires a complete data-driven solution that unites integration, exploration and analysis from source data through end-user access that can adapt to changing data, changing data sources, and changing user needs. Solutions that require extensive pre-knowledge of data models and end-user needs fail to meet the agility requirement. The united Endeca Latitude and Apache Hadoop solution minimizes data modeling, cleansing, and conforming of data prior to unlocking the value of Big Data for end-users. http://www.endeca.com/ http://hadoop.apache.org/
Here we are, halfway through 2011, and on track for a banner year in the adoption of enterprise search, text mining/text analytics, and their integration with collaborative content platforms. You might ask for evidence; what I can offer is anecdotal observations. Others track industry growth in terms of dollars spent but that makes me leery when, over the past half dozen years, there has been so much disappointment expressed with the failures of legacy software applications to deliver satisfactory results. My antenna tells me we are on the cusp of expectations beginning to match reality as enterprises are finding better ways to select, procure, implement, and deploy applications that meet business needs.
What follows are my happy observations, after attending the 2011 Enterprise Search Summit in New York and 2011 Text Analytics Summit in Boston. Other inputs for me continue to be a varied reading list of information industry publications, business news, vendor press releases and web presentations, and blogs, plus conversations with clients and software vendors. While this blog is normally focused on enterprise search, experiencing and following content management technologies and system integration tools contributes valuable insights into all applications that contribute to search successes and frustrations.
Collaboration tools and platforms gained early traction in the 1990s as technology offerings to the knowledge management crowd. The idea was that teams and workgroups needed ways to share knowledge through contribution of work products (documents) to “places” for all to view. Document management systems inserted themselves into the landscape for managing the development of work products (creating, editing, collaborative editing, etc.). However, collaboration spaces and document editing and version control activities remained applications more apart than synchronized.
The collaboration space has been redefined largely because SharePoint now dominates current discussions about collaboration platforms and activities. While early collaboration platforms were carefully structured to provide a thoughtfully bounded environment for sharing content, their lack of provision for idiosyncratic and often necessary workflows probably limited market dominance.
SharePoint changed the conversation to one of build-it-to-do-anything-you-want-the-way-you-want (BITDAYWTWYW). What IT clearly wants is single vendor architecture that delivers content creation, management, collaboration, and search. What end-users want is workflow efficiency and reliable search results. This introduces another level of collaborative imperative, since the BITDAYWTWYW model requires expertise that few enterprise IT support people carry and fewer end-users would trust to their IT departments. So, third-party developers or software offerings become the collaborative option. SharePoint is not the only collaboration software but, because of its dominance, a large second tier of partner vendors is turning SharePoint adopters on to its potential. Collaboration of this type in the marketplace is ramping wildly.
Convergence of technologies and companies is on the rise, as well. The non-Microsoft platform companies, OpenText, Oracle, and IBM, are basing their strategies on tightly integrating their solid cache of acquired mature products. These acquisitions have plugged gaps in text mining, analytics, and vocabulary management areas. Google and Autonomy are also entering this territory although they are still short on the maturity model. The convergence of document management, electronic content management, text and data mining, analytics, e-discovery, a variety of semantic tools, and search technologies is shoring up the “big-platform” vendors to deal with “big-data.”
Sitting on the periphery is the open source movement. It is finding ways to alternatively collaborate with the dominant commercial players, disrupt select application niches (e.g. WCM), and contribute solutions where neither the SharePoint model nor the big platform, tightly integrated models can win easy adoption. Lucene/Solr is finding acceptance in the government and non-profit sectors but also appeals to SMBs.
All of these factors were actively on display at the two meetings but the most encouraging outcomes that I observed were:
- Rise in attendance at both meetings
- More knowledgeable and experienced attendees
- Significant increase in end-user presentations
The last of these brings me back to the adoption issue. Enterprises that previously sent people to earlier meetings to learn about technologies and products are now in the implementation and deployment stages. Thus, they are now able to contribute presentations with real experience and commentary about products. Presenters are commenting on adoption issues, usability, governance, successful practices and pitfalls or unresolved issues.
Adoption is what will drive product improvements in the marketplace because experienced adopters are speaking out on their activities. Public presentations of user experiences can and should establish expectations for better tools, better vendor relationship experiences, more collaboration among products and ultimately, reduced complexity in the implementation and deployment of products.
I continue to be impressed by the new ways in which enterprise search companies differentiate and package their software for specialized uses. This is a good thing because it underscores their understanding of different search audiences. Just as important is recognition that search happens in a context, for example:
- Personal interest (enlightenment or entertainment)
- Product selection (evaluations by independent analysts vs. direct purchasing information)
- Work enhancement (finding data or learning a new system, process or product)
- High-level professional activities (e-discovery to strategic planning)
Vendors understand that there is a limited market for a product or suite of products that will satisfy every budget, search context and the enterprise’s hierarchy of search requirements. Those who are the best focus on the technological strengths of their search tools to deliver products packaged for a niche in which they can excel.
However, for any market niche, excellence begins with six basics:
- Customer relationship cultivation, including good listening
- Professional customer support and services
- Ease of system installation, implementation, tuning and administration
- Out-of-the-box integration with complementary technologies that will improve search
- Simple pricing for licensing and support packages
- Ease of doing business, contracting and licensing, deliveries and upgrades
While any mature and worthy company will have continually improved on these attributes, there are contextual differentiators that you should seek in your vertical market:
- Vendor subject matter expertise
- Vendor industry expertise
- Vendor knowledge of how professional specialists perform their work functions
- Vendor understanding of retrieval and content types that contribute the highest value
At a recent client discussion the application of a highly specialized taxonomy was the topic. Their target content will be made available on a public facing web site and also to internal staff. We began by discussing the various categories of terminology already extracted from a pre-existing system.
As we differentiated how internal staff needed to access content for research purposes and how the public is expected to search, patterns emerged for how differently content needs to be packaged for each constituency. For those of you who have specialized collections to be used by highly diverse audiences, this is no surprise. Before proceeding with decisions about term curation and determining the granularity of their metadata vocabulary, what has become a high priority is how the search mechanisms will work for different audiences.
For this institution, internal users must have pinpoint precision in retrieval on multiple facets of content to get to exactly the right record. They will be coming to search with knowledge of the collection and more certainty about what they can expect to find. They will also want to find their target(s) quickly. On the other hand, the public facing audience needs to be guided in a way that leads them on a path of discovery, navigating through a map of terms that takes them from their “key term” query through related possibilities without demanding arcane Boolean operations or lengthy explanations for advanced searching.
There is a clear lesson here for seeking enterprise search solutions. Systems that favor one audience over another will always be problematic. Therefore, who needs what, and how each group goes about searching, must be established first and then matched to the product that can provide for all target groups.
We are in the season for conferences; there are a few next month that will be featuring various search and content technologies. After many years of walking exhibit halls and formulating strategies for systematic research and avoiding a swamp of technology overload, I try now to have specific questions formulated that will discover the “must have” functions and features for any particular client requirement. If you do the same, describing a search user scenario to each candidate vendor, you can then proceed to ask: Is this a search problem your product will handle? What other technologies (e.g. CMS, vocabulary management) need to be in place to ensure quality search results? Can you demonstrate something similar? What would you estimate the implementation schedule to look like? What integration services are recommended?
These are starting points for a discussion and will enable you to begin to know whether this vendor meets the fundamental criteria laid out earlier in this post. It will also give you a sense of whether the vendor views all searchers and their searches as generic equivalents or knows that different functions and features are needed for special groups.
Look for vendors for enterprise search and search related technologies to interview at the following upcoming meetings:
Enterprise Search Summit, New York, May 10 – 11 […where you will learn strategies and build the skill sets you need to make your organization’s content not only searchable but “findable” and actionable so that it delivers value to the bottom line.] This is the largest seasonal conference dedicated to enterprise search. The sessions are preceded by separate workshops with in-depth tutorials related to search. During the conference, focus on case studies of enterprises similar to yours for a better understanding of issues that you may need to address.
Text Analytics Summit, Boston, May 18 – 19 I spoke with Seth Grimes, who kicks off the meeting with a keynote, asking whether he sees a change in emphasis this year from straight text mining and text analytics. You’ll have to attend to get his full speech but Seth shared that he sees a newfound recognition that “Big Data” is coming to grips with text source information as an asset that has special requirements (and value). He also noted that unstructured document complexities can benefit from text analytics to create semantic understanding that improves search, and that text analytics products are rising to the challenge of providing dynamic semantic analysis, particularly around massive amounts of social textual content.
Lucene Revolution, San Francisco, May 23 – 24 […hear from … the foremost experts on open source search technology to a broad cross-section of users that have implemented Lucene, Solr, or LucidWorks Enterprise to improve search application performance, scalability, flexibility, and relevance, while lowering their costs.] I attended this new meeting last year when it was in Boston. For any enterprise considering or leaning toward implementing open source search, particularly Lucene or Solr, this meeting will set you on a path for understanding what that journey entails.
A recent inquiry about a position requiring ETL (Extraction/Transformation/Loading) experience prompted me to survey the job market in this area. It was quite a surprise to see that there are many technical positions seeking this expertise, plus experience with SQL databases, and XML, mostly in healthcare, finance or with data warehouses. I am also observing an uptick in contract positions for metadata and taxonomy development.
My research on Semantic Software Technologies placed me on a path for reporters and bloggers to seek my thoughts on the Watson-Jeopardy story. Much has been written on the story but I wanted to try a fresh take on the meaning of it all. There is a connection to be made between the ETL field and building a knowledgebase with the smarts of Watson. Inspiration for innovation can be drawn from the Watson technology but there is a caveat; it involves the expenditure of serious mental and computing perspiration.
Besides baked-in intelligence for answering human questions using natural language processing (NLP) to search, an answer-platform like Watson requires tons of data. Also, data must be assembled in conceptually and contextually relevant databases for good answers to occur. When documents and other forms of electronic content are fed to a knowledgebase for semantic retrieval, finely crafted metadata (data describing the content) and excellent vocabulary control add enormous value. These two content enhancers, metadata and controlled vocabularies, can transform good search into excellent search.
The irony of current enterprise search is that information is in such abundance that it overwhelms rather than helps findability. Content and knowledge managers can’t possibly contribute the human resources needed to generate high quality metadata for everything in sight. But there are numerous techniques and technologies to supplement their work by explicitly exploiting the mountain of information.
Good content and knowledge managers know where to find top quality content but may not know that, for all common content formats, there are tools to extract key metadata embedded (but hidden) in it. Some of these tools can also text mine and analyze the content for additional intelligent descriptive data. When content collections are large but still too small (under a million documents) to justify the most sophisticated and complex semantic search engines, ETL tools can relieve pressure on metadata managers by automating much of the mining and extracting the entities and concepts needed for good categorization.
The ETL tool array is large and varied. Platform tools from Microsoft (SSIS) and IBM (DataStage) may be employed to extract, transform and load existing metadata. Other independent products such as those from Pervasive and SEAL may contribute value across a variety of platforms or functional areas from which content can be dramatically enhanced for better tagging and indexing. The call for ETL experts is usually expressed in terms of engineers who would be selecting, installing and implementing these products. However, it has to be stressed that subject and content experts are required to work with engineers. The role of these experts is to help tune and validate the extraction and transformation outcomes, making sure terminology fits function.
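For readers who have not worked with ETL, the three steps can be sketched in a few lines of Python using only the standard library. This is an illustration of the pattern, not of how SSIS, DataStage or any other product named above works. The header format of the sample documents, the `PREFERRED_TERMS` vocabulary and the `metadata` table are all hypothetical.

```python
import sqlite3

# Hypothetical controlled vocabulary: variant term -> preferred term.
# In practice subject experts would curate and validate this table.
PREFERRED_TERMS = {
    "ecm": "content management",
    "cms": "content management",
    "bi": "business intelligence",
}


def extract(raw_docs):
    """Extract: read 'Key: value' header lines from the top of each document."""
    for doc_id, text in raw_docs:
        meta = {}
        for line in text.splitlines():
            if ":" not in line:
                break  # the first line without a colon ends the header block
            key, value = line.split(":", 1)
            meta[key.strip().lower()] = value.strip()
        yield doc_id, meta


def transform(records):
    """Transform: normalize each subject to its preferred vocabulary term."""
    for doc_id, meta in records:
        subject = meta.get("subject", "").lower()
        yield doc_id, meta.get("title", ""), PREFERRED_TERMS.get(subject, subject)


def load(rows, conn):
    """Load: write the normalized rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(doc_id TEXT PRIMARY KEY, title TEXT, subject TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()


# Made-up sample documents with a simple header block.
raw_docs = [
    ("doc1", "Title: Search roadmap\nSubject: ECM\n\nBody text here."),
    ("doc2", "Title: Quarterly dashboard\nSubject: BI\n\nBody text here."),
]
conn = sqlite3.connect(":memory:")
load(transform(extract(raw_docs)), conn)
print(conn.execute("SELECT doc_id, title, subject FROM metadata ORDER BY doc_id").fetchall())
```

The vocabulary table is the place where subject and content experts come in: engineers can build the pipeline, but someone who knows the terminology has to decide which variants map to which preferred term.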
Entity extraction is one major outcome of text mining to support business analytics, but tools can do a lot more to put intelligence into play for semantic applications. Tools that act as filters and statistical analyzers of text data warehouses will help reveal terminology for use in building specialized controlled vocabularies for use in auto-categorization. A few vendors that are currently on my radar to help enterprises understand and leverage their content landscape include EntropySoft Content ETL, Information Extraction Systems, Intelligenx, ISYS Document Filters, RAMP, and XBS; there is something here for everyone.
The diversity of emerging applications is a leading indicator that there is a lot of innovation to come with all aspects of ETL. While RAMP is making headway with video, another firm with a local connection is Inforbix. I spoke with a co-founder, Oleg Shilovitsky, for my semantic technology research last year before they launched. As he then asserted, it is critical to preserve, mine and leverage the data associated with design and manufacturing operations. This area has huge growth potential and Inforbix is now ready to address that market.
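As a rough illustration of statistical analysis that reveals terminology, the Python sketch below counts how many documents each one-word and two-word term appears in and keeps the terms that recur. It is a deliberately simple stand-in for the commercial tools listed above, not a description of any of them. The stopword list, the `candidate_terms` helper, the `min_docs` threshold and the sample texts are assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "is", "on", "with"}


def candidate_terms(texts, min_docs=2):
    """Return (term, document_count) pairs for terms found in at least min_docs texts."""
    doc_counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        unigrams = {w for w in words if w not in STOPWORDS}
        bigrams = {
            f"{a} {b}"
            for a, b in zip(words, words[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        }
        # Sets ensure each term is counted once per document.
        doc_counts.update(unigrams | bigrams)
    # Most widespread terms first; ties broken alphabetically.
    ranked = sorted(doc_counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, n) for term, n in ranked if n >= min_docs]


# Made-up sample texts.
texts = [
    "Entity extraction supports business analytics.",
    "Text mining and entity extraction feed the taxonomy.",
    "The taxonomy drives auto-categorization.",
]
print(candidate_terms(texts))
```

The output of a pass like this is only a list of candidates. Deciding which of them belong in a controlled vocabulary remains a job for a taxonomist or subject expert.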
Readers who seek to leverage ETL and text mining will gain know-how from the cases presented at the 2011 Text Analytics Summit, May 18-19 in Boston. As well, the exhibits will feature products to consider for making piles of data a valuable knowledge asset. I’ll be interviewing experts who are speaking and exhibiting at that conference for a future piece. I hope readers will attend and seek me out to talk about your metadata management and text mining challenges. This will feed ideas for future posts.
Finally, I’m not the only one thinking along these lines. You will find other ideas and a nudge to action in these articles.
Boeri, Bob. Improving Findability Behind the Firewall, 28 slides. Enterprise Search Summit 2010, NY, 05/2010.
Farrell, Vickie. The Need for Active Metadata Integration: The Hard Boiled Truth. DM Direct Newsletter, 09/09/2005, 3p.
McCreary, Dan. Entity Extraction and the Semantic Web. Semantic Universe, 01/01/2009.
White, David. BI or bust? KMWorld, 10/28/2009, 3p.