Archive for Search & Semantic Tech

Speaker Spotlight: John Felahi – Making content findable

In another installment of Speaker Spotlight, we posed a couple of our frequently asked questions to speaker John Felahi, Chief Strategy Officer at Content Analyst Company, LLC. We’ve included his answers here. Be sure to see additional Speaker Spotlights from our upcoming conference.

Speaker Spotlight: John Felahi

Chief Strategy Officer

Content Analyst Company, LLC

 

What is the best overall strategy for delivering content to web, multiple mobile, and upcoming digital channels? What is the biggest challenge? Development and maintenance cost? Content control? Brand management? Technology expertise?

One of the biggest challenges in delivering content to the web is making it as findable as possible to potentially interested viewers. While traditional manual tagging and keyword search methods may have gotten us this far, and may be good enough for some use cases, they are not without limitations. The good news is that far more advanced, sophisticated – and automated – technologies are available to remedy those limitations. The limitations of manual tagging and keyword-based search include:

  • Term creep – New terms constantly emerge, requiring taxonomies to be constantly updated.
  • Polysemy – Take Apple, for example. Is your user searching for the company, the Beatles’ record label, or the fruit?
  • Acronyms – Texting has introduced an entirely new language of acronyms (LOL, TTYL, WDYT).  Manually tagging content requires the editor to consider possible acronyms the users will be searching for.
  • Abbreviations – Content that abbreviates long scientific terms, geographic names, etc. requires editors to tag both the abbreviations and the full terms they represent.
  • Misspellings – Thanks to spellcheck and autocorrect, technology has become much more forgiving of those who never made it past the first round of their sixth-grade spelling bee. Content search, unfortunately, needs to be equally accommodating if you want users to find your content – which means tagging it with common misspellings.
  • Language – The web has certainly made the world a much smaller place, but that doesn’t mean everyone speaks English.  Making content findable in any language means it has to also be tagged in multiple languages.
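To make the misspelling limitation concrete, here is a minimal sketch (the documents and function names are hypothetical, not from any product mentioned in this post) of how a naive exact-match keyword search misses a misspelled document that a simple edit-distance comparison recovers:

```python
from difflib import SequenceMatcher

docs = {
    1: "Guidance on acetaminophen dosing for children",
    2: "New recomendations for flu vaccines this season",  # note the misspelling
}

def exact_search(query, docs):
    """Naive keyword search: a document matches only if it contains the exact string."""
    q = query.lower()
    return [doc_id for doc_id, text in docs.items() if q in text.lower()]

def fuzzy_search(query, docs, threshold=0.8):
    """Tolerant search: a document matches if any token is close enough by edit distance."""
    q = query.lower()
    hits = []
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if SequenceMatcher(None, q, token).ratio() >= threshold:
                hits.append(doc_id)
                break
    return hits

print(exact_search("recommendations", docs))  # [] -- the misspelled document is missed
print(fuzzy_search("recommendations", docs))  # [2] -- edit-distance matching finds it
```

Tagging content with common misspellings works around this by hand; fuzzy matching and the concept-aware approach described below automate it.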

On to the good news – there is technology that has been used for years in eDiscovery and the US Intelligence Community to overcome these very challenges, albeit for different reasons. Because the bad guys aren’t tagging their content to make it more findable, the intel community needs a better way to find what it’s looking for. And in eDiscovery, finding relevant content can make a multi-million-dollar difference to the outcome of a particular litigation or other regulatory matter. That’s why tens of thousands of legal reviewers and countless analysts in the intel community use a technology known as concept-aware advanced analytics.

How concept-aware advanced analytics differs from manual tagging and keyword search

As its name implies, concept-aware technology understands the underlying concepts within the content, and as such it can tag content automatically. On the viewer’s side, content can be found by simply saying, “find more like this.” Categories are defined by taking examples that represent the concepts of a category. The system “learns” what that category is all about, and can then identify conceptually similar content and apply the same category. The process is the same on the search side: the user points to a piece of content and says, “find more like this.” Or, as the content publisher, you present the viewer with conceptually similar content, i.e., “you may also be interested in these articles.”
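The post does not disclose Content Analyst’s actual algorithm (latent semantic indexing is the best-known technique in this family), but the “find more like this” idea can be illustrated with a toy vector-space sketch: represent each document as a term vector and rank candidates by cosine similarity to the seed document. All documents and names below are illustrative assumptions, not the product’s implementation.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector (a crude stand-in for a real concept space)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

corpus = [
    "apple unveils new iphone and mac hardware",
    "apple orchards expect a strong harvest of fruit this fall",
    "new mac laptops ship with faster hardware",
]

# "Find more like this": rank the rest of the corpus against a seed document.
seed = corpus[0]
ranked = sorted(corpus[1:], key=lambda d: cosine(vectorize(seed), vectorize(d)), reverse=True)
print(ranked[0])  # the hardware story outranks the orchard story despite sharing "apple"
```

Note how the shared context (“new”, “mac”, “hardware”) outweighs the ambiguous keyword “apple” – the polysemy problem from the list above. Real concept-aware systems learn a reduced semantic space from example documents rather than matching raw terms.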

While concept-aware advanced analytics doesn’t necessarily replace manual tagging and keyword search – which work very well in certain situations – the technology clearly overcomes many of the limitations of traditional tagging and search methods.

Catch Up with John at Gilbane

Track E: Content, Collaboration, and the Employee Experience

E7: Strategic Imperatives for Enterprise Search to Succeed
Wednesday, December 4, 2:00 p.m. – 3:20 p.m.

[button link="http://gilbaneconference.com/program" variation="red"]Complete Program[/button] [button link="http://gilbaneconference.com/schedule" variation="red"]Conference Schedule[/button] [button link="http://gilbaneconference.com/registration" variation="red"]Register Today[/button]

HP to Acquire Autonomy

HP and Autonomy Corporation announced the terms of a recommended transaction under which HP will acquire all of the outstanding shares of Autonomy for £25.50 ($42.11) per share in cash. The transaction was unanimously approved by the boards of directors of both HP and Autonomy, and the Autonomy board of directors has unanimously recommended that its shareholders accept the Offer. Based on the closing stock price of Autonomy on August 17, 2011, the consideration represents a one-day premium to Autonomy shareholders of approximately 64 percent and a premium of approximately 58 percent to Autonomy’s prior one-month average closing price. The transaction will be implemented by way of a takeover offer extended to all shareholders of Autonomy; a document containing the full details of the Offer will be dispatched as soon as practicable after the date of this release. The acquisition of Autonomy is expected to be completed by the end of calendar 2011. Founded in 1996, Autonomy is a provider of infrastructure software for the enterprise with a customer base of more than 25,000 global companies. The deal positions HP as a leader in a large and growing space: Autonomy has a strong position in the $20 billion enterprise information management market, which is growing at 8 percent annually, and is uniquely positioned to continue growth within it. Furthermore, key Autonomy assets would provide HP with the ability to reinvent the $55 billion business analytics software and services space, which is also growing at 8 percent annually. The stated reasons for the acquisition: it complements HP’s existing technology portfolio; provides differentiated IP for services and extensive vertical capabilities in key industries; provides IPG a base for content management platforms; enhances HP’s financial profile; and Autonomy should be accretive to HP’s earnings. http://www.hp.com/ http://www.autonomy.com/

Endeca Now Integrates Hadoop

Endeca Technologies, Inc., an agile information management software company, announced native integration of Endeca Latitude with Apache Hadoop. Endeca Latitude, based on the Endeca MDEX hybrid search-analytical database, is uniquely suited to unlock the power of Apache Hadoop. Apache Hadoop is strong at manipulating semi-structured data, which is a challenge for traditional relational databases. This combination provides flexibility and agility in combining diverse and changing data, and performance in analyzing that data. Enabling Agile BI requires a complete data-driven solution that unites integration, exploration and analysis from source data through end-user access that can adapt to changing data, changing data sources, and changing user needs. Solutions that require extensive pre-knowledge of data models and end-user needs fail to meet the agility requirement. The united Endeca Latitude and Apache Hadoop solution minimizes data modeling, cleansing, and conforming of data prior to unlocking the value of Big Data for end-users. http://www.endeca.com/ http://hadoop.apache.org/

What’s Next with Smart Content?

Over the past few weeks, since publishing Smart Content in the Enterprise, I’ve had several fascinating lunchtime conversations with colleagues concerned about content technologies. Our exchanges wind up with a familiar refrain that goes something like this. “Geoffrey, you have great insights about smart content but what am I supposed to do with all this information?” Ah, it’s the damning with faint praise gambit that often signals an analysis paralysis conundrum for decision-making.

Let me make one thing perfectly clear — I do not have an out-of-the-box prescription for a solution. It’s not simply a matter of focusing on your customer experience, optimizing your content for search, investing in a component content management platform, or adopting DITA – although, depending on the situation, I may recommend some combination of these items as part of a smart content strategy.

For me, smart content remains a work in progress. I expect to develop the prescriptive road map in the months ahead. Here’s a quick take on where I am right now.

  • For publishers, it’s all about transforming the publishing paradigm through content enrichment – defining the appropriate level of granularity and then adding the semantic metadata for automated processing.
  • For application developers, it’s all about getting the information architecture right and ensuring that it’s extensible. There needs to be sensible storage, the right editing and management tools, multiple methods for organizing content, as well as a flexible rendering and production environment.
  • For business leaders and decision makers, there needs to be an upfront investment in the right set of content technologies that will increase profits, reduce operating costs, and mitigate risks. No, I am not talking about rocket science. But you do need a technology strategy and a business plan.

As highlighted by the case studies included in the report, I can point to multiple examples where organizations have done the right things to produce notable results. Dale and I will continue the smart content discussions at the Gilbane Boston conference right after Thanksgiving, both through our preconference workshop, and at a conference session “Smart Content in the Real World: Case Studies and Real Results.”

We are also launching a Smart Content Readiness Service, where we will engage with organizations on a consulting basis to identify:

  • The business drivers where smart content will ensure competitive advantage when distributing business information to customers and stakeholders
  • The technologies, tools, and skills required to componentize content, and target distribution to various audiences using multiple devices
  • The operational roles and governance needed to support smart content development and deployment across an organization
  • The implementation planning strategies and challenges to upgrade content creation and delivery environments

Please contact me if you are interested in learning more.

In short, to answer my lunchtime colleagues, I cannot (yet) prescribe a fully baked solution. It’s too early for the recipes and the cookbook. But I do believe that the business opportunities and benefits are readily at hand. At this point, I would invite you to join the discussion by letting me know what you expect, what approaches you’ve tried, where you’ve wound up, what you think needs to come next – and how we might help you.

Sophia Launches Sophia Search for Intelligent Enterprise Search and Contextual Discovery

Sophia, the provider of contextually aware enterprise search solutions, announced Sophia Search, a new search solution which uses a Semiotic-based linguistic model to identify intrinsic terms, phrases and relationships within unstructured content so that it can be recovered, consolidated and leveraged. Sophia Search is designed to minimize compliance risk and reduce the cost of storing and managing enterprise information. It delivers a “three-dimensional” solution to discover, consolidate and optimize enterprise data, regardless of its data type or domain. Sophia Search helps organizations manage and analyze critical information by discovering the themes and intrinsic relationships behind their information, without taxonomies or ontologies, so that more relevant information may be discovered. By identifying both duplicates and near duplicates, Sophia Search allows organizations to effectively consolidate information and minimize storage and management costs. Sophia Search features a patented Contextual Discovery Engine (CDE), which is based on the linguistic model of Semiotics, the science behind how humans understand the meaning of information in context. Sophia Search is available now to both customers and partners. Pricing starts at $30,000. http://www.sophiasearch.com/
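The announcement does not describe how the Contextual Discovery Engine detects near duplicates; word-shingling with Jaccard similarity is one standard, public technique for that task, sketched here purely as an illustration (all documents and thresholds below are made up):

```python
def shingles(text, k=3):
    """Set of overlapping k-word shingles from a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    """Jaccard similarity: shared shingles over total distinct shingles."""
    return len(a & b) / len(a | b) if a | b else 1.0

original  = "the quarterly report shows revenue grew ten percent year over year"
near_dup  = "the quarterly report shows revenue grew ten percent compared to last year"
unrelated = "employee handbook policies for remote work and travel expenses"

sim_dup = jaccard(shingles(original), shingles(near_dup))
sim_unrel = jaccard(shingles(original), shingles(unrelated))
print(round(sim_dup, 2), round(sim_unrel, 2))  # the near duplicate scores high; the unrelated doc scores 0
```

Flagging pairs above a similarity threshold lets a system consolidate near duplicates without requiring exact byte-level matches – the storage-reduction scenario the release describes.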

Open Text to Acquire Nstein

Open Text Corporation (NASDAQ:OTEX) (TSX: OTC) and Nstein Technologies Inc. (TSX-V: EIN) announced that they have entered into a definitive agreement by which Open Text will acquire all of the issued and outstanding common shares of Nstein through an Nstein shareholder-approved amalgamation with a subsidiary of Open Text under the Companies Act (Québec). Based on the terms of the definitive agreement, Nstein shareholders will receive for each Nstein common share, CDN $0.65 in cash, unless certain eligible shareholders otherwise elect to receive a fraction of an Open Text TSX traded common share, having a value of CDN $0.65 based on the volume weighted average trading price of Open Text TSX traded common shares in the 10 trading day period immediately preceding the closing date of the acquisition. This purchase price represents a premium of approximately 100 percent above the 30 trading day average closing price of Nstein’s common shares. The transaction is valued at approximately CDN $35 million. Based in Montreal, Nstein’s solutions are sold across market segments such as media and information services, life sciences and government. The transaction is expected to close in the second calendar quarter and is subject to customary closing conditions, including approval of two-thirds of the votes cast by Nstein’s shareholders and applicable regulatory and stock exchange approvals. A special meeting of Nstein’s shareholders is expected to be held to consider the amalgamation in early April, 2010. http://www.opentext.com, http://www.nstein.com

Google Search Appliance Gets New Connectors

Google has announced an upgraded suite of GSA Connectors for the Google Search Appliance (GSA), including connectors to integrate offline company data with information stored in the cloud. GSA Connectors connect the GSA with content management systems and other repositories, so that users can find the information they are looking for in unified search results. With the upgraded connectors, the connector framework is simplified so that it can search content stored across various databases. One of the featured GSA Connectors is for Salesforce, enabling the GSA to search content in Salesforce and providing sales, marketing, and customer support personnel access to the information they seek regularly. In addition, new updates and features have been made to the connectors for SharePoint, Livelink, FileNet, and Documentum. Specifically, the SharePoint connector supports batch authorization and multiple site collections, and has added 64-bit Windows support. Additionally, the Google Search Box can be implemented within SharePoint, powered by the GSA, giving results from databases outside of the SharePoint system. Multiple connectors now support more recent versions of content systems, such as Documentum v6.5 and FileNet v4. www.google.com/enterprise/search/gsa.html

MuseGlobal and Specialty Systems Partner

MuseGlobal announced a partnership with Specialty Systems, Inc., a company focused on innovative information systems solutions for Federal, State and Local Government customers. Specialty Systems, Inc. is partnering with MuseGlobal to provide the systems integration expertise to engineer law enforcement and homeland security applications built on MuseGlobal’s MuseConnect, which provides federated search and harvesting technologies with a library of more than 6,000 pre-built source connectors. The applications resulting from this partnership will incorporate unified information access, allowing structured data from database sources; semi-structured data from spreadsheets, forms and XML sources; unstructured data from web sites, documents, and email; and rich media such as images, video and audio to be accessed simultaneously from internal databases and external sources. This information is gathered on the fly and unified for immediate presentation to the requestor. http://www.specialtysystems.com, http://www.museglobal.com