Curated for content, computing, and digital experience professionals

Topic: Technology

The word technology refers to the making, modification, use, and knowledge of tools, machines, techniques, crafts, systems, and methods of organization used to solve a problem, improve an existing solution, achieve a goal, handle an applied input/output relation, or perform a specific function. It can also refer to the collection of such tools, including machinery, modifications, arrangements, and procedures.

Content Accessibility in the Enterprise is Really Search

The Gilbane Conference call for speakers is out and submissions are due in three days, May 2. As one who has been writing about enterprise search topics for over ten years, and engaged in search technology development since 1974, I know it is still a relevant topic.

If you are engaged in any role in content repository development or use, you know that excellent accessibility is fundamental to making content valuable and usable. You are also probably trying to influence decisions that ensure technology implementations have adequate staffing for content metadata and controlled vocabulary development.

Please take a look at this conference track outline and consider where you could contribute. Then submit a speaking proposal to share your direct experiences with search or a related topic. Our conference participants love to hear real stories of enterprise initiatives that illustrate innovative approaches, practical solutions, workarounds to technical and business problems, and just plain scrappy projects that bring value to a group or to the whole enterprise. In other words, how do you get the job done within the constraints you face?

Track E: Content, Collaboration and Employee Engagement

Designed for content, information, technical, and business managers focused on enterprise social, collaboration, intranet, portal, knowledge, and back-end content applications.

  • Collaboration and the social enterprise
  • Collaboration tools & social platforms
  • Enterprise social metrics
  • Community building & knowledge sharing
  • Content management & intranet strategies
  • Enterprise mobile strategies
  • Content and information integration
  • Enterprise search and information access
  • Semantic technologies
  • Taxonomies, metadata, tagging

Please consider participating in the conference, especially if content findability and accessibility are high on your list of “must have” content solutions. Submit your proposal here. The need for good content findability has never been greater, and vendors, IT managers, and content experts in this forum need to hear your experiences.

Audio and Video: Metadata, Meta-tags, Meta-Understanding

2014 opened with a post about findability; that theme continues with some thoughts on what it means to build an enterprise digital asset management (DAM) repository and have it actually deliver findability for audio and video content.

Working on a project for an institution that has experience using a DAM for image assets, I have become keenly aware of the heavy lift required for a similar system for audio and video files. Heavy lift means human resources with the knowledge to tag assets with metadata manually. This metadata activity is necessary to establish text that describes the “aboutness” of an audio or video file. Search queries in a DAM are text-based; without text, finding an asset is impossible.

Stephen Arnold raises many critical issues in the article Video Metadata: ripe for innovation in KMWorld, March 2014. It articulates very well the challenge of indexing a video asset: if any metadata exists, much of it will be transactional in nature, not descriptive of the content.

In any institution with limited content editorial or curatorial resources to catalog internally produced audio and video files, it will fall to the creator to describe the essential elements with metadata. Those elements might include who is in the video, who is performing in an audio file, where a performance took place, what the major themes are, which instruments are featured, and so on. Software applications exist that can extract words from spoken or sung language, or from text that is visible in a video frame. But when it comes to the “aboutness” or major themes being demonstrated, only the creator or the creator’s surrogate will have the required meta-understanding.
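To make the point concrete, here is a minimal Python sketch of the kind of descriptive record a creator or a creator’s surrogate might fill in at upload time. The field names and values are illustrative only, not a proposed standard or any existing DAM schema; the point is that the “aboutness” fields cannot be derived automatically.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MediaAssetRecord:
    """Illustrative descriptive metadata for an audio or video asset (hypothetical fields)."""
    asset_id: str
    title: str
    media_type: str                                        # e.g. "audio" or "video"
    performers: List[str] = field(default_factory=list)    # who appears or performs
    venue: str = ""                                         # where the performance took place
    themes: List[str] = field(default_factory=list)         # the "aboutness" only a person can supply
    instruments: List[str] = field(default_factory=list)    # featured instruments
    transcript_excerpt: str = ""                             # text that speech-to-text tools can extract

# Example record a creator might supply at upload time (invented values)
record = MediaAssetRecord(
    asset_id="AV-2014-0032",
    title="Spring recital, second half",
    media_type="video",
    performers=["Student jazz ensemble"],
    venue="Main auditorium",
    themes=["improvisation", "student performance"],
    instruments=["saxophone", "upright bass"],
)
```

Without a record like this, the only searchable text is whatever a transcript or filename happens to contain.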

Enterprises with a public face or commercial aspect will employ a metadata creation staff for images or files that go into their public web sites. However, justifying staffing to make audio and video a valuable asset for internal sharing and consumption is a tough sell. Commitment to building up an internal DAM that will be used and useful because its assets are easily found takes faith and almost religious fervor on the part of participating contributors. Technology can only go so far in making those assets findable. Attribute possibilities are voluminous and not easily codified.

On the arts frontier of searchability, one only has to look at the gold standard for controlled vocabulary, the Getty Museum, to see the breadth and depth of the categories in its thesauri covering paintings, sculpture, drawing, crafts, woodworking, and more. In non-classical musicology no such universal standard terminology exists for public consumption. In the musical arts, getting musicians to agree on how to label innovative and evolving genres will be a herculean human effort.

Building a DAM for internal audio and video files is not to be undertaken without answering these questions:

  • How will findability be defined?
  • Who is the audience?
  • Who is going to create the metadata for uploaded assets, particularly “locally created” content?
  • What are the technology tools that can index and search the assets?
  • What are the resources for establishing, modifying, and perpetually expanding taxonomies/controlled terminology?
  • When assets are found, how will they be displayed or played?
  • What is the on-going process for sustaining the repository, curating and expanding its scope?

This is a new frontier in content management. With so much investment in audio and video engineering, and the entertainment industry, it is time to innovate on the “findability” front, as well. In the meantime, a wise process for adoption is to start small and simple with the metadata development effort. Then hope that technology innovation will emerge to help your process before you retire.

What Experts Say about Enterprise Search: Content, Interface Design and User Needs

This recap might have the ring of an old news story but these clips are worth repeating until more enterprises get serious about making search work for them, instead of allowing search to become an expensive venture in frustration. Enterprise Search Europe, May 14-16, 2013, was a small meeting with a large punch. My only regret is that the audience did not include enough business and content managers. I can only imagine that the predominant audience members, IT folks, are frustrated that the people whose support they need for search to succeed were not in attendance to hear the messages.

Here are just a few of the key points that business managers and those who “own” search budgets need to hear.

On Day 1 I attended a workshop presented by Tony Russell-Rose [Managing Director, UXLabs and co-author of Designing the Search Experience, also at City University London], Search Interface Design. While many experts talk about the two top priorities for search success, recall (all relevant results are returned) and precision (all returned results are relevant), they usually fail to acknowledge a hard truth. We all want “the whole truth and nothing but the truth,” but as Tony pointed out, we can’t have both. He went on to offer this general guidance: recall matters most in highly regulated or risk-intensive businesses, while in e-commerce we tend to favor precision. I would add that in enterprises that have to both manage risk and sell products, there is a place for two types of search, with priorities that vary depending on the business purpose. My takeaway: universal, all-in-one search implementations across an enterprise will leave most users disappointed. It’s time to acknowledge the need for different types of implementations, depending on need and audience.
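For readers who want the trade-off in concrete terms, here is a small Python sketch with made-up result sets; it simply shows how a broad query can achieve perfect recall at the cost of precision, and a narrow query the reverse.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result sets: a broad query favors recall, a narrow one favors precision
relevant_docs = {"d1", "d2", "d3", "d4"}
broad_results = {"d1", "d2", "d3", "d4", "d7", "d8", "d9", "d10"}
narrow_results = {"d1", "d2"}

print(precision_recall(broad_results, relevant_docs))   # (0.5, 1.0)  high recall, low precision
print(precision_recall(narrow_results, relevant_docs))  # (1.0, 0.5)  high precision, low recall
```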

Ed Dale [Digital Platforms Product Manager, Ernst & Young (USA)] gave a highly pragmatic keynote at the meeting opening, The Six Drivers for Search Quality. The overarching theme was that search rests on content. He went on to describe the Ernst & Young drivers: the right content, optimized for search, constant tuning for optimal results, attention to a user interface that is effective for a user-type, attention to user needs, consistency in function and design. Ed closed with this guidance: develop your own business drivers based on issues that are important to users. Based on these and the company’s drivers, focus your efforts, remembering that you are not your users.

The Language of Discovery: A Toolkit for Designing Big Data Interfaces and Interactions was presented by Joseph Lamantia [UX Lead: Discovery Products and Services, Oracle Endeca]. He framed discovery as the ability to understand data, and cautioned against treating data as having value by itself, without discovery. Discovery was defined as something you have seen, found, and made sense of in order to derive insight; it is achieved by grasping meaning and significance. What I found most interesting was the discussion of modes of searching that have grown out of a number of research efforts. Begin with slide 44, “Mediated Sense making,” to learn the precursors that lead into his “modes” description. When considering the needs of search users, this discussion is especially important. We all discover and learn in different ways, and the “modes” topic highlights the multitude of options to contemplate. [NOTE: Don’t overlook Joe’s commentary that accompanies the slides at the bottom of the SlideShare.]

Joe was followed by Tyler Tate [Cofounder, TwigKit] on Information Wayfinding: A New Era of Discovery. He asked the audience to consider this question: “Are you facilitating the end-user throughout all stages of the information seeking process?” The stages are: initiation > selection > exploration > formulation > collection > action. This is a key point for those most involved in user interface design, and for content managers thinking about facet vocabulary and sorting results.

Steve Arnold [Arnold IT] always brings a “call to reality” aspect to his presentations, and Big Data vs. Search was no different. On “Big Data,” a couple of key points stick out: “more data” is not just more data; it is different. As soon as we begin trying to “manage” it, we have to apply methods and technologies to reduce it to dimensions that search systems can deal with. Search data processing has changed very little over the last 50 years, and processing constraints limit indexing capabilities across these super-large sets. There are great opportunities for creating management tools (e.g. analytics) for big data in order to optimize search algorithms and make the systems more affordable and usable. Among Arnold’s observations was the incessant push to eliminate humans, getting away from techniques and methods [to enhance content] that work and replacing them with technology. He noted that all the camera and surveillance systems in Boston did not stop the Marathon bombers, but people in the situation did limit casualties through quick medical intervention and by providing descriptions of the suspicious people who turned out to be the principal suspects. People must still be closely involved for search to succeed, regardless of the technology.

SharePoint lurks in every session at information technology conferences, and this meeting was no exception. Although I was not in the room to hear the presentation, I found these slides from Agnes Molnar [International SharePoint Consultant, ECM & Search Expert, MVP], Search Based Applications with SharePoint 2013, to be among the most direct and succinct explanations of when SharePoint makes sense. They nicely explain where SharePoint fits in the enterprise search ecosystem. Thanks to Agnes for the clarity of her presentation.

A rapid-fire panel on “Trends and Opportunities” moderated by Alan Pelz-Sharpe [Research Director for Content Management & Collaboration, 451 Research] included Charlie Hull [Founder of Flax], Dan Lee of Artirix, Kristian Norling of Findwise (see Findwise survey results), Eric Pugh of OpenSource Connections, and René Kriegler, an independent search consultant. Among the key points offered by the panelists were:

  • There is a lot to accomplish to make enterprise search work after installing the search engine. When it comes to implementation and tuning there are often significant gaps in products and available tools to make search work well with other technologies.
  • Search can be leveraged to find signals of what is needed to improve the search experience.
  • Search as an enterprise application is “not sexy” and does not inspire business managers to support it enthusiastically. Its potential value and sustainability are not well understood, so managers do not view it as something that will increase their own importance.
  • Open source adoption is growing but does face challenges. VC-backed companies in that arena will struggle to generate enough revenue to make VCs happy. The committer community is dominated by a single firm, and that may weaken the staying power of other open source search (Lucene, Solr) committers.

A presentation late in the program by Kara Pernice, Managing Director of NN/g, Nielsen Norman Group, positioned the design of an intranet as a key element in making search compelling. Her insights reflect two decades of “Eyetracking Web Usability” done with Jakob Nielsen, and how that research applies for an intranet. Intranet Search Usability was the theme and Kara’s observations were keenly relevant to the audience.

Not the least of my three days at the meeting were side discussions with Valentin Richter, CEO of Raytion; Iain Fletcher of Search Technologies; Martin Rugfelt of Expertmaker; Benoit Leclerc of Coveo; and Steve Andrews, an advisor to Q-Sensei. These conversations contributed many ideas on the state of enterprise search. I left the meeting with the overarching sense that enterprise leadership needs to be sold on the benefits of sustaining a search team as part of the information ecosystem. The challenge is conveying that search is not just a technological, plug-and-play product or a “one-off” project. Messaging is not getting through effectively. We need strong and clear business voices to make the case; the signals are too diffuse, and that makes them weak. My take is that messages from search vendors all have valid points of view, but when they are combined with too many other topics (e.g. “big data,” “analytics,” “open source,” SharePoint, “cloud computing”), basic concepts of what search is and where it belongs in the enterprise get lost.

Leveraging Search in Small Enterprises

A mantra for a small firm or start-up in the 1970s, when “Big Blue” was the standard for top-notch sales and selling, was “we need to out-IBM the IBMers.”

Search is just one aspect of being able to find what you need to leverage knowledge assets in your work, whether you are in a small firm, part of a small group in a large organization, or an individual consultant trying to make the most of the masses of content and information surrounding your work.

My thoughts are inspired by a question asked by Andreas Gruber of Informations und Wissensmanagement in a recent post to the Enterprise Search Engine Professionals LinkedIn group. He posed a request for information, writing: “For enterprise search solutions for (very) small enterprises (10 to 200 employees), I find it hard to define success factors and it seems that there are not many examples available. If you follow e.g. the critical success factors from Martin White’s Enterprise Search book, most of them don’t seem to work for a small company – simply because none of them can/will invest in a search team etc.”

The upcoming Enterprise Search Europe meeting (May 14-16, 2013) in London is one focus of my attention at present. Since Martin White is the chairman and principal organizer, Andreas’ comments resonated immediately. Concurrently, I am working on a project for a university department, which probably falls into the category of “small enterprise.” The other relevant project on my desk is a book I am co-authoring on “practical KM,” which certainly aims to appeal to the individual practitioner or to groups limited by capital resources. These areas of focus challenge me to respond to Andreas’ comments because I am certain they are top of mind for many, and the excellent comments already at the posting show that others have good ideas about the topic as well.

Intangible capital is particularly significant in many small firms, in academia, and for independent consultants like me. Intensive leveraging of knowledge in the form of expertise, relationships, and processes is imperative in these domains. According to Mary Adams, founder of Smarter-Companies, intangible capital now surpasses tangible capital as a share of most businesses’ value. Because intangible capital takes more thought and effort to identify, find, or aggregate than hard assets, tools are needed to uncover, discover, and pinpoint it.

Let’s take the example of expertise, an indisputable intangible asset of any professional services firm. Asking expert staff to put an explicit value on their knowledge, competencies, or acumen for tackling the type of problem you need solved may give you a sense of value, but you need more. The firm or professional you want to hire must be able to back up its value by providing explicit evidence that it “knows its stuff” and can produce. For you, search is a tool to lead you to public or published evidence. From the firm being asked to bid on your work, you want additional evidence. Top-quality firms put both human and technology search resources to work to service existing projects and clients, and to provide evidence of their qualifications when asked to retrieve relevant work or references. Search tools and content management methods are diverse and range from modest to very expensive in scope, but no firm can exist for long without technology to support the findability of its intangible capital.

To summarize, there are three principal ways that search pays off in the small-to-medium business (SMB) sector. With a few examples of each, they are:

  • Finding expertise (people): a potential client engagement principal or team member, answers to questions needed to fulfill a client engagement, spurring development or an innovation initiative
  • Retrieving prior work: reuse of know-how in new engagements, discovery of ideas previously tabled, learning, documentation of products and processes, building a proposal, starting point for new work, protecting intellectual property for leverage, when patenting, or participating in mergers and acquisitions.
  • Creating the framework for efficiency: time and speed, reinforcing what you know, supporting PR, communications, knowledge base, portraying the scope of intellectual capital (if you are a target for acquisition), the extent of your partnerships that can expand your ability to deliver, creating new offerings (services) or products.

So, to conclude my comment on Andreas’ posting, I would assert that you can “out-IBM the IBMers” or any other large organization by employing search to leverage your knowledge, people and relationships in smart and efficient ways. Excellent content and search practices can probably reduce your total human overhead because even one or two content and search specialists plus the right technology can deliver significant efficiency in intangible asset utilization.

I hope to see conference attendees who come from that SMB community so we can continue this excellent discussion in London, next month. Ask me about how we “ate our own dog-food” (search tools) when I owned a small software firm in the early 1980s. The overhead was minimal compared to the savings in support headcount.

Embedded Search in the Enterprise

We need to make a distinction between “search in the enterprise” and “enterprise-wide search.” The former is any search that is persistently in view as we go about our primary work activities. The latter commonly assumes aggregation of all enterprise content via a single platform, or enterprise content to which everyone in the organization has access. So many attempts at enterprise-wide search are reported to be compromised or frustrated before achieving successful outcomes that it is time to pay attention to point-of-need solutions: search that smoothly satisfies routine retrieval requirements as we work.

Most of us work in a small number of applications all day. A writer will be wedded to a content creation application plus research sources, both on the web and internal to the enterprise for which the writing is being done. Finding information to support writing, whether it is a press release, a marketing brochure, or documentation to accompany a technical product, requires access to content appropriate for the intended audience. That audience may be a business analyst, a customer’s buyer, or a product user with advanced technical expertise. During any one assignment, the writer will usually be focused on one audience and will only need a limited view of content specific to that task.

When a search takes us on a merry chase through multiple resource repositories, or through a single repository with heaps of irrelevant content and no good results, we are forced into a mental traffic nightmare not of our own making. As this blog post by Tony Schwartz reminds us, we need time to focus and concentrate; it enables us to work smarter and more calmly. For employers seeking to support workers with the best tools, search that works well at the point of doing an assignment is the ultimate perk. I know how frantic and fractionated my mental state becomes as I follow one fruitless web of links after another that I believe will lead me to the piece of information I need. Truthfully, I often become so absorbed in the search, and in the ancillary information I “discover” along the way, that sight of the target becomes secondary.

New wisdom from a host of analysts and writers suggests that embedded search is more than a trend, as is search with a specific focus or purposeful business goal. The fact that FAST is now embedded with and for SharePoint, and that its use is growing principally in that arena, illustrates the trend. But readers should also consider a large array of newer search solutions that are strong on semantic features, APIs, integration options, and connectors to the huge variety of content that exists in other application repositories. This article by James Martin in CIO, How to Evaluate Enterprise Search, has helpful comments from Leslie Owens of Forrester Research, and the rise of connectors is highlighted by Alan Pelz-Sharpe in this post.

Right now two rather new search engines are on my radar because of their timely entrance into the marketplace. One is Q-Sensei, which has just released version 2.0 of its ontology-based solution, very much focused on efficiently processing big data, quick deployment, and integration with content applications. The second is Cambridge Semantics, with its Anzo semantic solutions for analyzing and retrieving business data. Finally, I am very excited that ISYS was acquired by Lexmark. It was an unexpected move, but ISYS deserved to be recognized for solid connector/filter technology and a large, satisfied customer base. It will be interesting to see how a hardware vendor noted for print technology integrates ISYS search software into its product offerings. Information retrieval belongs where work is being done.

These are just three vendors poised to change the expectations of searchers by fulfilling search needs embedded or integrated efficiently in select business application areas. Martin White’s most recent enumeration of search vendors puts the list at about 70; they are primarily vendors with standalone search products, products that support standalone search, or search engines that complement other content applications. You will see many viable options there that are unfamiliar, but be sure to dig down to understand where each might fill a unique need in your enterprise.

When seeking solutions to search problems, you need to really understand the purpose before evaluating candidate vendors. Then focus on products that have the same clarity of applicability you want. They may be embedded with a product such as Lexmark’s, or with a CAD system. The first step is to decide where, and for whom, you need search to be present.

Classifying Searchers – What Really Counts?

I continue to be impressed by the new ways in which enterprise search companies differentiate and package their software for specialized uses. This is a good thing because it underscores their understanding of different search audiences. Just as important is recognition that search happens in a context, for example:

  • Personal interest (enlightenment or entertainment)
  • Product selection (evaluations by independent analysts vs. direct purchasing information)
  • Work enhancement (finding data or learning a new system, process or product)
  • High-level professional activities (e-discovery to strategic planning)

Vendors understand that there is a limited market for a product or suite of products that will satisfy every budget, search context, and enterprise hierarchy of search requirements. The best vendors focus on the technological strengths of their search tools to deliver products packaged for a niche in which they can excel.

However, for any market niche, excellence begins with six basics:

  • Customer relationship cultivation, including good listening
  • Professional customer support and services
  • Ease of system installation, implementation, tuning and administration
  • Out-of-the box integration with complementary technologies that will improve search
  • Simple pricing for licensing and support packages
  • Ease of doing business, contracting and licensing, deliveries and upgrades

While any mature and worthy company will have continually improved on these attributes, there are contextual differentiators that you should seek in your vertical market:

  • Vendor subject matter expertise
  • Vendor industry expertise
  • Vendor knowledge of how professional specialists perform their work functions
  • Vendor understanding of retrieval and content types that contribute the highest value

At a recent client discussion, the topic was the application of a highly specialized taxonomy. The target content will be made available on a public-facing web site and also to internal staff. We began by discussing the various categories of terminology already extracted from a pre-existing system.

As we differentiated how internal staff need to access content for research purposes from how the public is expected to search, patterns emerged for how differently content needs to be packaged for each constituency. For those of you with specialized collections used by highly diverse audiences, this is no surprise. Before proceeding with decisions about term curation and the granularity of the metadata vocabulary, the high priority has become how the search mechanisms will work for different audiences.

For this institution, internal users must have pinpoint precision in retrieval on multiple facets of content to get to exactly the right record. They will be coming to search with knowledge of the collection and more certainty about what they can expect to find. They will also want to find their target(s) quickly. On the other hand, the public facing audience needs to be guided in a way that leads them on a path of discovery, navigating through a map of terms that takes them from their “key term” query through related possibilities without demanding arcane Boolean operations or lengthy explanations for advanced searching.
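As a hypothetical sketch of what the two search styles might look like against a Solr index (the endpoint, core name, and field names are invented for illustration), the internal researchers get a precise, fielded query, while the public audience gets a broad keyword query with facets that map out related terms to browse.

```python
import requests

# Hypothetical Solr endpoint and core; field names below are illustrative only
SOLR = "http://localhost:8983/solr/collection1/select"

# Internal researchers: pinpoint precision via fielded search on known metadata
precise = requests.get(SOLR, params={
    "q": 'composer:"Ellington" AND medium:"piano roll" AND year:1928',
    "rows": 10,
    "wt": "json",
})

# Public visitors: a broad keyword query plus facets that guide discovery paths
guided = requests.get(SOLR, params={
    "q": "ellington",
    "facet": "true",
    "facet.field": ["genre", "era", "medium"],  # repeated facet.field parameters
    "facet.mincount": 1,
    "rows": 10,
    "wt": "json",
})

# Standard Solr JSON responses carry hit counts and facet_counts for rendering navigation
print(precise.json()["response"]["numFound"])
print(guided.json()["facet_counts"]["facet_fields"])
```

The same index can serve both audiences; what differs is the query construction and the interface built on top of it.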

There is a clear lesson here for those seeking enterprise search solutions. Systems that favor one audience over another will always be problematic. Therefore, who needs what, and how each group goes about searching, needs to be established and then matched to a product that can provide for all target groups.

We are in conference season; a few meetings next month will feature various search and content technologies. After many years of walking exhibit halls, formulating strategies for systematic research, and avoiding a swamp of technology overload, I now try to have specific questions formulated that will uncover the “must have” functions and features for a particular client requirement. If you do the same, describing a search user scenario to each candidate vendor, you can then proceed to ask: Is this a search problem your product will handle? What other technologies (e.g. CMS, vocabulary management) need to be in place to ensure quality search results? Can you demonstrate something similar? What would you estimate the implementation schedule to look like? What integration services are recommended?

These are starting points for a discussion and will enable you to begin to know whether this vendor meets the fundamental criteria laid out earlier in this post. It will also give you a sense of whether the vendor views all searchers and their searches as generic equivalents or knows that different functions and features are needed for special groups.

Look for vendors for enterprise search and search related technologies to interview at the following upcoming meetings:

Enterprise Search Summit, New York, May 10-11 […where you will learn strategies and build the skill sets you need to make your organization’s content not only searchable but “findable” and actionable so that it delivers value to the bottom line.] This is the largest seasonal conference dedicated to enterprise search. The sessions are preceded by separate workshops with in-depth tutorials related to search. During the conference, focus on case studies of enterprises similar to yours for a better understanding of issues you may need to address.

Text Analytics Summit, Boston, May 18-19. I spoke with Seth Grimes, who kicks off the meeting with a keynote, and asked whether he sees a change in emphasis this year from straight text mining and text analytics. You’ll have to attend to get his full speech, but Seth shared that he sees a newfound recognition that “Big Data” is coming to grips with text source information as an asset that has special requirements (and value). He also noted that unstructured document complexities can benefit from text analytics to create semantic understanding that improves search, and that text analytics products are rising to the challenge of providing dynamic semantic analysis, particularly around massive amounts of social textual content.

Lucene Revolution, San Francisco, May 23 – 24 […hear from … the foremost experts on open source search technology to a broad cross-section of users that have implemented Lucene, Solr, or LucidWorks Enterprise to improve search application performance, scalability, flexibility, and relevance, while lowering their costs.] I attended this new meeting last year when it was in Boston. For any enterprise considering or leaning toward implementing open source search, particularly Lucene or Solr, this meeting will set you on a path for understanding what that journey entails.

ETL and Building Intelligence Behind Semantic Search

A recent inquiry about a position requiring ETL (Extraction/Transformation/Loading) experience prompted me to survey the job market in this area. It was quite a surprise to see that there are many technical positions seeking this expertise, plus experience with SQL databases, and XML, mostly in healthcare, finance or with data warehouses. I am also observing an uptick in contract positions for metadata and taxonomy development.

My research on Semantic Software Technologies put me on the radar of reporters and bloggers seeking my thoughts on the Watson-Jeopardy story. Much has been written on the story, but I wanted to try a fresh take on the meaning of it all. There is a connection to be made between the ETL field and building a knowledgebase with the smarts of Watson. Inspiration for innovation can be drawn from the Watson technology, but there is a caveat: it involves the expenditure of serious mental and computing perspiration.

Besides baked-in intelligence for answering human questions using natural language processing (NLP), an answer platform like Watson requires tons of data. That data must also be assembled in conceptually and contextually relevant databases for good answers to occur. When documents and other forms of electronic content are fed to a knowledgebase for semantic retrieval, finely crafted metadata (data describing the content) and excellent vocabulary control add enormous value. These two content enhancers, metadata and controlled vocabularies, can transform good search into excellent search.

The irony of current enterprise search is that information is in such abundance that it overwhelms rather than helps findability. Content and knowledge managers can’t possibly contribute the human resources needed to generate high quality metadata for everything in sight. But there are numerous techniques and technologies to supplement their work by explicitly exploiting the mountain of information.

Good content and knowledge managers know where to find top-quality content but may not know that, for all common content formats, there are tools to extract key metadata embedded (but hidden) in it. Some of these tools can also text-mine and analyze the content for additional intelligent descriptive data. When content collections are large but still too small (under a million documents) to justify the most sophisticated and complex semantic search engines, ETL tools can relieve pressure on metadata managers by automating much of the mining, extracting the entities and concepts needed for good categorization.
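As a rough illustration of the shape of such a pipeline, not of any particular vendor’s product, here is a hedged Python sketch. The extraction helper is a hypothetical stand-in for a commercial or open source extractor, the transform step normalizes terms against a toy controlled vocabulary, and the load step hands enriched records to whatever index sits downstream.

```python
from pathlib import Path

def extract_embedded_metadata(path: Path) -> dict:
    """Hypothetical extractor: pull whatever title/keyword fields are hidden in the file."""
    return {"source": str(path), "title": path.stem, "keywords": []}  # placeholder values

# Toy controlled vocabulary mapping raw terms to preferred labels (illustrative only)
CONTROLLED_VOCAB = {"hr": "Human Resources", "a/p": "Accounts Payable"}

def transform(record: dict) -> dict:
    """Normalize extracted terms against the controlled vocabulary."""
    record["keywords"] = [CONTROLLED_VOCAB.get(k.lower(), k) for k in record["keywords"]]
    return record

def load(record: dict, index: list) -> None:
    """Hand the enriched record to the downstream index or repository."""
    index.append(record)

search_index: list = []
for doc in Path("content_repository").glob("**/*.docx"):  # assumed local folder of documents
    load(transform(extract_embedded_metadata(doc)), search_index)
```

The engineering work is in the extract and load plumbing; the editorial work of shaping the vocabulary is what makes the transformed records worth indexing.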

The ETL tool array is large and varied. Platform tools from Microsoft (SSIS) and IBM (DataStage) may be employed to extract, transform, and load existing metadata. Other independent products, such as those from Pervasive and SEAL, may contribute value across a variety of platforms or functional areas, from which content can be dramatically enhanced for better tagging and indexing. The call for ETL experts is usually expressed in terms of engineering roles responsible for selecting, installing, and implementing these products. However, it must be stressed that subject and content experts are required to work with the engineers; these experts help tune and validate the extraction and transformation outcomes, making sure terminology fits function.

Entity extraction is one major outcome of text mining to support business analytics, but tools can do a lot more to put intelligence into play for semantic applications. Tools that act as filters and statistical analyzers of text data warehouses help reveal terminology for building specialized controlled vocabularies for use in auto-categorization. A few vendors currently on my radar to help enterprises understand and leverage their content landscape include EntropySoft Content ETL, Information Extraction Systems, Intelligenx, ISYS Document Filters, RAMP, and XBS; there is something here for everyone.
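To show what entity extraction for vocabulary building can look like in practice, here is a small sketch using spaCy as an open source stand-in (no implication that the vendors above work this way). It tallies named entities across a couple of invented documents so that frequently occurring people, organizations, and places surface as candidate terms for a controlled vocabulary or auto-categorization rules.

```python
from collections import Counter
import spacy  # assumes the en_core_web_sm model has been installed

nlp = spacy.load("en_core_web_sm")

# Invented sample documents standing in for an enterprise text collection
documents = [
    "Acme Corp. renewed its maintenance contract with Marion University in March.",
    "The Boston office shipped the revised proposal to Acme Corp. last week.",
]

# Tally named entities across the collection; frequent terms become vocabulary candidates
candidates = Counter(
    (ent.text, ent.label_)
    for doc in nlp.pipe(documents)
    for ent in doc.ents
)
for (term, label), count in candidates.most_common(10):
    print(f"{label:12} {term:30} {count}")
```

Output like this still needs a subject expert to decide which candidates belong in the vocabulary and under which preferred label.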

The diversity of emerging applications is a leading indicator that there is a lot of innovation to come in all aspects of ETL. While RAMP is making headway with video, another firm with a local connection is Inforbix. I spoke with co-founder Oleg Shilovitsky for my semantic technology research last year, before they launched. As he asserted then, it is critical to preserve, mine, and leverage the data associated with design and manufacturing operations. This area has huge growth potential, and Inforbix is now ready to address that market.

Readers who seek to leverage ETL and text mining will gain know-how from the cases presented at the 2011 Text Analytics Summit, May 18-19 in Boston. The exhibits will also feature products to consider for turning piles of data into a valuable knowledge asset. I’ll be interviewing experts who are speaking and exhibiting at that conference for a future piece. I hope readers will attend and seek me out to talk about your metadata management and text mining challenges; this will feed ideas for future posts.

Finally, I’m not the only one thinking along these lines. You will find other ideas and a nudge to action in these articles.

Boeri, Bob. “Improving Findability Behind the Firewall.” Enterprise Search Summit 2010, New York, May 2010. 28 slides.
Farrell, Vickie. “The Need for Active Metadata Integration: The Hard Boiled Truth.” DM Direct Newsletter, September 9, 2005.
McCreary, Dan. “Entity Extraction and the Semantic Web.” Semantic Universe, January 1, 2009.
White, David. “BI or Bust?” KMWorld, October 28, 2009.

Enterprise Trends: Contrarians and Other Wise Forecasters

The gradual upturn from the worst economic conditions in decades is reason for hope. A growing economy, coupled with continued adoption of enterprise software in spite of the tough economic climate, keeps me tuned to what is transpiring in this industry. Rather than being cajoled into believing that “search” has become commodity software, which it hasn’t, I want to comment on the wisdom of Jill Dyché and her anti-predictions for 2011 in a recent Information Management blog post. There are important lessons here for enterprise search professionals, whether you have already implemented or plan to soon.

Taking her points out of order, I offer a bit of commentary on those that have a direct relationship to enterprise search. Based on past experience, Ms. Dyché predicts some negative outcomes but with a clear challenge for readers to prove her wrong. As noted, enterprise search offers some solutions to meet the challenges:

  1. No one will be willing to shine a bright light on the fact that the data in their enterprise data warehouse isn’t integrated. It isn’t just the data warehouse that lacks integration; the same is true of all the applications housing critical structured and unstructured content. This does not have to be the case. Several state-of-the-art enterprise search products that are not tied to a specific platform or suite of products do a fine job of federated indexing across disparate content repositories. In a matter of weeks or a few months, a search solution can be deployed to crawl, index, and search multiple sources of content. Furthermore, newer search applications are being offered for pre-purchase testing of out-of-the-box suitability in pilot or proof-of-concept (POC) projects. Organizations that are serious about integrating content silos have no excuse for not taking advantage of these easier-to-deploy search products.
  2. Even if they are presented with proof of value, management will be reluctant to invest in data governance. Combat this entrenched bias with a strategy to overcome the lack of governance; a cost-cutting argument is unlikely to change minds. Risk, however, is an argument that will resonate, particularly when bolstered with examples. Include instances when customers were lost due to poor performance or failure to deliver adequate support services, sales were lost because qualifying questions could not be answered or were not answered in time, legal or contract issues could not be defended due to inaccessibility of critical supporting documents, or maintenance revenue was lost because incomplete, inaccurate, or late renewal information went out to clients. One simple example is the consequence of not sustaining a concordance of customer name, contact, and address changes. The inability of content repositories to talk to each other, or to aggregate related information in a search, because a customer labeled Marion University at one address is the same as the customer labeled University of Marion at another address will be embarrassing in communications and, even worse, costly (see the matching sketch after this list). Governance of processes like naming conventions and standardized labeling enhances the value and performance of every enterprise system, including search.
  3. Executives won’t approve new master data management or business intelligence funding without an ROI analysis. This ties in with the first item because many enterprise search applications include excellent tools for performing business intelligence, analytics, and advanced functions to track and evaluate content resource use. The latter is an excellent way to understand who is searching, for what types of data, and with what language. These supporting functions are being built into enterprise search applications and add no cost to product licenses or implementation. Look for enterprise search applications that are delivered with tools any business manager can employ on an ad hoc basis.
  4. Developers won’t track their time in any meaningful way. This is probably true, because many managers are poorly equipped to evaluate what goes into software development. However, in this era of open source adoption, particularly for enterprise search, organizations that commit to using Lucene or Solr (open source search) must be clear on the cost of building these tools into functioning systems for their specialized purposes. Whether development is done internally or by a third party, it is essential to place strong boundaries around each project and deployment, with specifications that stage development, milestones, and change orders. “Free” open source software is not free, or even cost-effective, when an open meter for “time and materials” exists.
  5. Companies that don’t characteristically invest in IT infrastructure won’t change any time soon. So, the silo-ed projects will beget more silo-ed data. Because the adoption rate for new content management applications is so high, and the ease of deploying them encourages replication like rabbits, it is probably futile to try to staunch their proliferation. This is an important area for governance: detect redundancy, perform analytics across silos, and call attention to obvious waste and duplication of content and effort. Newer search applications that can crawl and index a multitude of formats and repositories will easily support efforts to monitor and evaluate what is being discovered in search results. Given a little encouragement to report redundant and replicated content, every user becomes a governor over waste. Play on the natural inclination of people to complain when they feel overwhelmed by messy search results by setting up a simple (click-a-button) reporting mechanism that automatically issues a report or sets a flag in a log file when a search reveals a problem.
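As a concrete illustration of the Marion University example in item 2, here is a minimal Python sketch of fuzzy customer-name matching using only the standard library. The normalization and threshold are simplistic assumptions; a production concordance would be considerably more rigorous.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Crude normalization: lowercase, drop punctuation, sort tokens so word order doesn't matter."""
    tokens = "".join(c if c.isalnum() or c.isspace() else " " for c in name.lower()).split()
    return " ".join(sorted(tokens))

def likely_same_customer(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag records that probably refer to the same customer despite labeling differences."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(likely_same_customer("Marion University", "University of Marion"))    # True
print(likely_same_customer("Marion University", "Marion County Hospital"))  # False
```

Even a small routine like this, run against customer records before indexing, catches the kind of labeling drift that otherwise fragments search results.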

It is time to stop treating enterprise search like a failed experiment and instead, leverage it to address some long-standing technology elephants roaming around our enterprises.

To follow other search trends for the coming year, you may want to attend a forthcoming webinar, 11 Trends in Enterprise Search for 2011, which I will be moderating on January 25th. These two blogs also have interesting perspectives on what is in store for enterprise applications: CSI Info-Mgmt: Profiling Predictors 2011, by Jim Ericson, and The Hottest BPM Trends You Must Embrace In 2011!, by Clay Richardson. Also, some of Ms. Dyché’s commentary aligns nicely with the “best practices” offered in this recent beacon, Establishing a Successful Enterprise Search Program: Five Best Practices.

