
Just Published: Outsell Gilbane Study on Multilingual Marketing Content

Our 2011 report describing the current state of practice for globalizing multilingual marketing content is available now, exclusively through study sponsors Across Systems, ADAM Software, Lionbridge, and SDL, through March 31.

Multilingual Marketing Content: Growing International Business With Global Content Value Chains features a major update of the global content value chain, Gilbane’s framework for helping companies plan and manage their globalization practices. The new value chain adds core competencies to the existing functional view of multilingual content processes, and it clearly ties the value chain to business outcomes.

Study data includes top business goals and objectives, along with the investments that marketing and localization managers are making in programs and initiatives that support those goals. The analysis covers what marketing organizations can learn from product content groups, which are generally further along the content globalization maturity curve.

The report will be available directly from the Gilbane website starting April 1. In the meantime, please visit a sponsor site to access the study, and check this blog for research highlights and insights.

e-Spirit Integrates FirstSpirit CMS Into Liferay Portal

e-Spirit AG has added the open source Liferay Portal to its range of options for integrating the FirstSpirit content management system into enterprise portals. The module was developed in cooperation with e-Spirit’s technology partner USU. Integrating FirstSpirit into Liferay will allow organizations to create employee portals that combine Enterprise 2.0 functionality, IT applications, content, and collaboration. Organizations will also be able to provide their employees with access to Web 2.0 functions such as forums, blogs, and wikis, and offer them a platform to efficiently organize collaboration and share information beyond individual departments. The new module will be available in May. http://www.e-spirit.com http://www.liferay.com/

ETL and Building Intelligence Behind Semantic Search

A recent inquiry about a position requiring ETL (Extraction/Transformation/Loading) experience prompted me to survey the job market in this area. It was quite a surprise to see how many technical positions seek this expertise, plus experience with SQL databases and XML, mostly in healthcare, finance, or data warehousing. I am also observing an uptick in contract positions for metadata and taxonomy development.

My research on Semantic Software Technologies put me on the radar of reporters and bloggers seeking my thoughts on the Watson-Jeopardy story. Much has been written on the story, but I wanted to try a fresh take on the meaning of it all. There is a connection to be made between the ETL field and building a knowledgebase with the smarts of Watson. Inspiration for innovation can be drawn from the Watson technology, but there is a caveat: it demands serious mental and computing perspiration.

Besides the baked-in intelligence to answer human questions using natural language processing (NLP), an answer platform like Watson requires tons of data. That data must also be assembled in conceptually and contextually relevant databases to produce good answers. When documents and other forms of electronic content are fed to a knowledgebase for semantic retrieval, finely crafted metadata (data describing the content) and excellent vocabulary control add enormous value. These two content enhancers, metadata and controlled vocabularies, can transform good search into excellent search.
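
To make the point concrete, here is a toy Python sketch, not any vendor’s API, of how curated metadata can lift retrieval: a term hit in vetted metadata fields is weighted well above a hit in raw body text. The field names and weights are illustrative assumptions.

```python
def score(query_terms, doc):
    """Toy relevance score: weight hits in curated metadata fields
    (keywords, title) above hits in unstructured body text."""
    weights = {"keywords": 5.0, "title": 3.0, "body": 1.0}  # illustrative
    return sum(
        w * doc.get(field, "").lower().count(term)
        for field, w in weights.items()
        for term in query_terms
    )

doc = {
    "title": "Watson and NLP",
    "keywords": "question answering; nlp; natural language processing",
    "body": "Much has been written about Watson...",
}
print(score(["nlp"], doc))  # metadata hits dominate the final score
```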

The irony of current enterprise search is that information is in such abundance that it overwhelms rather than helps findability. Content and knowledge managers can’t possibly contribute the human resources needed to generate high quality metadata for everything in sight. But there are numerous techniques and technologies to supplement their work by explicitly exploiting the mountain of information.

Good content and knowledge managers know where to find top quality content, but may not know that, for all common content formats, there are tools to extract key metadata embedded (but hidden) in it. Some of these tools can also text mine and analyze the content for additional intelligent descriptive data. When content collections are large but still too small (under a million documents) to justify the most sophisticated and complex semantic search engines, ETL tools can relieve pressure on metadata managers by automating much of the mining, extracting the entities and concepts needed for good categorization.
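
As a small example of what “embedded but hidden” metadata looks like, the sketch below reads the Dublin Core properties that Office Open XML documents carry inside their ZIP packaging, using only the Python standard library; commercial document filters do the same across many more formats. The filename is a stand-in for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# A .docx file is a ZIP package; docProps/core.xml holds Dublin Core style
# metadata such as title, creator, subject, and keywords.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def docx_core_metadata(path):
    """Return the embedded core metadata of a .docx file as a dict."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    fields = ["dc:title", "dc:creator", "dc:subject", "cp:keywords", "dcterms:created"]
    return {f: getattr(root.find(f, NS), "text", None) for f in fields}

print(docx_core_metadata("whitepaper.docx"))  # hypothetical file
```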

The ETL tool array is large and varied. Platform tools from Microsoft (SSIS) and IBM (DataStage) may be employed to extract, transform, and load existing metadata. Other independent products, such as those from Pervasive and SEAL, can contribute value across a variety of platforms or functional areas, dramatically enhancing content for better tagging and indexing. The call for ETL experts is usually expressed in terms of engineering roles responsible for selecting, installing, and implementing these products. However, it must be stressed that subject and content experts are also required, working alongside the engineers to tune and validate extraction and transformation outcomes and to make sure terminology fits function.
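
A minimal Python sketch of that division of labor: engineers wire up the extract-transform-load stages, while subject experts maintain the preferred-term map the transform step applies. All names here are hypothetical placeholders, not any platform’s API.

```python
# Variant -> preferred term; maintained by subject and content experts.
PREFERRED = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
}

def extract(records):
    """Pull whatever metadata already exists out of the source records."""
    for rec in records:
        yield {"id": rec["id"], "keywords": rec.get("keywords", [])}

def transform(doc):
    """Normalize keyword variants against the controlled preferred terms."""
    doc["keywords"] = sorted({PREFERRED.get(k.lower(), k.lower()) for k in doc["keywords"]})
    return doc

def load(index, doc):
    """Write the enriched record to the target index or warehouse."""
    index[doc["id"]] = doc

index = {}
for doc in extract([{"id": "r1", "keywords": ["Heart Attack", "MI", "aspirin"]}]):
    load(index, transform(doc))
print(index)  # {'r1': {'id': 'r1', 'keywords': ['aspirin', 'myocardial infarction']}}
```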

Entity extraction is one major outcome of text mining to support business analytics, but tools can do a lot more to put intelligence into play for semantic applications. Tools that act as filters and statistical analyzers of text data warehouses help reveal terminology for building the specialized controlled vocabularies that drive auto-categorization. A few vendors currently on my radar to help enterprises understand and leverage their content landscape include EntropySoft Content ETL, Information Extraction Systems, Intelligenx, ISYS Document Filters, RAMP, and XBS; there is something here for everyone.
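
For the statistical-analyzer idea, a few lines of standard-library Python can already surface vocabulary candidates: count single words and adjacent word pairs across a corpus and review the most frequent. Real products add genuine linguistic analysis; the stopword list here is a token assumption.

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "to", "in", "a", "for", "is", "with"}

def candidate_terms(texts, top_n=10):
    """Count words and adjacent word pairs; frequent pairs are often
    good candidates for a controlled vocabulary."""
    counts = Counter()
    for text in texts:
        words = [w for w in re.findall(r"[a-z][a-z-]+", text.lower())
                 if w not in STOPWORDS]
        counts.update(words)                                      # single terms
        counts.update(" ".join(p) for p in zip(words, words[1:]))  # word pairs
    return counts.most_common(top_n)

docs = [
    "Semantic search needs controlled vocabularies.",
    "Controlled vocabularies improve auto-categorization.",
]
print(candidate_terms(docs))  # 'controlled vocabularies' rises to the top
```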

The diversity of emerging applications is a leading indicator that there is a lot of innovation to come in all aspects of ETL. While RAMP is making headway with video, another firm with a local connection is Inforbix. I spoke with co-founder Oleg Shilovitsky for my semantic technology research last year, before the company launched. As he asserted then, it is critical to preserve, mine, and leverage the data associated with design and manufacturing operations. This area has huge growth potential, and Inforbix is now ready to address that market.

Readers who seek to leverage ETL and text mining will gain know-how from the cases presented at the 2011 Text Analytics Summit, May 18-19 in Boston. The exhibits will also feature products to consider for turning piles of data into a valuable knowledge asset. I’ll be interviewing experts who are speaking and exhibiting at that conference for a future piece. I hope readers will attend and seek me out to talk about their metadata management and text mining challenges; that will feed ideas for future posts.

Finally, I’m not the only one thinking along these lines. You will find other ideas and a nudge to action in these articles:

Boeri, Bob. “Improving Findability Behind the Firewall.” Enterprise Search Summit 2010, New York, May 2010. 28 slides.
Farrell, Vickie. “The Need for Active Metadata Integration: The Hard Boiled Truth.” DM Direct Newsletter, September 9, 2005. 3 pp.
McCreary, Dan. “Entity Extraction and the Semantic Web.” Semantic Universe, January 1, 2009.
White, David. “BI or Bust?” KMWorld, October 28, 2009. 3 pp.

Enterprise Edition of Adobe Digital Publishing Suite Now Available

Adobe Systems Incorporated announced the immediate availability of the Enterprise Edition of Adobe Digital Publishing Suite, a set of hosted software services and viewer technology to create, distribute, monetize and analyze digital magazines, newspapers and publications. With output aimed at Android tablets, RIM PlayBook, and iOS tablet devices, the Enterprise Edition is designed for large publishers to implement a custom tablet publishing solution without disrupting existing processes and infrastructure. Today’s news follows the announcement that Adobe Digital Publishing Suite will support both Apple App Store Subscriptions and Google One Pass for magazine and newspaper publishers. http://www.adobe.com/

Acquia Launches Drupal Gardens 1.0

Acquia announced the general availability release of Drupal Gardens 1.0 with new capabilities and pricing plans. Drupal Gardens is a way to build content-rich dynamic sites. Views provides Drupal Gardens with a collection of site-building tools delivered with the simplicity of software-as-a-service (SaaS). Without writing any code, Views allows creation of custom mashups or combinations of content, media, user profiles, and more. Site builders can point and click to pull together any information on their site and craft lists, grids, tables, reports, RSS feeds, and navigation. Views can also be configured to display different results based on visitor interactions, such as showing posts submitted over the past month versus the most popular. With Views, Drupal Gardens sites can be assembled and deployed with dynamic content. Importantly, there is no lock-in for site builders and owners with Drupal Gardens’ OpenSaaS approach: if there is a need to add custom modules, simply export the complete site to Acquia Dev Cloud or your own hosting environment. Drupal Gardens offers a tiered pricing structure ranging from individuals to large enterprises. drupalgardens.com

Google Debuts iOS Translation App

The official Google Translate for iPhone app is now available for download from the App Store. The new app has all of the features of the web app, as well as some additions designed to improve the translation experience. The new app accepts voice input for 15 languages, and, just like the web app, you can translate a word or phrase into one of more than 50 languages. For voice input, just press the microphone icon next to the text box and say what you want to translate. You can also listen to your translations spoken out loud in one of 23 different languages, using the same new speech synthesizer voices as the desktop version of Google Translate introduced last month. Another feature is the ability to easily enlarge the translated text to full-screen size, making it easier to read the text on the screen or show the translation to the person you are communicating with; just tap the zoom icon to quickly zoom in. The app also includes all of the major features of the web app, including the ability to view dictionary results for single words, access your starred translations and translation history even when offline, and support for romanized text such as Pinyin and Romaji. You can download Google Translate now from the App Store globally. The app is available in all iOS supported languages, but you’ll need an iPhone or iPod touch running iOS 3 or later. http://itunes.apple.com/us/app/google-translate/

iCore CMS Released

2011 marks the launch of iCore CMS, a new web content management system designed for managing the entire workings of an online business. iCore CMS is the brainchild of Instani, a Microsoft Certified Partner delivering web design, SEO, and mobile application development services to a global client base. iCore CMS allows businesses to manage all aspects of product management and customer relations through one user interface. Users can choose from a variety of free customer-facing template designs, with the option of custom design and development by the Instani team. All system updates are instantaneous and free for users, and affordable monthly payment plans allow businesses to choose a package best tailored to their requirements. iCore is a fully hosted CMS, making it particularly well suited to web designers who need something customisable and plug-and-play for their clients. iCore Content Management System is fully rebrandable and compatible with Dreamweaver software. iCore challenges the current capabilities of open source CMSs by providing an unrestricted, highly secure, and fully supported platform. www.icorecms.com

How Far Does Semantic Software Really Go?

A discussion that began in November 2010 with a graduate scholar at George Washington University about semantic software technologies prompted him to follow up with some questions for clarification. With his permission, I am sharing three questions from Evan Faber and the gist of my comments to him. At the heart of the conversation we all need to keep having are two questions: how far does this technology go, and does it really bring us any gains in retrieving information?

1. Have AI or semantic software demonstrated any capability to ask new and interesting questions about the relationships among information that they process?

In several recent presentations and the Gilbane Group study on Semantic Software Technologies, I share a simple diagram of the nominal setup for the relationship of content to search and the semantic core, namely a set of terminology rules or terminology with relationships. Semantic search operates best when it focuses on a topical domain of knowledge. The language that defines that domain may range from simple to complex, broad to narrow, deep to shallow. The language may be applied to the task of semantic search from a taxonomy (usually shallow and simple), from a set of language rules (numbering in the thousands to millions), or from an ontology of concepts connected into a semantic net with millions of terms and relationships.
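
As a deliberately tiny Python illustration of that spectrum, the structure below encodes a few concepts with synonym and broader-term links; production semantic nets hold thousands to millions of such terms and relationships, and the medical terms here are only sample data.

```python
# A toy semantic net: each concept carries synonym and broader-concept links.
SEMANTIC_NET = {
    "myocardial infarction": {
        "synonyms": ["heart attack", "mi"],
        "broader": ["heart disease"],
    },
    "heart disease": {
        "synonyms": [],
        "broader": ["cardiovascular disease"],
    },
}

def expand(term):
    """Expand a query term with its synonyms and broader concepts."""
    node = SEMANTIC_NET.get(term, {})
    return [term] + node.get("synonyms", []) + node.get("broader", [])

print(expand("myocardial infarction"))
# ['myocardial infarction', 'heart attack', 'mi', 'heart disease']
```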

The question Evan asks is a good one with a simple answer: “Not without configuration.” The configuration needs human work in two areas:

  • Management of the linguistic rules or ontology
  • Design of search engine indexing and retrieval mechanisms

When a semantic search engine indexes content for natural language retrieval, it looks to the rules or semantic nets to find concepts that match those in the content. When it finds concepts in the content with no equivalent language in the semantic net, it must find a way to understand where the concepts belong in the ontological framework. This discovery process for clarification, disambiguation, contextual relevance, perspective, meaning, or tone is best accompanied by an interface that makes it easy for a human curator or editor to update or expand the ontology; a subject matter expert is required for specialized topics. Through a process of automated indexing that both categorizes and exposes problem areas, the semantic engine becomes both a search engine and a questioning engine.
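
A hedged Python sketch of that loop: terms the semantic core recognizes are tagged automatically, while unrecognized terms are queued for a human curator rather than guessed at. The data structures are illustrative, not any engine’s interface.

```python
def index_document(doc_terms, known_concepts, review_queue):
    """Tag terms found in the semantic core; queue unknown terms for a
    human curator instead of guessing at their place in the ontology."""
    tags, unknown = [], []
    for term in doc_terms:
        (tags if term in known_concepts else unknown).append(term)
    review_queue.extend(unknown)  # the curator updates the ontology later
    return tags

queue = []
tags = index_document(
    ["heart attack", "troponin leak"],  # terms mined from one document
    {"heart attack", "aspirin"},        # concepts the core already knows
    queue,
)
print(tags)   # ['heart attack']
print(queue)  # ['troponin leak'] -> exposed as a question for the expert
```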

The entire process is highly iterative. In a sense, the software is asking the questions: “What is this?”, “How does it relate to the things we already know about?”, “How is the language being used in this context?” and so on.

2. In other words, once they [the software] have established relationships among data, can they use that finding to proceed, without human intervention, to seek new relationships?

Yes, in the manner described for the previous question. It is important to recognize that the original set of rules, ontologies, or semantic nets being applied was crafted by human beings with subject matter expertise. It is unrealistic to think that any team of experts would be able to know or anticipate every use of human language and codify it in advance with total accuracy. The term AI is, for this reason, a misnomer: the algorithms are not thinking; they are only looking up “known-knowns” and applying them. The art of the software is in recognizing when something cannot be discerned or clearly understood; then the concept (in context) is presented for the expert to “teach” the software what to do with the information.

State-of-the-art software will have a back-end process enabling implementers and administrators to use the results of search (directly, via commentary from users, or indirectly, by analyzing search logs) to discover where language has been misunderstood, as evidenced by invalid results. Over time, repeated passes to update linguistic definitions, grammar rules, and concept relationships will continue to refine and improve the accuracy and comprehensiveness of search results.
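
A minimal Python illustration of the indirect path: mine the search log for frequent zero-result queries, which flag language the semantic layer does not yet cover. The log format is an assumption for the sketch.

```python
from collections import Counter

def misunderstood_queries(search_log, top_n=10):
    """Surface the most frequent queries that returned no results; these
    mark language the linguistic definitions do not yet cover."""
    failures = Counter(
        entry["query"].lower() for entry in search_log if entry["results"] == 0
    )
    return failures.most_common(top_n)

log = [
    {"query": "MI outcomes", "results": 0},
    {"query": "heart attack outcomes", "results": 12},
    {"query": "MI outcomes", "results": 0},
]
print(misunderstood_queries(log))  # [('mi outcomes', 2)] -> term to teach
```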

3. It occurs to me that the key value added of semantic technologies to decision-making is their capacity to link sources by context and meaning, which increases situational awareness and decision space. But can they probe further on their own?

Good point on the value, and in a sense, yes, they can. Through extensive algorithmic operations, instructions can be embedded (and probably are for high-value situations like intelligence work) that tell the software what to do with newly discovered concepts. Such instructions might place these new discoveries into categories of relevance, importance, or association. It would not be unreasonable to then pass documents with confounding information off to other semantic tools for further examination. Again, without human analysis along the continuum and at the end point, no certainty about the validity of the software’s decision-making can be asserted.

I can hypothesize a case in which a corpus of content contains random documents in foreign languages. From my research, I know that some of the semantic packages have semantic nets in multiple languages. If the corpus contains material in English, French, German and Arabic, these materials might be sorted and routed off to four different software applications. Each batch would be subject to further linguistic analysis, followed by indexing with some middleware applied to the returned results for normalization, and final consolidation into a unified index. Does this exist in the real world now? Probably there are variants but it would take more research to find the cases, and they may be subject to restrictions that would require the correct clearances.
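
A rough Python sketch of that hypothetical routing, with stand-ins for the language detector, the per-language semantic engines, and the normalization middleware; every function here is an assumption, not a real product interface.

```python
def detect(text):
    """Stand-in language identifier; a real system uses a trained model."""
    return "fr" if " le " in f" {text.lower()} " else "en"

def analyze_en(doc):
    """Stand-in for an English semantic engine."""
    return {"concepts": doc["text"].lower().split()}

def analyze_fr(doc):
    """Stand-in for a French semantic engine."""
    return {"concepts": doc["text"].lower().split()}

def normalize(result):
    """Stand-in middleware mapping each engine's output to one schema."""
    return {"concepts": sorted(set(result["concepts"]))}

def unified_index(docs, analyzers):
    """Route each document to the analyzer for its language, then
    consolidate the normalized results into a single index."""
    return {
        doc["id"]: normalize(analyzers[detect(doc["text"])](doc))
        for doc in docs
    }

docs = [
    {"id": 1, "text": "semantic search engines"},
    {"id": 2, "text": "le moteur de recherche"},
]
print(unified_index(docs, {"en": analyze_en, "fr": analyze_fr}))
```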

Discussions with experts who have actually deployed enterprise-specific semantic software underscore the need for subject expertise and some computational linguistics training, coupled with an aptitude for creative inquiry. These scientists informed me that individuals who are highly multidisciplinary and facile with electronic games and tools did the best job of interacting with the software and getting excellent results. Tuning and configuration over time by the right human players is still a fundamental requirement.
