Category: Computing & data

Computing and data is a broad category. Our coverage of computing is largely limited to software, and we are mostly focused on unstructured data, semi-structured data, or mixed data that includes structured data.

Topics include computing platforms, analytics, data science, data modeling, database technologies, machine learning / AI, Internet of Things (IoT), blockchain, augmented reality, bots, programming languages, natural language processing applications such as machine translation, and knowledge graphs.

Related categories: Semantic technologies, Web technologies & information standards, and Internet and platforms.

Reflections on Gov 2.0 Expo and Summit

O’Reilly’s Gov 2.0 events took place last week. We’ve had some time to think about what the current wave of activity means to buyers and adopters of content technologies.

Both the Expo and Summit programs delivered a deluge of examples of exciting new approaches to connecting consumers of government services with the agencies and organizations that provide them.

  • At the Expo on Sept 8, 25 speakers from organizations such as NASA, the TSA, the US EPA, the City of Santa Cruz, the Utah Department of Public Safety, and the US Coast Guard provided five-minute overviews of their 2.0 applications in a sometimes dizzying, fast-paced format.
  • Sunlight Labs sponsored an Apps for America challenge that featured finalists who combined federal content available on Data.gov and open source software in some intriguing applications, including DataMasher, which enables you to mash up sources such as stats on numbers of high school graduates and guns per household.
  • The Summit on Sept 9 and 10 featured more applications plus star-status speakers including Aneesh Chopra, the US’s first CTO operating under the Federal Office of Science and Technology Policy; Vinton Cerf, currently VP and evangelist at Google; and Mitch Kapor.

A primary program theme was “government as platform,” with speakers suggesting and debating just what that means. There was much thoughtful discussion, if not consensus. Rather than report it all here, we suggest interested readers search the Twitter hashtags #gov20e and #gov20s for comments.

From the first speaker on, we were struck by the rapid pace of change in government action and attitude about content and data sharing. Our baseline for comparison is Gilbane’s last conference on content applications within government and non-profit agencies, in June 2007. In presentations and casual conversations with attendees, it was clear then that most organizations were operating as silos. There was little sharing or collaboration within and among organizations. Many attendees expressed frustration that this was so. When we asked what could be done to fix the problem, we distinctly remember one person saying that connecting with other content managers just within her own agency would be a huge improvement.

Fast forward a little over two years to last week’s Gov 2.0 events. Progress towards internal collaboration, inter-agency data sharing, and two-way interaction between government and citizens is truly remarkable. At least three factors have created a perfect storm of conditions: the current administration’s vision and mandate for open government, broad acceptance of social interaction tools at the personal and organizational level, and technology readiness in the form of open source software that makes it possible to experiment at low cost and risk.

Viewing the events through Gilbane’s content-centric lens, we offer three takeaways:

  • Chopra indicated that the formal Open Government directives to agencies, to be released in several weeks, will include the development of “structured schedules” for making agency data available in machine-readable format. As Tim O’Reilly said while interviewing Chopra, posting “a bunch of PDFs” will not be sufficient for alignment with the directives (see the sketch after this list). As a result, agencies will be accelerating the adoption of XML and the transformation of publishing practices to manage structured content. As large buyers of content technologies and services, government agencies are market influencers. We will be watching carefully for the impact of Open Government initiatives on the broader landscape for content technologies.
  • There was little mention of the role of content management as a business practice or technology infrastructure. This is not surprising, given that Gov 2.0 wasn’t about content management. And while the programs comprised lots of show-and-tell examples, most were very heavy on show and very light on tell. But it does raise a question about how these applications will be managed, governed, and made sustainable and scalable. Add in the point above: structured content is now poised for wider adoption, which will create demand for XML-aware content management solutions. Look for more discussion as agencies begin to acknowledge their content management challenges.
  • We didn’t hear a single mention of language issues in the sessions we attended, leaving us to wonder whether non-native English speakers who are eligible for government services will be disenfranchised in the move to Open Government.
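To make the “machine-readable, not a bunch of PDFs” point concrete, here is a minimal sketch of an agency statistic published as structured XML that any consumer can parse programmatically. The element names and values are hypothetical, not an official schema.

```python
# Publish a small agency statistic as machine-readable XML using only
# Python's standard library. Element names here are illustrative.
import xml.etree.ElementTree as ET

record = ET.Element("dataset", attrib={"agency": "EPA", "year": "2009"})
metric = ET.SubElement(record, "metric", attrib={"name": "air_quality_index"})
ET.SubElement(metric, "region").text = "Santa Cruz, CA"
ET.SubElement(metric, "value").text = "42"

xml_bytes = ET.tostring(record, encoding="utf-8")

# A downstream consumer (a mashup like DataMasher, say) can now extract
# the value directly instead of scraping it out of a PDF.
parsed = ET.fromstring(xml_bytes)
print(parsed.find("./metric/value").text)  # -> 42
```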

All in all, thought-provoking, well-executed events. For details, videos of the sessions are available on the Gov2.0 site.

EMC Expands Developer Resources for Enterprise Content Management

EMC Corporation (NYSE:EMC) announced free, full-function developer editions of its enterprise content management (ECM) products and launched two new online communities dedicated to EMC Documentum and XML developers. Through the Documentum and XML communities, developers will have open access to resources that include code samples, tutorials, full product documentation, and “getting started” guides. EMC Documentum Content Server Developer Edition provides developers free access and offers a “one-click” deployment that can be run on a laptop, so developers can quickly start creating their Documentum-based solutions. EMC Documentum xDB Developer Edition provides developers a scalable, high-performance, native XML database at no cost for development and testing. .NET Productivity Suite makes it easier for developers to build integrations with Microsoft applications such as SharePoint by allowing them to work exclusively in a Microsoft environment. EMC Documentum Content Services for Salesforce CRM allows developers to embed Documentum content services within Salesforce CRM. All products are available today. The free developer editions of Content Server and xDB are for development, testing, and trial only; standard licensing fees apply for production and run-time deployments. http://www.emc.com/

Oracle to Buy Sun

Sun Microsystems (NASDAQ: JAVA) and Oracle Corporation (NASDAQ: ORCL) announced they have entered into a definitive agreement under which Oracle will acquire Sun common stock for $9.50 per share in cash. The transaction is valued at approximately $7.4 billion, or $5.6 billion net of Sun’s cash and debt. The Board of Directors of Sun Microsystems has unanimously approved the transaction. It is anticipated to close this summer, subject to Sun stockholder approval, certain regulatory approvals and customary closing conditions. “We expect this acquisition to be accretive to Oracle’s earnings by at least 15 cents on a non-GAAP basis in the first full year after closing. We estimate that the acquired business will contribute over $1.5 billion to Oracle’s non-GAAP operating profit in the first year, increasing to over $2 billion in the second year. This would make the Sun acquisition more profitable in per share contribution in the first year than we had planned for the acquisitions of BEA, PeopleSoft and Siebel combined,” said Oracle President Safra Catz. The Sun Solaris operating system is the leading platform for the Oracle database, Oracle’s largest business, and has been for a long time. With the acquisition of Sun, Oracle can optimize the Oracle database for some of the features of Solaris. http://www.oracle.com, http://www.sun.com

Adobe Announces LiveCycle Developer Express via Amazon Web Services

Adobe Systems Incorporated (Nasdaq:ADBE) announced the immediate availability of Adobe LiveCycle ES Developer Express software, a full version of Adobe LiveCycle ES hosted in the Amazon Web Services cloud computing environment. Using Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3) technologies, Adobe’s offering provides a virtual, self-contained development environment where enterprise developers can prototype, develop, and test Adobe LiveCycle ES applications without needing to install and configure Adobe LiveCycle ES themselves. With Adobe LiveCycle ES Developer Express, Adobe LiveCycle ES applications are pre-configured as ready-to-run server instances on Amazon EC2. This can reduce the time required to boot new server instances to minutes, allowing enterprise developers to quickly begin testing and modifying applications. Developers can effectively bullet-proof their applications without having to invest in a development environment or test lab. Old projects may be deleted or saved for future access, and new projects can begin without any cleanup required from the previous install. Adobe LiveCycle ES Developer Express is immediately available to all members of the Adobe Enterprise Developer Program. http://aws.amazon.com/ec2/, http://www.adobe.com/products/livecycle
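For readers unfamiliar with the EC2 deployment model described above, here is a rough sketch of booting a preconfigured server image and tearing it down afterwards. It uses today’s boto3 SDK purely for illustration; the AMI ID, key pair, and instance type are placeholders, not Adobe-published values.

```python
# Launch a development instance from a preconfigured machine image,
# then terminate it when the project is done. All identifiers are
# hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical LiveCycle ES image
    InstanceType="m5.large",
    KeyName="my-dev-keypair",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched development instance {instance_id}")

# A fresh instance can be launched later with no cleanup from the
# previous install, which is the "no cleanup required" point above.
ec2.terminate_instances(InstanceIds=[instance_id])
```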

DataDirect Announces New Release of XML Data Integration Suite

DataDirect Technologies, an operating company of Progress Software Corporation (NASDAQ: PRGS), announced the latest release of the DataDirect Data Integration Suite, featuring new versions of its XML-based component technologies for data integration in traditional and service-oriented environments. Designed to meet the data transformation and aggregation needs of developers, the DataDirect Data Integration Suite contains the latest product releases of DataDirect XQuery, DataDirect XML Converters (Java and .NET), and Stylus Studio in one installation. DataDirect XQuery is an XQuery processor that enables developers to access and query XML, relational data, Web services, EDI, legacy data, or a combination of data sources. New to version 4.0 is full support for the XQuery Update Facility (XUF), an extension of the XQuery language that allows changes to be made to the data manipulated inside an XQuery. Now developers can more easily update individual XML documents, XML streams, and file collections from within their XQuery applications. The product also includes the ability to update and create Zip files, thereby supporting the OpenOffice XML format. The latest release of the DataDirect XML Converters is compatible with Microsoft BizTalk Server 2006 and is integrated into the Microsoft BizTalk development environment. For healthcare organizations needing to comply with the X12 electronic data interchange (EDI) standards and the latest Health Insurance Portability and Accountability Act (HIPAA) 5010 transaction definitions, the DataDirect XML Converters now include support for the HIPAA EDI dialects, including 004010A1, 005010, and 005010A1 messages. Stylus Studio 2009 has a new EDI-to-XML module that works with the DataDirect XML Converters in an interactive way. Users can now load EDI documents to view contents, test conversions, create customizations, and preview XML. http://www.datadirect.com
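To illustrate the kind of in-place change the XQuery Update Facility enables, the sketch below shows a rough imperative analogue using only Python’s standard library; it is not DataDirect’s API, and the file and element names are invented. The comment shows the equivalent declarative XUF expression.

```python
# The XQuery Update Facility expresses in-place changes declaratively, e.g.:
#   replace value of node doc("order.xml")/order/status with "shipped"
# Below, the same effect done imperatively with Python's standard library.
import xml.etree.ElementTree as ET

tree = ET.parse("order.xml")            # assumes a local order.xml exists
status = tree.getroot().find("status")  # <order><status>pending</status>...</order>
status.text = "shipped"                 # the "replace value of node" step
tree.write("order.xml", encoding="utf-8", xml_declaration=True)
```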

eZ Systems Updates eZ Components

eZ Systems announced the release of eZ Components version 2008.2. This is the seventh major version of eZ Components, a general-purpose PHP library of over 40 components used independently or together for PHP application development. The latest versions of eZ Publish are also based on eZ Components. With eZ Components, developers can concentrate on solving customer-specific needs. The eZ Components tool set provides key application functionality, such as caching, authentication, database interaction, templates, graphs, and much more. The main improvements in this release are new features for the Document and Webdav components. The Document component, which enables you to convert documents between different formats, was already able to convert ReST to XHTML and DocBook. In this release, more formats are implemented, such as three different wiki formats (Confluence, Creole, and DokuWiki) and the eZ Publish XML formats, as well as reading XHTML and writing ReST. The wiki parser can easily be extended for other wiki formats. The Webdav component now supports authentication and authorization, as well as integration of authentication mechanisms into existing systems. In addition, it supports shared and exclusive write locks, even with custom storage back-ends. The main new development of the eZ Components 2008.2 release is the MvcTools component, which implements the tools for a framework. Instead of dictating the structure of the application, it provides a dispatcher, two request parsers (one for HTTP and one for email messages through the existing Mail component), two routing methods, two view handlers (one through plain PHP scripts and one through the Template component), and a response writer for HTTP. http://ezcomponents.org

Dewey Decimal Classification, Categorization, and NLP

I am surprised how often various content organizing mechanisms on the Web are compared to the Dewey Decimal System. As a former librarian, I am disheartened to be reminded how often students were lectured on the Dewey Decimal system, apparently to the exclusion of learning about subject categorization schemes. The two complemented each other, but that seems to be a secret to all but librarians.

I’ll try to share a clearer view of the model and explain why new systems of organizing content in enterprise search are quite different from the decimal model.

Classification is a good generic term for defining physical organizing systems. Unique animals and plants are distinguished by a single classification in the biological naming system. So too are books in a library. There are two principal classification systems for arranging books on the shelf in Western libraries: Dewey Decimal and Library of Congress (LC). Each uses coding (numeric for Dewey Decimal, alpha-numeric for Library of Congress) to establish where a book belongs logically on a shelf, relative to other books in the collection, according to the book’s most prominent content topic. A book on nutrition for better health might be given a classification number for some aspect of nutrition or one for a health topic, but a human being has to judge which topic the book is most “about,” because the book can only live in one section of the collection. It is worth mentioning that the Dewey and LC systems are both hierarchical but with different priorities: Dewey puts broad topics like Religion, Philosophy, and Psychology at its top levels, while LC groups those topics together and includes more scientific and technical topics, like Agriculture and Military Science, at the top of its list.
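To make the single-position idea concrete, here is a minimal sketch: each book carries exactly one classification code, and sorting by code reproduces the shelf (browsing) order. The call numbers below are illustrative Dewey-style values, not authoritative assignments.

```python
# Each book gets exactly one classification code, which fixes its single
# shelf position; sorting by code yields the browsing order.
books = {
    "150.1": "Foundations of Psychology",
    "200.9": "A History of Religion",
    "613.2": "Nutrition for Better Health",  # shelved under health, not cooking
    "641.3": "Cooking with Whole Grains",
}

for call_number, title in sorted(books.items()):
    print(call_number, title)
# 150.1 Foundations of Psychology
# 200.9 A History of Religion
# 613.2 Nutrition for Better Health
# 641.3 Cooking with Whole Grains
```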

So why classify books to reside in topic order, when it takes so much labor to shift the collections around to make space for new books? It is for the benefit of the users: to enable “browsing” through the collection, although it may be hard to accept that the term browsing was a staple of library science decades before the internet. Library leaders established eons ago the need for a system of physical organization to help readers peruse the book collection by topic, leading from the general to the specific.

You might ask what kind of help that was for finding the book on nutrition that was classified under “health science.” This is where another system, largely hidden from the public or often made annoyingly inaccessible, comes in. It is a system of categorization in which any content, book or otherwise, can be assigned an unlimited number of categories. Wandering through the stacks, one would never suspect this secret way of finding a nugget in a book about your favorite hobby if that book was classified to live elsewhere. The standard lists of terms for describing books under multiple headings are called “subject headings,” and you had to use a library catalog to find them. Unfortunately, they contain mysterious conventions called “sub-divisions,” designed to pre-coordinate any topic with other generic topics (e.g. Handbooks, etc. and United States). Today we would call these generic subdivision terms facets. One reflects a kind of book and the other reveals the geographical scope covered by the book.

With the marvel of the Web page and hyperlinking, we can click through hierarchical lists of topics to narrow a search for handbooks on nutrition in the United States for better health, beginning at any facet or topic, and still come up with the book that meets all four criteria. We no longer have to be constrained by the Dewey model of browsing the physical location of our favorite topics, probably missing a lot of good stuff. But then we never did. The subject card catalog gave us a tool for finding more than we would by classification code alone. But even that was a lot more tedious than navigating easily through a hierarchy of subject headings, narrowing the results by facets on a browser tab, and further narrowing the results by yet another topical term until we find just the right piece of content.
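Here is a minimal sketch of that faceted narrowing, assuming invented records and facet names: each record carries many subject terms and facet values, unlike the single shelf position a classification code allows.

```python
# Faceted narrowing: keep only records matching every facet supplied,
# mirroring click-through refinement. Records and facets are invented.
catalog = [
    {"title": "Everyday Nutrition", "subjects": {"nutrition", "health"},
     "form": "handbook", "place": "United States"},
    {"title": "Nutrition in Europe", "subjects": {"nutrition"},
     "form": "survey", "place": "Europe"},
    {"title": "US Health Policy", "subjects": {"health"},
     "form": "handbook", "place": "United States"},
]

def narrow(records, subject=None, form=None, place=None):
    """Return records matching every facet that is supplied."""
    return [r for r in records
            if (subject is None or subject in r["subjects"])
            and (form is None or r["form"] == form)
            and (place is None or r["place"] == place)]

# Handbooks on nutrition in the United States (for better health):
hits = narrow(catalog, subject="nutrition", form="handbook", place="United States")
print([r["title"] for r in hits])  # -> ['Everyday Nutrition']
```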

Taking the next leap, we have natural language processing (NLP) that will answer the question, “Where do I find handbooks on nutrition in the United States for better health?” And that is the Holy Grail for search technology – and a long way from Mr. Dewey’s idea for browsing the collection.

Sun Announces Agreement to Acquire MySQL

Sun Microsystems, Inc. (NASDAQ: JAVA) announced it has entered into a definitive agreement to acquire MySQL AB, an open source icon and developer of open source databases, for approximately $1 billion in total consideration. The acquisition accelerates Sun’s position in enterprise IT, extending it into the $15 billion database market. With millions of global deployments, including Facebook, Google, Nokia, Baidu, and China Mobile, MySQL will bring synergies to Sun that will help drive new adoption of MySQL’s open source database in more traditional applications and enterprises. The integration with Sun will extend the commercial appeal of MySQL’s offerings and improve its value proposition with the addition of Sun’s global services organization. MySQL will also gain new distribution through Sun’s channels, including its OEM relationships with Intel, IBM, and Dell. MySQL’s open source database is the “M” in LAMP – the software platform comprising Linux, Apache, MySQL, and PHP/Perl. Sun is committed to enhancing and optimizing the LAMP stack on GNU/Linux and Microsoft Windows, along with OpenSolaris and Mac OS X. The database from MySQL, OpenSolaris, and GlassFish, together with Sun’s Java platform and NetBeans communities, will create a Web application platform for a wide range of customers shifting their applications to the Web. Following completion of the proposed transaction, MySQL will be integrated into Sun’s Software, Sales and Service organizations, and the company’s CEO, Marten Mickos, will join Sun’s senior executive leadership team. In the interim, a joint team with representatives from both companies will develop integration plans that build upon the technical, product, and cultural synergies and the best business and product development practices of both companies. MySQL is headquartered in Cupertino, CA and Uppsala, Sweden, and has 400 employees in 25 countries. As part of the transaction, Sun will pay approximately $800 million in cash in exchange for all MySQL stock and assume approximately $200 million in options. The transaction is expected to close in late Q3 or early Q4 of Sun’s fiscal 2008. Completion of the transaction is subject to regulatory approval and other customary closing conditions. The deal is expected to be accretive to FY10 operating income on a GAAP basis. http://www.mysql.com, http://sun.com
