Curated for content, computing, data, information, and digital experience professionals

Category: Computing & data (Page 93 of 95)

Computing and data is a broad category. Our coverage of computing is largely limited to software, and we are mostly focused on unstructured data, semi-structured data, or mixed data that includes structured data.

Topics include computing platforms, analytics, data science, data modeling, database technologies, machine learning / AI, Internet of Things (IoT), blockchain, augmented reality, bots, programming languages, natural language processing applications such as machine translation, and knowledge graphs.

Related categories: Semantic technologies, Web technologies & information standards, and Internet and platforms.

Dewey Decimal Classification, Categorization, and NLP

I am surprised how often various content organizing mechanisms on the Web are compared to the Dewey Decimal System. As a former librarian, I am disheartened to be reminded how often students were lectured on the Dewey Decimal system, apparently to the exclusion of learning about subject categorization schemes. They complemented each other but that seems to be a secret among all but librarians.

I’ll try to share a clearer view of the model and explain why new systems of organizing content in enterprise search are quite different from the decimal model.

Classification is a good generic term for defining physical organizing systems. Unique animals and plants are distinguished by a single classification in the biological naming system. So too are books in a library. There are two principal classification systems for arranging books on the shelf in Western libraries: Dewey Decimal and Library of Congress (LC). They each use coding (numeric for Dewey Decimal and alpha-numeric for Library of Congress) to establish where a book belongs logically on a shelf, relative to other books in the collection, according to the book’s most prominent content topic. A book on nutrition for better health might be given a classification number for some aspect of nutrition or one for a health topic, but a human being has to judge which topic the book is most “about” because the book can only live in one section of the collection. It is probably worth mentioning that the Dewey and LC systems are both hierarchical but with different priorities. For example, Dewey puts broad topics like Religion and Philosophy and Psychology at top levels, while LC groups those topics together and includes more scientific and technical topics at the top of the list, like Agriculture and Military Science.

So why classify books to reside in topic order? It requires a lot of labor to move the collections around to make space for new books. It is for the benefit of the users, to enable “browsing” through the collection, although it may be hard to accept that the term browsing was a staple of library science decades before the internet. Library leaders established eons ago the need for a system of physical organization to help readers peruse the book collection by topic, leading from the general to the specific.

You might ask what kind of help that was for finding the book on nutrition that was classified under “health science.” This is where another system, largely hidden from the public or often made annoyingly inaccessible, comes in. It is a system of categorization in which any content, book or otherwise, can be assigned an unlimited number of categories. Wandering through the stacks, one would never suspect this secret way of finding a nugget in a book about your favorite hobby if that book was classified to live elsewhere. The standard lists of terms for further describing books by multiple headings are called “subject headings” and you had to use a library catalog to find them. Unfortunately, they contain mysterious conventions called “sub-divisions,” designed to pre-coordinate any topic with other generic topics (e.g. Handbooks, etc. and United States). Today we would call these generic subdivision terms facets. One reflects a kind of book and the other reveals a geographical scope covered by the book.

With the marvel of the Web page, hyperlinking, and “clicking through” hierarchical lists of topics, we can click a mouse to narrow a search for handbooks on nutrition in the United States for better health, beginning at any facet or topic, and still come up with the book that meets all four criteria. We no longer have to be constrained by the Dewey model of browsing the physical location of our favorite topics, probably missing a lot of good stuff. But then we never did. The subject card catalog gave us a tool for finding more than we would by classification code alone. But even that was a lot more tedious than navigating easily through a hierarchy of subject headings, narrowing the results by facets on a browser tab and further narrowing the results by yet another topical term until we find just the right piece of content.
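The difference between a single shelf classification and multiple subject headings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the book titles, the `Book` record, and the `browse_shelf` and `narrow` helpers are all hypothetical, and a real catalog would use controlled vocabularies rather than plain strings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Book:
    title: str
    shelf_class: str  # exactly one physical location (the Dewey/LC idea)
    headings: frozenset = field(default_factory=frozenset)  # any number of subject headings and facets


# Hypothetical catalog records, invented for illustration only.
CATALOG = [
    Book("Eating Well: A Handbook", "Health science",
         frozenset({"Nutrition", "Health", "Handbooks", "United States"})),
    Book("Nutrition Policy in Europe", "Nutrition",
         frozenset({"Nutrition", "Health", "Europe"})),
    Book("Garden Handbook", "Agriculture",
         frozenset({"Gardening", "Handbooks", "United States"})),
]


def browse_shelf(catalog, shelf_class):
    """Shelf browsing: a book is found only under its single classification."""
    return [b.title for b in catalog if b.shelf_class == shelf_class]


def narrow(catalog, *terms):
    """Faceted narrowing: keep books that carry every requested term, in any order."""
    wanted = set(terms)
    return [b.title for b in catalog if wanted <= b.headings]


# Browsing the "Nutrition" shelf misses the handbook shelved under "Health science".
print(browse_shelf(CATALOG, "Nutrition"))
# Narrowing by all four criteria finds it, whichever term comes first.
print(narrow(CATALOG, "Handbooks", "Nutrition", "United States", "Health"))
```

Because `narrow` treats the terms as a set, starting from the “Handbooks” facet or from the “Nutrition” topic leads to the same result, which is the point of the paragraph above.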

Taking the next leap we have natural language processing (NLP) that will answer the question, “Where do I find handbooks on nutrition in the United States for better health?” And that is the Holy Grail for search technology – and a long way from Mr. Dewey’s idea for browsing the collection.

Sun Announces Agreement to Acquire MySQL

Sun Microsystems, Inc. (NASDAQ: JAVA) announced it has entered into a definitive agreement to acquire MySQL AB, an open source icon and developer of open source databases for approximately $1 billion in total consideration. The acquisition accelerates Sun’s position in enterprise IT to now include the $15 billion database market. With millions of global deployments including Facebook, Google, Nokia, Baidu and China Mobile, MySQL will bring synergies to Sun that will help drive new adoption of MySQL’s open source database in more traditional applications and enterprises. The integration with Sun will extend the commercial appeal of MySQL’s offerings and improve its value proposition with the addition of Sun’s global services organization. MySQL will also gain new distribution through Sun’s channels including its OEM relationships with Intel, IBM and Dell. MySQL’s open source database is the “M” in LAMP – the software platform comprised of Linux, Apache, MySQL and PHP/Perl. Sun is committed to enhancing and optimizing the LAMP stack on GNU/Linux and Microsoft Windows along with OpenSolaris and Mac OS X. The database from MySQL, OpenSolaris and GlassFish, together with Sun’s Java platform and NetBeans communities, will create a Web application platform across a wide range of customers shifting their applications to the Web. Following completion of the proposed transaction, MySQL will be integrated into Sun’s Software, Sales and Service organizations and the company’s CEO, Marten Mickos, will be joining Sun’s senior executive leadership team. In the interim, a joint team with representatives from both companies will develop integration plans that build upon the technical, product and cultural synergies and the best business and product development practices of both companies. MySQL is headquartered in Cupertino, CA and Uppsala, Sweden and has 400 employees in 25 countries.
As part of the transaction, Sun will pay approximately $800 million in cash in exchange for all MySQL stock and assume approximately $200 million in options. The transaction is expected to close in late Q3 or early Q4 of Sun’s fiscal 2008. Completion of the transaction is subject to regulatory approval and other customary closing conditions. The deal is expected to be accretive to FY10 operating income on a GAAP basis. http://www.mysql.com, http://sun.com

JustSystems Announces DITA Maturity Model Co-Authored with IBM

JustSystems, Inc. announced the availability of the “DITA Maturity Model,” which was co-authored with IBM and defines a graduated, step-by-step methodology for implementing Darwin Information Typing Architecture (DITA). One of DITA’s features is its support for incremental adoption. Users can start with DITA using a subset of its capabilities, and then add investment over time as their content strategy evolves and expands to cover more requirements and content areas. However, this continuum of adoption has also resulted in confusion, as communities at different stages of adoption claim radically different numbers for cost of migration and return on investment.

The DITA Maturity Model addresses this confusion by dividing DITA adoption into six levels, each with its own required investment and associated return on investment. Users can assess their own capabilities and goals relative to the model and choose the initial adoption level appropriate for their needs and schedule. The six levels of DITA adoption include:

Level 1: Topics – The most minimal DITA adoption requires the migration of the current XML content sources;

Level 2: Scalable Reuse – The major activity at this level is to break down the content into topics that are stored as individual files and use DITA maps to collect and organize the content into reusable units for assembly into specific deliverables;

Level 3: Specialization and Customization – Now, users expand the information architecture to be a full content model, which explicitly defines the different types of content required to meet different author and audience needs and specifies how to meet these needs using structured, typed content;

Level 4: Automation and Integration – Once content is specialized, users can leverage their investments in semantics with automation of key processes and begin tying content together even across different specializations or authoring disciplines;

Level 5: Semantic Bandwidth – As DITA diversifies to occupy more roles within an organization, a cross-application, cross-silo solution that shares DITA as a common semantic currency lets groups use the toolset most appropriate for their content authoring and management needs;

Level 6: Universal Semantic Ecosystem – As DITA provides for scalable semantic bandwidth across content silos and applications, a new kind of semantic ecosystem emerges: Semantics that can move with content across old boundaries, wrap unstructured content, and provide validated integration with semi-structured content and managed data sources. http://www.ibm.com, http://na.justsystems.com

Oracle to Acquire BEA Systems

Oracle Corporation (NASDAQ: ORCL) and BEA Systems (NASDAQ: BEAS) announced they have entered into a definitive agreement under which Oracle will acquire all outstanding shares of BEA for $19.375 per share in cash. The offer is valued at approximately $8.5 billion, or $7.2 billion net of BEA’s cash on hand of $1.3 billion. The Board of Directors of BEA Systems has unanimously approved the transaction. It is anticipated to close by mid-2008, subject to BEA stockholder approval, certain regulatory approvals and customary closing conditions. www.oracle.com www.bea.com

One Laptop Per Child Extends Donate/Buy Program

Looking for a unique and meaningful holiday gift?

OLPC has extended its “Give One, Get One” program through the end of the year. A donation of $399 US (a portion of which may be deductible) covers two XO laptops. OLPC will send one device to a child in an OLPC educational zone, and you’ll get one XO device for yourself (or a child in your family or local area). Giving options include donating both XOs covered by your contribution.

A recent article in the WSJ points out alternate approaches to addressing OLPC’s mission (“to empower the children of developing countries to learn by providing one connected laptop to every school-age child”). Regardless of who will ultimately provide solutions that take hold, OLPC offers an affordable way to do some good now. Think of it as an opportunity to give new meaning to the term “social computing.” Happy holidays!

New CTO blog

Over the summer we came up with the idea of hosting a blog for CTOs from all parts of the content and information industry to debate technologies and architectures. We finally got around to launching the Content Technology CTO Blog today. Here is the press release, and more info on how it works and how to contribute. John Newton, CTO of Alfresco, and Vern Imrich, CTO of Percussion, already have posts up. Stop by and comment!

What happened to WinFS?

As Tim Bray says, “Wow.” Here is the announcement post with a huge number of comments. This is discouraging. As I have argued before, we need the kinds of capabilities WinFS was striving for to make the next leap in managing information. I remain skeptical that database platforms are a sufficient solution for effective object management – they may be the necessary next step, but they are certainly not the ultimate answer.

There are no doubt many easier, shorter-term ways to get return on software development than a radically different operating system, but hopefully at some point there will be sufficient recognition by all the software infrastructure vendors that working together to build a modern OS would be worth it. On the other hand, perhaps what has happened to WinFS is really a sign that the days of huge operating systems are numbered. The problems are really bigger than any one platform. What kind of cross-platform infrastructure is feasible to accomplish the fluid, granular and meaningful interchange of content and behavior we know we need? This is a more interesting question than whether WinFS itself is dead.

UPDATE: There is a lot of commentary out there, but as usual Jon Udell has a view worth reading.

The Attention Economy

Lots of talk about ‘attention’ here at ETech. Thinking of attention in terms of economics is fascinating and thought-provoking, but I have not quite got the essence of the excitement – just saw Tim Bray who also said he was not sure he got it, and everyone at my lunch table squirmed and then said they didn’t get it either.

The last thing I want is someone managing or making money or even knowing about my attention allocation. I don’t mind some – I am not averse to sharing certain preferences and behavior – but it is mine to share or not, and mine to monetize or not. As a consumer, what is the return? I get more personalized ads? I get stats on my own behavior? I get more people and advertisers paying attention to me? I definitely am not yet interested in making it easier for others to try to influence me based on some attempt at interpreting my activity/interest – is this a matter of not just being good enough at it yet? Maybe.

Will Attention Trust make a difference? I don’t know.

I understand that some people have more intense desires to communicate everything they think and do and will buy into attention for that, but surely that is an edge group…?

Attention and its scarcity and therefore value are important to pay attention to when developing products or businesses – but it is not all in the user’s interest.

UPDATE:
Listened to Michael Goldhaber’s talk on the economy today at ETech. He’s the one who everyone quotes. Interesting talk, but I still don’t get it. I suppose the desire for attention might be as rational as the desire for money (although I hope not – it doesn’t seem as practical, you can’t simply bank attention over time without its value diminishing). Trading in “attention bonds” as Seth Goldstein wants, is a bit scary in that it depends on people who don’t think they get enough attention!? I thought Seth’s talk was the most enlightening on the topic.

UPDATE 2:
And this will be it for the updates. See Jon Udell’s and Doc Searls’ comments on this.

UPDATE 3:
Well, it is now 2018, and does this look dated or what!


© 2026 The Gilbane Advisor
