The Gilbane Report: Volume 10, Number 1What is an Information Model & Why do You Need One?
February 2002
Download a PDF version of this article Read the news for this issue.
What Is An Information Model & Why Do You Need One?
Ten
years ago (when we published Volume 1 Number 1) few companies did anything
with unstructured or "document" information other than print it, or
scan and store it on optical jukeboxes. Information management discipline was
almost exclusively focused on fixed-length relational data. There were only
a few companies using SGML or proprietary markup languages to build information
(typically document) models to better manage, reuse, and capitalize on their
non-relational information assets.
A lot has changed. Today
most companies are paying attention to their unstructured information, and many
are using XML technology to add some structure to it. However, the disciplines
of information analysis, design, and modeling are still often overlooked in
the rush to get the latest Website up or point-to-point application integration
implemented. This is shortsighted and guarantees expensive new development or
re-designs in the future.
This month we publish an
excerpt from a new book by JoAnn Hackos that provides Web developers with insight
into user requirements, and business managers a clear explanation of why the
effort of designing a well-thought-out information model is a critical component
of a content management strategy. JoAnn has been educating and helping companies
build successful single source information strategies for years.
JoAnn's book, Content
Management for Dynamic Web Delivery, is due out February 28th and will
be available at Amazon.
What Is An Information
Model & Why Do You Need One?
An Information Model provides
the framework for organizing your content so that it can be delivered and reused
in a variety of innovative ways. Once you have created an Information Model
for your content repository, you will be able to label information in ways that
will enhance search and retrieval, making it possible for authors and users
to find the information resources they need quickly and easily.
The Information Model is
the ultimate content-management tool.
Creating your Information
Model requires analysis, careful planning, and a lot of feedback from your user
community. The analysis takes you into the world of those who need and use information
resources every day. The planning means talking to a wide range of stakeholders,
including both individuals and groups who have information needs and who would
profit from collaboration in the development of information resources. Getting
feedback requires that you test your Information Model with members of your
user community to ensure that you haven't missed some important perspectives.
It's very easy to tell when
a Web site you're trying to navigate has no underlying Information Model. Here
are the tell-tale characteristics:
- You can't tell how to
get from the home page to the information you're looking for.
- You click on a promising
link and are unpleasantly surprised at what turns up.
- You keep drilling down
into the information layer after layer until you realize you're getting farther
away from your goal rather than closer.
- Every time you try to
start over from the home page, you end up in the same wrong place.
- You scroll through a
long alphabetic list of all the articles ever written on a particular subject
with only the title to guide you.
Sound familiar? What does
it feel like when a well-designed Information Model is in place? Oddly enough,
you generally don't notice a well-conceived Information Model because it simply
doesn't get in the way of your search.
- On the home page, you
notice promising links right away.
- Two or three clicks get
you to exactly what you wanted.
- The information seems
designed just for you because someone has anticipated your needs.
- You can read a little
or ask for more - the cross-references are in the right places.
- Right away you feel that
you're on familiar ground - similar types of information start looking the
same.
Did all of these pleasant
experiences happen by accident? Not in the least. Finding the information you
needed quickly and easily requires a great deal of advance planning. The basic
planning and design tool is the Information Model. If an Information Model is
clearly defined and firmly established, users will be on a fast track finding
and retrieving the information they need.
What is an Information
Model?
An Information Model is
an organizational framework that you use to categorize your information resources.
The framework assists authors and users in finding what they need, even if their
needs are significantly different and personal. The framework provides the basis
on which you base your publishing architecture, including print and electronic
information delivery.
An Information Model might
encompass the information resources of one part of an organization. For example,
your Information Model might provide a framework for categorizing your corporate
training materials or the technical and sales information that accompanies your
products. Your Information Model might include engineering information produced
during product development, policies and procedures used internally in the day-to-day
conduct of business, information about customers used in your sales cycle or
about vendors used in your supply chain. Some of the information resources you
bring under content management might be available across the corporation for
internal use, such as human-resources information. Other information resources
might be specific to the needs of one department or division of your organization.
As you plan what to include under content management and what to exclude, you
must consider a wide range of dimensions through which you will categorize and
label your information. Some of the dimensions will be specific to the needs
of information authors. Others will meet the requirements of your products and
services. Still others will explicitly meet the needs of internal and external
users of information.
As you design your Information
Model, consider how large an information body it must encompass. Some Information
Models are very small, specific, and limited in scope. Others stretch across
entire organizations, encompassing thousands or millions of pages.
The three-tiered structure
of an Information Model
The Information Model you
build will have a three-tiered structure. At base, the first tier of the Information
Model consists of the dimensions that identify how your information will be
categorized and labeled for both internal and external use in your organization.
The second tier sorts your information assets into information types. The third
tier provides structure for each information type, outlining the content units
that authors use to build information types. Figure 1 illustrates the three-tiered
structure. In this article, you learn how to determine the basic dimensions
of your Information Model.
Figure 1. The three-tiered
structure of an Information Model

The dimensions you identify
as the foundation of your Information Model become the attributes and values
of the metadata you will use to label your modules of content in your repository.
The information types will provide your authors with the basis for creating
well-structured modules that represent a particular purpose in communicating
information. The content units will describe the chunks of content that are
used to construct each information type.
Why do you need an Information
Model?
Designing an effective,
comprehensive Information Model is a critical and sometimes formidable step
in developing a resource that will provide answers to your customers' most arcane
questions in their search for information. A content-management system that
will make information accessible must be built upon a sound Information Model.
Otherwise, what you will
have is a loose collection of files with cryptic names, inaccessible except
to the experts. It's what you now face when you try to access a company's information
resources. Where is the information stored? How are the files named? What about
information inside the files? What if you don't know the exact titles and content?
What if the people who know the file system leave the organization?
The evidence that the existing
systems of storing information fail is quite massive. Everyone has stories to
tell of the impossibility of finding the information they need, whether the
resources are in printed volumes or in online systems. Where did you put that
government document that lists the requirements for employers of the disabled?
That policy document is 800 pages long, and the table of contents and the index
do not include the words you are using to find an answer to your question. If
you don't know what the author called it, you can't find the information you
need. If you don't know where the author put it, how are you to find it and
use it yourself?
A strong, effective Information
Model solves the problems described when it is designed in the context of a
content-management system. The model labels information according to the ways
it will be accessed. In fact, the information can be reorganized in many ways,
depending upon who is doing the looking. Most important, the model provides
the framework needed to make information accessible to experienced and inexperienced
seekers alike. It reduces frustration and enhances productivity. It means that
people spend less time searching and more time using information resources.
It helps to ensure that resources are not rewritten or recreated through an
author's sheer frustration at not being able to find them.
Static Information Models
As you begin to construct
your Information Model, you will be tempted to use the various logical (or illogical,
for that matter) categories that others originally used to set up their files
in the file servers. I find, for example, technical information that is organized
by product line. All the information associated with Model A is organized inside
file folders labeled Model A. A similar structure might be in place for Model
B of the product, or the structure might be entirely different because the people
in charge of Model B don't communicate about organizational schemes with the
people in charge of Model A. Note that Figure 2 illustrates a typical hierarchical
arrangement used in a file-management system. A hierarchical Information Model,
using a system of folders and files, works well as long as everyone in the organization
understands the design.
Figure 2. A hierarchical
information structure is typically used in a file-management system, such as
the hierarchical view of folders and files in Windows Explorer. This static
organization is useful as long as its use is restricted to people who understand
the information and the organizational logic of the Information Model embodied
in the hierarchical design.

Functional departments within companies use categories and organize them in
ways that reflect the ways in which the resident experts conduct business. The
Human Resources Department, for example, might organize its information resources
into categories such as employee benefits, employee demographic information,
and so on. The electronic filing system reflects how the experienced people
in the organization think about the information. Introduce just one newcomer,
and confusion results.
In devising an Information
Model for dynamic Web delivery, you need to resist the temptation to create
a static system where there is only one way to find a particular piece of information.
Although the system is usable by experts, newcomers and outsiders will be defeated.
Dynamic Information Models
The solution to a static
representation of your information resources is a dynamic Information Model,
one that changes in response to the needs of the users. Take the library, for
example; what if a library patron could wave a magic wand and the library would
rearrange itself in response to a particular set of needs. Let's say that a
patron wants to find not only all the books written by Steven King but also
all the books and articles written about Steven King. In addition, the patron
would like to know more about mystery and horror writers in the second half
of the twentieth century living in North America or in the United Kingdom. Once
the patron's need is known, all the books in the library and the articles in
all the periodicals fly around rearranging themselves into an optimal solution.
Sounds a bit like a magical library; rather messy, I'm afraid. But it would
be a godsend for the individual user.
For those of us delivering
information through electronic media, the danger of being hit by flying books
and periodicals ripping apart can happily be avoided. If you have studied your
users and worked hard to anticipate their needs, or put in place systems to
continually monitor their searches, you can quite literally rearrange the library.
The Information Model is the mechanism that makes dynamic updating of the information
possible. But the Information Model is only as good as your analysis and creativity
can make it.
Defining the components
of the Information Model
The Information Model consists
of information resources that you have categorized so that they can be effectively
searched and retrieved. The categories reflect your understanding of the dimensions
that represent the points of view of each relevant group in the user community.
For each of the dimensions you establish, you assign labels (most likely in
the form of XML metadata tags) that describe each information resource in terms
of the relevant categories and subcategories.
Look at how the cookbook information designer might develop an Information Model
for a cookbook content-management system. Table 1 consists of two columns: the
first describes the primary dimensions and the second lists the individual instances
of the category - the subcategories - that might be found in your information
resources. Because the Information Model is based on an analysis of how users
might want to find the information they need, the more comprehensive the user
analysis, the more successful the dimensions will be. The dimensions become
your XML metadata attributes and the subcategories are the values associated
with the metadata attributes.
Table 1. Recipe metadata
attributes and values
| Metadata
Attribute (Dimension) |
Value
(subcategory) |
| Primary
recipe ingredient |
Beef |
| Lamb |
| Chicken |
| Fish |
| Shellfish |
| Vegetables |
| Ethnicity |
Italian |
| Mexican |
| Chinese |
| Irish |
| Thai |
| Vietnamese |
| Role in a meal |
Starters |
| Soups |
| Sandwiches |
| Salads |
| Main courses |
| Side dishes |
| Desserts |
| Special
diets |
Low fat |
| Low salt |
| Low calorie |
| Low cholesterol |
The metadata attributes
and values that are embedded in each information module make it possible for
the person searching for a felicitous menu to come up with a Chinese main course
featuring fish and accommodating a low fat diet. Another person searching for
a Vietnamese soup would also be successful.
Not only would users be
able to gather and rearrange the information to suit their requirements, but
the information developers would also have many ways to organize the information.
A developer wanting to produce a low-fat Chinese cookbook would be able to find
all the appropriate Chinese recipes and arrange them by meal or primary ingredient
or Chinese region and so on. Table 2 illustrates how a database table might
be organized to represent the dimensions identified for the recipe database.
Table 2. This table illustrates
how a database might be organized to accommodate the recipe metadata attributes
and values that were identified in Table 1.
| Title |
Ethnicity |
Special
Diet |
Role |
| Spring
Rolls |
Chinese |
Low fat |
Starter |
| Cannelloni
|
Italian |
Low salt
|
Side dish |
| Taco
Salad |
Mexican |
Low calorie |
Salad |
| Shepherds
Pie |
Irish |
Low cholesterol
|
Main course |
While you are gathering
information that will guide the development of your Information Model is the
best time to consider many possible organizational schemes. For example, the
technical documentation supporting a company's hardware products might be organized
by product model number. Or, the technical documentation might be arranged according
to the basic sets of tasks from installation through configuration, standard
use, troubleshooting, and maintenance. Still other organizational schemes might
reflect the job skills required to install, use, and maintain the equipment
or the level of expertise of an individual within a particular job classification
(expert, journeyman, beginner).
The primary categories that
describe your information resources will be related to specific information
types and content units. An information type might be a procedure consisting
of step-by-step instructions for installing a hardware device. Content units
for the procedure might list the tools required for the installation, the warnings
about taking proper safety precautions, observations on handling typical installation
problems, or recommendations for setting up the workplace for safe and efficient
use.
Analyzing user requirements
Information architects at
Nortel Networks decided to organize its information according to three interrelated
primary dimensions: workflow, product model, and information type.
They began by analyzing
the types of work done by the end-users of their products. They learned that
people planned for the installation of the new equipment they needed, wrote
specifications and evaluated products, installed and configured their new hardware
and software, upgraded existing hardware and software with new versions, monitored
error reports, engaged in troubleshooting activities, and repaired and replaced
components and software applications.

Figure 3. Nortel's information
categorization and labeling. Source:
Nortel Networks, Inc. Permission to use granted by Nortel Networks, Inc. All
rights reserved.
The Information Model that
they developed began with a dimension that allowed them to label information
topics among their information modules with values that represented the end-user's
workflow. Superimposed upon the workflow dimension was the product model that
the end-users were working with. That meant, for example, that one topic developed
in the information resource would have two initial labels: one for the place
in the customer's workflow and another for the product model. A specific topic
might be concerned with hardware installation for computer model A.
The third dimension involved
the types of information to be found among the topics in the information resource.
Some topics described background reference information that users could read
to understand how the particular product model worked. Other topics contained
procedures for installation or configuration or monitoring. Still other topics
described tools required, troubleshooting recommendations, or safety warnings.
The basic categorization and labeling they defined in a comprehensive, corporate-wide
Information Model is illustrated in Figure 3.
If this Information Model
were used simply to store or find a module in a content-management system and
always used in the same context, you would attach one label to the topic for
each category. For example, a procedure for troubleshooting a problem with Computer
B would be given three labels as illustrated in Table 3.
<product model = "computer
B">
<procedure workflow = "troubleshooting">
<information type = "procedure">
However, the same procedure
might be applicable to more than one product model. In that case,
<product model = "computer
A, computer C, computer F, and so on">
A particular safety warning
might be applicable to workflow activities involved with installing, replacing,
and upgrading a hardware component. As a result, the workflow dimension of a
particular information topic would include the following:
<procedure workflow =
"installation, replacement, upgrading">
By labeling information
in multiple ways, a particular topic of information will appear in more than
one context, depending upon the needs of the users of the information.
Table 3. Metadata attributes
and values for the workflow dimension
| Metadata
Attribute (Dimension) |
Value
(subcategory) |
| Workflow |
Planning |
| Specification |
| Installation |
| Configuration |
| Monitoring |
| Troubleshooting |
| Maintenance |
| Replacement |
| Upgrading |
| Product
Model |
Computer
A |
| Computer
B |
| Computer
X |
| Information type |
Safety
warnings |
| Hints |
| Procedures |
| Background
information |
| Concepts |
| Tools required |
Assessing authoring requirements
To be most effective, an
Information Model is first focused on the users of the information. However,
other dimensions of the information emerge when you study the requirements of
the authors.
Information authors also
need to create, store, find, and reuse information topics as they develop information
resources. In general, the authoring community wants to know
- Who first authored a
topic?
- When was it first written?
- Who edited the information
topic and when?
- What changes have been
made to the topic, by whom, and when?
- Why were the changes
made?
- Are there many versions
of the information topic, reflecting a series of changes?
- Which version am I viewing
at this time?
- When was it created?
- Who approved the information
topic for publication to the Web?
- When was it approved?
These questions reflect
the work processes of the authoring community. They can be expressed as dimensions
and values associated with each information topic created and stored in the
content-management system. Authoring requirements mean that you have additional
labels to attach that support creating, storing, searching, and retrieving information
from the system.
A table of authoring workflow
requirements might look like Table 4.
Table 4. Dimensions of
an authoring workflow Information Model with values
| Metadata
Attribute (Dimension) |
Value
(subcategory) |
| Author |
Individual
name |
| Editor |
Individual
name |
| Activity |
Initial
creation |
| Editing |
| Approval |
| Revision |
| Activity date |
Date |
| Version |
Number |
| Date |
| Reason for change |
Notes |
Many authoring dimensions
can be automatically assigned to an information topic. For example, you know
that John Jones is the author of a particular topic because John logged onto
the content-management system using his password. You know the date that he
first authored the topic based on the date stored in the system. You also know
that John is the author, not the editor of the topic. However, the only way
you can tell why John revised his information topic two weeks after he first
created it is by looking at the notes he included when he made the change and
checked the topic back into the content-management system.
Other information that tracks
the authoring processes is based upon a workflow system that can be configured
to route information topics from author to editor to approver. The individual's
role in the process is defined in the workflow system, and the workflow system
automatically selects the appropriate category and label to use.
Version and release control
requirements
Keeping track of versions
of the information is a standard part of a content-management system. Each time
an author makes a change and checks a topic back into the database, a new version
is created. If the author explains what change was made and why in a note, then
even more information is available to anyone tracking the changes.
Version information included
in the Information Model helps the development community ensure that the latest
version is being released to the users and that earlier versions are available
whenever they need to go back. This practice adds up to straightforward version
control.
However, you have learned
that, in many organizations, information changes in ways that are not always
neatly predictable by simple versioning. Companies have multiple versions of
products available to customers at any one time. Information related to earlier
models might all need to be available simultaneously. For example, many heavy
equipment manufacturers make available on their Web sites the maintenance manuals
for 25, 50, or even 100 years of products. Somewhere, some place, someone might
need to repair one of those original pieces of equipment.
Even more complicated for
your Information Model are topics that are associated with interim product releases.
A new release of your products might be planned for the end of the quarter.
But during the development life cycle, it isn't always clear which version of
the product and which functions will end up included in the final release. Sometimes
small, interim releases are made to test functionality; these require documentation.
However, the information in the topic modules continues to change as feedback
comes from the customers, the test team, and the developers. I've learned of
groups that maintain as many as nine different versions of the information during
the development process. Some changes can affect all versions, while other changes
affect only some of the versions. Handling multiple versions of the same or
nearly the same information modules is a challenge to your Information Model.
Product developers, particularly
software developers, have opted to release updates to product functionality
more frequently than ever before. Some companies release information every three
months, others every few weeks, still others weekly. In addition, these same
organizations maintain multiple versions of the product and the information
during development. Only when release decisions are made is it clear which versions
of the many topics will actually be released to the customers.
Multiple streaming releases
have caused considerable stress to information-development organizations because
the changes in information are difficult to track. However, you can use your
Information Model to bring some semblance of control to the release process
through the use of categories and labels and through the relationship of parent-child
topics to one another.
When you need to keep track
of versions of information, you might want to split off a particular version
and add new labels. Let's look at an example illustrated in Figure 4.
Figure 4. During the
development of a procedure, several interim versions are produced. Eventually
version 1.2 splits into two related versions that reflect two ways in which
the produce might work. Authors must track the related sub-versions to ensure
that subsequent changes in the information are reflected in both instances.

These two versions are related
to version 1.2 but have differences that reflect two possible methods of talking
about a product. The first method reflects one possible way the product can
work; the second method reflects another possible way. One of the methods will
be released eventually but during the authoring process, you need to keep track
of both. You need a process of categorization that allows for a relationship
among the sub-versions. You want to keep track of the changes to the primary
version (Version 1.2) that can affect both the sub-versions. But you also want
to maintain the distinctions between the sub-versions, at least until a decision
is made about which one to release.
To handle this relationship,
you need a system of categorizing versions that links the sub-versions to the
primary version but continues to track changes to the sub-versions as well as
changes to the primary one. By developing a series of dimensions and values
that allow us to label the sub-versions and track the relationships, you can
provide a way to handle potentially complex interrelationships among topics.
The architect of a comprehensive
Information Model, in this case as in others, must be aware of the requirements
both of the user and of the authoring communities. Both communities have roles
that will influence the design of your Information Model. In my book I provide
additional detailed guidance on how to build an Information Model.
JoAnn T.
Hackos
joann.hackos@comtech-serv.com
|