[unrev-II] DKR: Document Management Requirements, v0.1

From: Eric Armstrong (eric.armstrong@eng.sun.com)
Date: Tue Jan 25 2000 - 19:51:17 PST



Overview
=======
This is a lengthy document aimed at adducing the
requirements for a subset of an eventual Dynamic
Knowledge Repository (DKR). The subset described
is for a Collaborative Document System (CDS). The
goal of this document is to show how such a system
fits into a DKR framework, and detail its requirements.

The version number of this document (v0.1) represents
the early stage of the process.

This document has the following sections:
  * Long-Range Goals
  * Motivation
  * Starting Points
  * General Characteristics
  * Operational Requirements
  * Summary of Data Structure Requirements
  * Future: Using an Abstract Knowledge Representation

Long-Range Goals
=============
A fully functional DKR will need to manage many different
kinds of things:
   * documents
   * abstract knowledge representation and inference engines
   * predictive models
   * multimedia objects
   * programs of various kinds (search engines, applets)

It is likely, too, that different kinds of problems will require
information to be organized in fundamentally different ways.
For example, a DKR devoted to the energy problem might
have major headings for the problem statement, real world
data, tactical possibilities, strategic alternatives, and predictive
models. On the other hand, a DKR devoted to building the
next-generation DKR might have sections for requirements,
design, implementation, testing, bug reports, suggestions,
schedules, and future plans.

Since the general outline of a DKR seems to depend on the
problem domain it is targeted for, it seems reasonable to
focus attention on the elements that all DKRs have in common.

This set of requirements will focus on what is perhaps the
major common feature: Documents -- in particular,
Collaborative Documents.

Other important areas that will need attention include the
integration of multimedia objects (including animations,
simulations, audio, video, and the like) as well as the
critical functions of abstract knowledge representation,
inference engines, model-building functions, and the
integration of other executable programs. But here, we'll
focus on Collaborative Documents.

Motivation
=======
A wide variety of email and forum-based discussions occur
on a host of topics every day. In each of these discussions,
important information frequently surfaces, but that information
is hard to capture where you need it.

Document production systems, on the other hand, simplify the
task of creating complex documents but make it hard to gather
and integrate feedback.

For example, the DKR discussions have identified several possible
starting points for such a system. That kind of feedback occurs
naturally in an email system, as opposed to a document production
system, but each of the pointers was buried in a separate email.
It required a lengthy search to gather them together (below), and the
list may not even be complete!

To act as a foundation for a DKR, a Collaborative Document System
(CDS) needs to combine the best features of:
  * Directory tree / outlining programs
  * Hypertext (links and formatting)
  * XML (inline references and other features)
  * Email systems
  * Forums and Email Archives
  * Document Database
  * Versioning Systems
  * Difference Engines
  * Search Engines

Starting Points
==========
In the DKR discussion, we've seen pointers to several possible
starting points for such a system:
   * Augment/NLS
      Note: We don't need the app, but Augment's requirements
      documents would be *highly* desirable.
   * IBIS
   * Star/Rose
   * GEDO
      http://www.infoloom.com/
      http://www.topicmaps.com/
   * Early XML Drafts
      http://????

Plus a variety of other possibilities suggested by Roy Roebuck:
  * ISO/IEC 13250:1999 TopicMap standard
  * General object model
     http://omg.org
  * Distributed Management Task Force's (DMTF) Common Information
     Model's (CIM) metaschema
     http://www.dmtf.org/spec/cim_spec_v22/#_Toc453584954
  * LDAP Object Model
     http://www.ldapcentral.com/
  * XML/XLL/XSL and XMI
     http://www.w3.org/XML/
  * InfoMap Multicentric Information Map
     http://www.multicentric.com/
  * Artificial Brain
     http://www.thebrain.com
  * A MindMan MindMap
     www.mindmanager.com
  * Interchange between existing knowledge tools/stores using XML.
      http://www.ms.lt/
  * General Enterprise Management (GEM) tree underlying GEDO
     a hierarchy for namespace and categorization management
     http://one-world-is.com/rer/owis/dem/slides/img034.gif

General Characteristics
================
Given the lengthy list above, the difficulty of creating it, and the
rapidity with which it will go out of date, several requirements
for the DKR suggest themselves immediately. In particular,
it needs to be composed of information nodes that are
hierarchical, mailable, linkable and evaluable (more on those
subjects in a moment).

Each of those requirements leads in turn to other requirements.
The major requirements are listed here and explained below:
  * Hierarchical
  * Revisable
  * Versionable
  * Mailable
  * Distributed
  * Administratable
  * Differencable
  * Linkable
  * Evaluable
  * Collaborative
  * Attributive

In addition, the system should be:
  * Open
  * Extensible
  * Secure

The remainder of this section discusses those requirements in
greater detail.

Hierarchical
------------
This message should exist in outline form. It should be
easy to add and remove entries to the list of starting points
as more information is gained. However, the hierarchy should
function using XML-style "entity references" that copy the
target contents into the displayed document, "inline". The result
is effectively a lattice of information nodes.
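
As a rough illustration of the entity-reference idea, the sketch below
keeps nodes in a flat table and expands references "inline" at display
time. The names used here (Node, render, the node ids) are invented for
the example and are not part of any proposed design. It also assumes the
references never form a cycle.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    children: list[str] = field(default_factory=list)  # ids of referenced nodes


def render(nodes: dict[str, Node], node_id: str, depth: int = 0) -> list[str]:
    """Expand a node and everything it references into indented outline lines."""
    node = nodes[node_id]
    lines = ["  " * depth + node.text]
    for child_id in node.children:
        lines.extend(render(nodes, child_id, depth + 1))
    return lines


# "topicmaps" is referenced from two parents, so the structure is a
# lattice rather than a strict tree.
nodes = {
    "starting-points": Node("Starting Points", ["ibis", "topicmaps"]),
    "standards": Node("Standards", ["topicmaps"]),
    "ibis": Node("IBIS"),
    "topicmaps": Node("ISO/IEC 13250 TopicMap standard"),
}

for line in render(nodes, "starting-points"):
    print(line)
```

Because the shared node is stored once and copied in wherever it is
referenced, an edit to it shows up under every parent.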

Revisable
----------
Although "hard" links to objects will be needed at times, in most
cases the link to the "Requirements Document" should be a
"soft" link -- that is, an indirect link that points to the latest
version. That means never having to worry about looking at an old
version of the spec.
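
One way to picture the difference between the two kinds of link is
sketched below. It assumes a simple in-memory version store with 1-based
version numbers; Link and VersionStore are hypothetical names chosen for
the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    node_id: str
    version: Optional[int] = None  # None means "soft": follow the latest version


class VersionStore:
    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}

    def publish(self, node_id: str, content: str) -> int:
        """Store a new version and return its 1-based version number."""
        history = self._versions.setdefault(node_id, [])
        history.append(content)
        return len(history)

    def resolve(self, link: Link) -> str:
        """Return the latest content for a soft link, or the pinned version."""
        history = self._versions[link.node_id]
        if link.version is None:
            return history[-1]
        return history[link.version - 1]


store = VersionStore()
store.publish("requirements", "Requirements v0.1")
hard = Link("requirements", version=1)
soft = Link("requirements")
store.publish("requirements", "Requirements v0.2")
```

After the second publish, the soft link resolves to the v0.2 text while
the hard link still resolves to v0.1.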

Versionable
------------
Each node in the hierarchy needs to be versioned, so that previous
information is available. In addition, the task of displaying
differences becomes essentially trivial.

Mailable
---------
It must be possible to "publish" the whole document or sections of it
by "posting" it. It must also be possible to create replies for
individual sections, and then "post" them all at one time.

Distributed
-----------
Rather than using a central "repository", the system should employ
the major strengths of email systems, namely: fast access on local
systems and the robustness that comes from having redundant copies
on many different systems. The system will be more space intensive
than email systems, but storage costs are dropping precipitously,
and the outlook for future storage technologies is even brighter.

Administratable
----------------
To mitigate the short-term need for storage space, it should be
possible to set individual storage policies. For example, a user
will most likely not want to keep previous versions of any
documents they are not personally involved in authoring. It must
also be possible to add names to the authoring list. Name
removal should probably be limited to the original author. For
those cases when the original author is no longer part of the system,
it should be possible to make a copy of the document and name
a new primary author.

Differencable
--------------
When a new version of a document arrives, differences are
highlighted. Old-version information becomes accessible through
links (if saved). Differences are always against the last version
that was visited. If a section of the document was never visited,
the most recent version of that section is displayed on the first
visit. If several iterations have taken place since the last visit,
the cumulative differences are shown. (Again, node-versioning
makes this user-friendly feature fairly trivial.)
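
The "diff against the last visited version" rule can be sketched in a
few lines, assuming each node keeps its full version history and a
per-reader record of the last version seen. NodeHistory and its fields
are hypothetical; the diff itself comes from Python's standard difflib
module.

```python
import difflib


class NodeHistory:
    def __init__(self) -> None:
        self.versions: list[list[str]] = []     # each version is a list of lines
        self.last_visited: dict[str, int] = {}  # reader -> index of version last seen

    def add_version(self, lines: list[str]) -> None:
        self.versions.append(lines)

    def visit(self, reader: str) -> list[str]:
        """Return what the reader should see, then record the visit."""
        latest = len(self.versions) - 1
        seen = self.last_visited.get(reader)
        self.last_visited[reader] = latest
        if seen is None:
            # First visit: show the most recent version, with no diff.
            return list(self.versions[latest])
        # Later visits: cumulative diff from the last version seen.
        return list(
            difflib.unified_diff(
                self.versions[seen], self.versions[latest], lineterm="", n=0
            )
        )


history = NodeHistory()
history.add_version(["Starting points: IBIS"])
first_view = history.visit("reader")
history.add_version(["Starting points: IBIS, GEDO"])
history.add_version(["Starting points: IBIS, GEDO, Augment/NLS"])
second_view = history.visit("reader")
```

Here the second visit compares version 1 directly with version 3, so the
intermediate version never appears in the output.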

Linkable
---------
Clearly, support for web links is desirable, as shown by the links
to the various possible starting points above. [Note: Each of
those should be evaluated against this requirements list, and used
to modify these requirements.]

Evaluable
----------
The many possible starting points above highlight the need for
evaluability. It should be possible, not only to reply with a comment
on any item in those lists, but also to add an evaluation, much as
Amazon.com keeps evaluations for books. That feature is arguably
their greatest contribution to ecommerce, and the DKR should make
use of it. It should also be possible to order list items using relative
evaluations. That lets the most promising starting point float to the
top of the list. Not all lists should be ordered by evaluation, however!
For example, the sequence of requirements has been chosen to
provide the most natural "bridge" from one to the next! So evaluability
must be an option.
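
A minimal sketch of optional ordering by evaluation follows. It assumes
evaluations are simple numeric ratings that are averaged; the rating
scale, the Item class, and the sample scores are all invented for the
example.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Item:
    title: str
    ratings: list[int] = field(default_factory=list)  # hypothetical 1-5 scores

    def score(self) -> float:
        """Average rating, or 0.0 for an item nobody has evaluated yet."""
        return mean(self.ratings) if self.ratings else 0.0


def display_order(items: list[Item], by_evaluation: bool) -> list[Item]:
    """Rank items by score, or keep the authored order if ranking is off."""
    if not by_evaluation:
        return list(items)
    # sorted() is stable, so equally rated items keep their authored order.
    return sorted(items, key=lambda item: item.score(), reverse=True)


starting_points = [
    Item("IBIS", [3, 4]),
    Item("Augment/NLS", [5, 5]),
    Item("GEDO"),
]
ranked = [item.title for item in display_order(starting_points, by_evaluation=True)]
authored = [item.title for item in display_order(starting_points, by_evaluation=False)]
```

The per-list flag is the point: the list of starting points would turn
ranking on, while the list of requirements would leave it off.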

Collaborative
-------------
The system must increase the ability of multiple people, working
collaboratively, to generate up-to-date and accurate revisions.

For any given document, there are several classes of interaction:
  * receive
  * comment
  * suggest
  * author

The first group consists of people who receive the document and
do nothing else with it. (Just trying to be complete here.) The second
group consists of people who send back comments on different
sections. That feedback will typically be used in future versions.

The third group consists of people who suggest an alternative wording
or organization. Those "suggestions" take the form of a modified copy
of the original. One of the document authors may then agree to use
that formulation in place of the original, or may simply keep it as
commentary.

The fourth group consists of the fully-collaborative authoring group.
The original author must be able to add other individuals to the
document, or to subsections of it. (An author registered for a given
node has authoring privileges throughout the hierarchy anchored
at that node.)
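
The rule in parentheses amounts to walking up the hierarchy to look for
a registration. A small sketch, assuming each node knows its parent;
Section and can_edit are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Section:
    name: str
    parent: Optional["Section"] = None
    authors: set[str] = field(default_factory=set)


def can_edit(user: str, section: Section) -> bool:
    """True if the user is a registered author of the section or any ancestor."""
    current: Optional[Section] = section
    while current is not None:
        if user in current.authors:
            return True
        current = current.parent
    return False


document = Section("DKR Requirements", authors={"eric"})
requirements = Section("General Characteristics", parent=document, authors={"roy"})
hierarchical = Section("Hierarchical", parent=requirements)
```

In this setup "eric" can edit everything, while "roy" can edit General
Characteristics and anything below it, but not the document root.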

Attributive
----------
Every information node that is created should be automatically
attributed to its author. When a new version of a node is created,
all of the people who sent comments should be contained in a
"reviewer" list. When a suggestion is accepted, the author of the
suggested node should go into a "contributor" list in the parent node
and be added to the "author" list for the current node. It should be
possible to identify all of the reviewers, contributors, and authors
for the whole document and for each section of it.
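
The bookkeeping described above might look something like the following
sketch. The class and function names are hypothetical, and the lists are
modeled as sets so that nobody is credited twice.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class AttributedNode:
    title: str
    parent: Optional["AttributedNode"] = None
    children: list["AttributedNode"] = field(default_factory=list)
    authors: set[str] = field(default_factory=set)
    contributors: set[str] = field(default_factory=set)
    reviewers: set[str] = field(default_factory=set)

    def add_child(self, child: "AttributedNode") -> "AttributedNode":
        child.parent = self
        self.children.append(child)
        return child


def new_version(node: AttributedNode, commenters: list[str]) -> None:
    """Everyone who sent comments is credited as a reviewer of the node."""
    node.reviewers.update(commenters)


def accept_suggestion(node: AttributedNode, suggester: str) -> None:
    """Credit the suggester as author here and as contributor in the parent."""
    node.authors.add(suggester)
    if node.parent is not None:
        node.parent.contributors.add(suggester)


def credits(node: AttributedNode, role: str) -> set[str]:
    """Collect one role ('authors', 'contributors', 'reviewers') over a subtree."""
    names = set(getattr(node, role))
    for child in node.children:
        names |= credits(child, role)
    return names


doc = AttributedNode("DKR Requirements", authors={"eric"})
section = doc.add_child(AttributedNode("Starting Points", authors={"eric"}))
new_version(section, ["jack", "roy"])
accept_suggestion(section, "roy")
```

Calling credits() on the root answers the "whole document" question, and
calling it on any section answers the per-section one.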

Open
------
The system must be "open" in the sense that a user is not constrained
to using a particular editor, email system, or central server. The
specifications for interaction with the system should be freely
available, along with a reference implementation to use as a basis.
As much as possible, conformance with existing standards (XML, XHTML,
HTTP, email) is desirable. (The tricky decisions, of course, will be
between required features and standard protocols that don't support
them.)

Extensible
----------
The server and client systems that implement the DKR must also
be fully *extensible*. In other words, the same characteristics
of hierarchy, versioning, and revisability (use of most recent
version) that apply to the documents must apply to the system
itself.

That extensibility can be accomplished with a "dispatch table"
that names the class to use for each kind of object that needs
to be created. In conjunction with open sourcing, that
architecture allows a user to extend (subclass) an existing class
and then use the extended version in place of the original.
In addition, upgrades can occur dynamically, while the system
is in operation, while allowing for modular downgrades when
extensions don't work out.
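
To make the "dispatch table" idea concrete, here is a minimal sketch.
The class names and the create/upgrade helpers are invented for the
example; a real system would also need to decide what happens to objects
that were created before an upgrade.

```python
class PlainNode:
    def describe(self) -> str:
        return "plain node"


class RatedNode(PlainNode):
    """A user-supplied extension that subclasses the original class."""

    def describe(self) -> str:
        return "rated " + super().describe()


# Dispatch table: kind of object -> class used to create it.
dispatch: dict[str, type[PlainNode]] = {"node": PlainNode}


def create(kind: str) -> PlainNode:
    """Create an object using whatever class is currently registered."""
    return dispatch[kind]()


def upgrade(kind: str, new_class: type[PlainNode]) -> type[PlainNode]:
    """Swap in an extension at run time; return the old class for rollback."""
    previous = dispatch[kind]
    dispatch[kind] = new_class
    return previous


before = create("node").describe()
old_class = upgrade("node", RatedNode)   # dynamic upgrade
during = create("node").describe()
upgrade("node", old_class)               # modular downgrade
after = create("node").describe()
```

Since every creation goes through the table, swapping a single entry is
enough to upgrade or downgrade one kind of object without touching the
rest of the system.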

Secure
------
Security in such a system becomes an issue, unfortunately. The
system should employ whatever mechanisms exist or can be
constructed to help prevent trojan horse attacks, back door
attacks, and other security breaches in an open source system.

Operational Requirements
==================
What follows is an outline of functional operations for the
system:
  * Editing
      --Add, change, delete, move nodes
      --Copy nodes
          ..node alone, current-version subtree, whole subtree
      --Link (indirect, "soft" links, and direct "hard" links)
      --Automatic versioning
      --Automatic attribution
  * Email
      --Post
         ..Increment version number for future edits
         ..Deliver to group via server
      --Receive
         ..Automatically diff against last visited version of each node
         ..Highlight diffs
         .."Go to next unread" feature
  * Phantom Nodes
      --Since it is possible to receive comments on nodes that have been
         deleted from the current (not yet published) draft, the system
         must maintain "phantom" nodes that can be used to collect such
         comments. Phantom nodes are invisible until a comment is
         received, and disappear once the current version is posted. The
         comments themselves are always stored under the original node.
  * Trash Bin
      --Each node needs a trash bin that collects nodes which are deleted
         from under it. Trash bins are never emptied, except by explicit
         action requiring multiple confirmations.
  * Distributed Editing Control
      --The comment/version-publishing system means that locks are not
         required for single-author documents. But for multiple authors
         to collaborate, it must be possible to prevent editing conflicts.
      --One possibility is to implement distributed locks. The major
         issue there is handling communication outages.
      --An equally viable possibility may be to allow simultaneous edits
         and detect their occurrence when a new version is received. The
         competing versions can then be displayed side-by-side with
         user-selectable merge options.
      --Detection of competing versions may require something other than
         simple version numbers.
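
One candidate for "something other than simple version numbers" is to
have each version record the version it was edited from. The sketch
below is only one possible approach, not a settled design: the Version
class and the hash-based identifier are assumptions made for the
example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    parent: Optional[str]  # id of the version this edit was based on
    author: str
    content: str

    @property
    def id(self) -> str:
        """Identifier derived from parent, author, and content."""
        raw = f"{self.parent}|{self.author}|{self.content}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()[:12]


def competing(a: Version, b: Version) -> bool:
    """Two distinct versions compete if both were edited from the same parent."""
    return a.id != b.id and a.parent == b.parent


base = Version(None, "eric", "draft")
edit_one = Version(base.id, "eric", "draft + intro")
edit_two = Version(base.id, "roy", "draft + links")
follow_up = Version(edit_one.id, "eric", "draft + intro + summary")
```

With plain counters, edit_one and edit_two would both be "version 2" and
one could silently overwrite the other. Here they share a parent but
have different ids, so the receiving system can tell that a side-by-side
merge is needed.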

Summary of Data Structure Requirements
============================
Each node in the system should be able to track the following
information:
  * Version-identifier
  * Author list
  * Contributor list
  * Reviewer list
  * Evaluation list
  * Evaluation summary
  * Distributed Lock (unless Competing Versions is chosen)
  * Trash Bin
  * isPhantom identifier
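
Pulled together, the list above suggests a per-node record along the
following lines. The field names and types are guesses made for the
sketch (for instance, evaluations as integer ratings and the summary as
their average); the list itself does not specify them.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class NodeRecord:
    version_id: str
    authors: list[str] = field(default_factory=list)
    contributors: list[str] = field(default_factory=list)
    reviewers: list[str] = field(default_factory=list)
    evaluations: list[int] = field(default_factory=list)
    lock_holder: Optional[str] = None  # unused if Competing Versions is chosen
    trash_bin: list["NodeRecord"] = field(default_factory=list)
    is_phantom: bool = False

    @property
    def evaluation_summary(self) -> Optional[float]:
        """Average evaluation, or None if nobody has evaluated the node."""
        return mean(self.evaluations) if self.evaluations else None


record = NodeRecord("v0.1", authors=["eric"], evaluations=[4, 5])
```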

Future: Using an Abstract Knowledge Representation
====================================
A hierarchical system is created from only two relationships:
   * Containment
   * Ordering
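
To show that those two relationships really are enough, the sketch below
rebuilds an outline from nothing but "contains" and "precedes" facts.
The triple format and the function names are invented for the example;
the ordering step uses the standard graphlib module (Python 3.9 or
later).

```python
from graphlib import TopologicalSorter

Fact = tuple[str, str, str]  # (relation, subject, object)


def ordered_children(facts: list[Fact], parent: str) -> list[str]:
    """Children of a node, arranged so that every 'precedes' fact is respected."""
    kids = [obj for rel, subj, obj in facts if rel == "contains" and subj == parent]
    sorter = TopologicalSorter({kid: set() for kid in kids})
    for rel, first, second in facts:
        if rel == "precedes" and first in kids and second in kids:
            sorter.add(second, first)  # 'first' must be emitted before 'second'
    return list(sorter.static_order())


def outline(facts: list[Fact], root: str, depth: int = 0) -> list[str]:
    """Render the hierarchy implied by the facts as indented lines."""
    lines = ["  " * depth + root]
    for child in ordered_children(facts, root):
        lines.extend(outline(facts, child, depth + 1))
    return lines


facts = [
    ("contains", "doc", "motivation"),
    ("contains", "doc", "goals"),
    ("contains", "goals", "documents"),
    ("precedes", "goals", "motivation"),
]
```

The facts are deliberately listed out of order; the outline still comes
out with "goals" ahead of "motivation" because of the single "precedes"
fact.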

If progress is made in the pursuit of abstract knowledge
representations, the whole of the collaborative document system may
well migrate into a knowledge representation, using those two
relationships. The document management system would then be a subset
of a much larger knowledge management repository.

One wonders what such a system will look like after it begins
to be extended with thousands of additional relationships.

It boggles the mind.




