[unrev-II] Towards an Atomic Data Structure

From: Rod Welch (rowelch@attglobal.net)
Date: Mon Apr 24 2000 - 00:04:15 PDT

  • Next message: Jon Winters: "Re: [unrev-II] License Model: Preliminary Suggestion (fwd)"

    Eric,

    Good work.

    It would be helpful to see a mock up and/or scenario of how this capability
    would be used to perform daily work, like write some code, conduct a meeting,
    make a phone call, read a book, design a computer chip, fix the car, go to the
    dentist, the normal activity people use "intelligence" to support in generating
    knowledge about the world that might be useful in a repository.

    There has been consideration for the DKR team to create a tool to help software
    engineering. The things you describe today seem like they might be useful for
    that task, but also help other tasks, as well. Does this suggest that
    augmenting "intelligence", which you mention elsewhere, provides an underlying
    capability that can help everyone do almost anything a little better?

    Rod.

    Eric Armstrong wrote:

    > Background: Object-Oriented (O-O) Terminology
    > ---------------------------------------------
    > A "class" is a template you use to construct an "object".
    > The class defines the behaviors that members of the class
    > are capable of, and defines the data items that distinguish
    > one member of the class from another. Individual objects
    > use the behaviors defined by the class, and have specific
    > data values that make them (usually unique) "individuals".
    > The Car class, for example, might define behaviors for
    > start and stop. It also might include data types for
    > model, color, and acceleration characteristics. So a Red
    > Ferrari would have one set of data items, while a Green
    > Edsel would have another set. But both would have
    > start and stop behaviors.
    >
    > Text Nodes
    > ----------
    > The fundamental unit of a DKR is an item of information.
    > Since the ideal in writing is to have "one idea per
    > paragraph", an "information node" can be thought of as
    > a paragraph of text. Headings stand apart from other
    > text, as well, so a heading is a special (short)
    > paragraph, or information node.
    >
    > Node behaviors are defined in a class (object template).
    > Every text node must contain an attribution -- a pointer
    > to the author, or an identifying string. A copy of that
    > node may be edited, which suggests the need for a split
    > operation, for example. After node is split into one or
    > more fragments, and edit operation could replace some
    > fragments or insert new ones that have a different author.
    > Some of the operations appropriate to a node might therefore
    > include split, delete, replace, and insert.
    >
    > Note that when the node is split, two objects exist where
    > one did before. Every node must therefore be capable of
    > being the root of a subtree. Although it may start out life
    > as a simple node that contains or points to an item of text,
    > it must also be capable of pointing to a list of text
    > elements. (That list might also include markup elements,
    > like HTML bold tags: <b>.) Since each item in that list may
    > itself point to a list of subitems, the resulting structure
    > is a tree.
    >
    > Categories
    > ----------
    > Since every node will contain some kind of content (text
    > initially, but eventually media objects), it is relatively
    > clear that the fundamental class should provide operations
    > for split, delete, replace, and insert, and that it should
    > allow for a tree-shaped substructure.
    >
    > However, nodes may also acquire other subelements, such as
    > comments. So a node needs the ability to serve as the root
    > of multiple subtrees.
    >
    > In addition, some categories require special behaviors and
    > data structures. For example, if a node is "rated" then it
    > needs the ability to acquire a subtree consisting of
    > evaluations (ratings). It also needs the ability to "average"
    > the evaluations it has received for presentations.
    >
    > Dynamic Behavior
    > ----------------
    > Following classic O-O principles, it is tempting to construct
    > separate classes for evaluations, comments, and information
    > nodes. But that process moves us further away from identifying
    > an "atomic" structure. More importantly, it runs into the
    > problem that node-categorization (classification) is a dynamic
    > process.
    >
    > If node is not "rated", then it should not be possible to add
    > evaluations to it. If it is "rated", it should be possible to
    > add evaluations, and to have the average of those evaluations
    > summarized as the node's rating. Once a rating has been added,
    > the node can no longer be unrated. But until then, the node
    > could be switched at will back and forth from "unrated" to
    > "rated".
    >
    > That dynamic classification poses problems for a static object
    > oriented class structure. Unless some language system exists
    > that allows behaviors to be added on and taken away in a
    > dynamic classification process, the only alternative is to build
    > all of the behaviors into the fundamental class -- making the
    > identification of an "atomic" data structure all the more
    > important.
    >
    > Extensibility
    > -------------
    > If the behaviors are defined in a single class, then adds to
    > the system by extending that class. To put that class into
    > effect, the system must be designed to create nodes using a
    > "factory method". To create a new information object, then,
    > you don't merely use an existing class to make one. Instead,
    > you ask the factory method to create one and hand it back to
    > you. Runtime parameters can then configure the "factory" to
    > tell it which class to use. So, if the BasicNode class is
    > the standard class for an information node, and if you create
    > an ExtendedNode class, then the factory would be instructed
    > (via a command line switch or configuration file) to use the
    > ExtendedNode class when constructing an information object.
    >
    > Versioning
    > ----------
    > The basic structure for a node, then, is that it contains a
    > pointer to a previous version of itself. For versioning to
    > be useful, however, it must be possible for old links to
    > acquire the newest version. That requires an indirect link
    > -- a "virtual node". At a minimum, then, the data structure
    > must allow for two atomic types: The virtual node that points
    > to the latest version, and the node itself.
    >
    > Some actions like edits, rearranging sublist items, or deleting
    > those items would produce a new version. It should be possible
    > to perform multiple operations of that kind without having a
    > separate (persistent) version number. When "published", the
    > node would have the next sequential version number, regardless
    > of the number of changes. (For "undo", however, multiple
    > non-persistent "revisions" would be kept, so that changes can
    > be backed out. When published, all but the last revision would
    > be removed.)
    >
    > Data and Sublists
    > -----------------
    > The node must contain, at a minimum, sublists (or subtrees)
    > for content (text nodes), for comments (text nodes), and
    > evaluations (text nodes with a rating). It may need to keep
    > an author-list (people who are authorized to perform direct
    > edits). It also needs a list of the categories to which it
    > belongs. (Implementing categories as lists provides fast
    > searching, as demonstrated by the Traction system). And it
    > needs a substructure list. (For a heading, for example,
    > the "content" would be the text of the heading while the
    > "substructure" would be subheadings and paragraphs.)
    >
    > A node therefore contains a variety of sublists, and at least
    > some data items. The data items include a rating slot (for
    > rated nodes), a version identifier, and a pointer to the
    > previous version. (Alternatively, one of the sublists could
    > be a version list.) A reference count would also be a good
    > idea, in case nodes wind up with no links at all, so they
    > can be removed. A pointer to the previous revision (during
    > editing) would also be needed, until the node is published.
    >
    > To make the structures extensible, the data items may well
    > be kept in a tuple, where the nature of the tuple depends
    > on the type of the node. (A text node, for example, would
    > have a text string and an author identifier.)
    >
    > Types
    > -----
    > To make an "atomic" data structure, it would ideally be
    > possible to construct a node that contains a list of
    > of subtrees. Each list would be identified as, for
    > example: content, structure, comment, evaluation, categories,
    > and authors. Every such node would be capable of having
    > its own list of sublists. Even if only one sublist was
    > present, the result would be a tree.
    >
    > It should also be possible to add lists dynamically. That
    > allows a "question" node to have a sublist of "alternatives",
    > for example, and for each alternative to have a sublist of
    > "arguments" and/or "evaluations".
    >
    > The question is: What is the best way to represent those
    > types? They could be kept as a value in the node. (Then
    > the list of sublists would contain pairs: type, list.)
    > Another possibility is to keep them in a list header for
    > each list. A third possibility is to link to them, the
    > same way that a node's category sublist entries link to
    > individual categories.
    >
    > Another question: Do nodes need types, as well? For example,
    > the list of arguments could have arguments for and arguments
    > against (pro/con, plus/minus). Or possibly the arguments
    > should be kept in separate lists? But that would make
    > reconstruction of the original chronological sequence
    > difficult, although it would expedite plus/minus summaries.
    > [Overall, it seems desirable to add "type" as a data
    > item in a node.]
    >
    > If a node contains a type, and a node contains a list of
    > sublists, then any node can be a list header. It only needs
    > a type value that identifies it as a content list, or a
    > structure list, etc.
    >
    > Atomic Structure
    > ----------------
    > The basic atomic structures, then, might look like this:
    >
    > VirtualNode
    > pointer to Node (most recent version)
    >
    > Node
    > type
    > prevVersion
    > data (tuple)
    > sublists
    >
    > A node of type "Info" would have multiple sublists,
    > including content, structure, etc. A node with one of
    > those types, on the other hand, would have only one sublist.
    > So a "Content" node would have a single list containing
    > "Text" (zero sublists) or "Markup" nodes (one sublist with
    > zero or more entries of type Text or Markup).
    > -------------------------------------------------------------------------------
    >
    > -------------------------------------------------------------------------------
    > Community email addresses:
    > Post message: unrev-II@onelist.com
    > Subscribe: unrev-II-subscribe@onelist.com
    > Unsubscribe: unrev-II-unsubscribe@onelist.com
    > List owner: unrev-II-owner@onelist.com
    >
    > Shortcut URL to this page:
    > http://www.onelist.com/community/unrev-II



    This archive was generated by hypermail 2b29 : Mon Apr 24 2000 - 00:10:47 PDT