[unrev-II] Towards an Atomic Data Structure

From: Eric Armstrong (eric.armstrong@eng.sun.com)
Date: Sun Apr 23 2000 - 21:50:50 PDT

Background: Object-Oriented (O-O) Terminology
---------------------------------------------
  A "class" is a template you use to construct an "object".
  The class defines the behaviors that members of the class
  are capable of, and defines the data items that distinguish
  one member of the class from another. Individual objects
  use the behaviors defined by the class, and have specific
  data values that make them (usually unique) "individuals".
  The Car class, for example, might define behaviors for
  start and stop. It also might define data items for
  model, color, and acceleration characteristics. So a Red
  Ferrari would have one set of data items, while a Green
  Edsel would have another set. But both would have
  start and stop behaviors.
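  The Car example can be sketched in Python as follows. The
  attribute names and the `running` flag are illustrative
  assumptions; the point is only that behaviors live in the
  class while the data values differ per object.

```python
class Car:
    """Template (class) shared by every car object."""

    def __init__(self, model: str, color: str, acceleration: str) -> None:
        # Data items: these values distinguish one car from another.
        self.model = model
        self.color = color
        self.acceleration = acceleration  # hypothetical, e.g. "fast" or "slow"
        self.running = False

    # Behaviors: every Car object shares these.
    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


ferrari = Car(model="Ferrari", color="Red", acceleration="fast")
edsel = Car(model="Edsel", color="Green", acceleration="slow")
ferrari.start()
print(ferrari.running, edsel.running)  # True False
```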

Text Nodes
----------
  The fundamental unit of a DKR is an item of information.
  Since the ideal in writing is to have "one idea per
  paragraph", an "information node" can be thought of as
  a paragraph of text. Headings stand apart from other
  text, as well, so a heading is a special (short)
  paragraph, or information node.

  Node behaviors are defined in a class (object template).
  Every text node must contain an attribution: a pointer
  to the author, or an identifying string. A copy of that
  node may be edited, which suggests the need for a split
  operation, for example. After the node is split into two or
  more fragments, an edit operation could replace some
  fragments or insert new ones that have a different author.
  The operations appropriate to a node might therefore
  include split, delete, replace, and insert.
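  A minimal sketch of such a node, assuming its text is held as
  a flat list of attributed fragments and that a split happens
  at a character offset. The names `TextNode` and `Fragment`
  are hypothetical, not part of the original design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fragment:
    text: str
    author: str  # attribution: an author pointer or identifying string


@dataclass
class TextNode:
    fragments: List[Fragment] = field(default_factory=list)

    @classmethod
    def create(cls, text: str, author: str) -> "TextNode":
        return cls([Fragment(text, author)])

    def split(self, index: int, offset: int) -> None:
        """Split fragment `index` at character `offset`; both halves keep the author."""
        frag = self.fragments[index]
        left = Fragment(frag.text[:offset], frag.author)
        right = Fragment(frag.text[offset:], frag.author)
        self.fragments[index:index + 1] = [left, right]

    def replace(self, index: int, text: str, author: str) -> None:
        self.fragments[index] = Fragment(text, author)

    def insert(self, index: int, text: str, author: str) -> None:
        self.fragments.insert(index, Fragment(text, author))

    def delete(self, index: int) -> None:
        del self.fragments[index]

    def text(self) -> str:
        return "".join(f.text for f in self.fragments)


node = TextNode.create("The quick fox.", author="eric")
node.split(0, 10)                      # "The quick " | "fox."
node.insert(1, "brown ", author="rod")  # new fragment, different author
print(node.text())                     # The quick brown fox.
print([f.author for f in node.fragments])  # ['eric', 'rod', 'eric']
```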

  Note that when the node is split, two objects exist where
  one did before. Every node must therefore be capable of
  being the root of a subtree. Although it may start out life
  as a simple node that contains or points to an item of text,
  it must also be capable of pointing to a list of text
  elements. (That list might also include markup elements,
  like HTML bold tags: <b>.) Since each item in that list may
  itself point to a list of subitems, the resulting structure
  is a tree.
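  The idea that one node type can hold either a simple text
  item or a list of text and markup children can be sketched
  like this. The `kind` field and the HTML-style `render`
  method are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                   # "text", or a markup tag such as "b"
    text: Optional[str] = None  # used only while the node has no children
    children: List["Node"] = field(default_factory=list)

    def render(self) -> str:
        """Walk the subtree and produce HTML-like output."""
        if self.children:
            inner = "".join(child.render() for child in self.children)
        else:
            inner = self.text or ""
        if self.kind == "text":
            return inner                           # plain text or split text root
        return f"<{self.kind}>{inner}</{self.kind}>"  # markup element


para = Node("text", "One idea per paragraph.")
# After a split, the same node becomes the root of a subtree.
para.text = None
para.children = [
    Node("text", "One "),
    Node("b", children=[Node("text", "idea")]),
    Node("text", " per paragraph."),
]
print(para.render())  # One <b>idea</b> per paragraph.
```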

Categories
----------
  Since every node will contain some kind of content (text
  initially, but eventually media objects), it is relatively
  clear that the fundamental class should provide operations
  for split, delete, replace, and insert, and that it should
  allow for a tree-shaped substructure.

  However, nodes may also acquire other subelements, such as
  comments. So a node needs the ability to serve as the root
  of multiple subtrees.

  In addition, some categories require special behaviors and
  data structures. For example, if a node is "rated" then it
  needs the ability to acquire a subtree consisting of
  evaluations (ratings). It also needs the ability to "average"
  the evaluations it has received, for presentation.
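  A sketch of a node that is the root of several named subtrees
  and, when rated, averages its evaluations. It assumes numeric
  scores; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Evaluation:
    author: str
    text: str
    score: float  # assumption: ratings are numeric


@dataclass
class InfoNode:
    text: str
    rated: bool = False
    # Named subtrees: one node can be the root of several lists.
    subtrees: Dict[str, list] = field(
        default_factory=lambda: {"content": [], "comments": [], "evaluations": []}
    )

    def add_evaluation(self, ev: Evaluation) -> None:
        if not self.rated:
            raise ValueError("node is not rated; evaluations are not allowed")
        self.subtrees["evaluations"].append(ev)

    def rating(self) -> Optional[float]:
        """Average of the evaluations received, for presentation."""
        evals = self.subtrees["evaluations"]
        if not evals:
            return None
        return sum(e.score for e in evals) / len(evals)


node = InfoNode("Use a factory method.", rated=True)
node.add_evaluation(Evaluation("rod", "Sound idea.", 4))
node.add_evaluation(Evaluation("eric", "Needs detail.", 3))
print(node.rating())  # 3.5
```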

Dynamic Behavior
----------------
  Following classic O-O principles, it is tempting to construct
  separate classes for evaluations, comments, and information
  nodes. But that process moves us further away from identifying
  an "atomic" structure. More importantly, it runs into the
  problem that node-categorization (classification) is a dynamic
  process.

  If a node is not "rated", then it should not be possible to add
  evaluations to it. If it is "rated", it should be possible to
  add evaluations, and to have the average of those evaluations
  summarized as the node's rating. Once a rating has been added,
  the node can no longer be unrated. But until then, the node
  could be switched at will back and forth from "unrated" to
  "rated".

  That dynamic classification poses problems for a static
  object-oriented class structure. Unless some language system exists
  that allows behaviors to be added on and taken away in a
  dynamic classification process, the only alternative is to build
  all of the behaviors into the fundamental class -- making the
  identification of an "atomic" data structure all the more
  important.
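  The rules from the previous two paragraphs can be sketched
  with every behavior built into one class and a flag deciding
  which behaviors are currently allowed. This is one possible
  approach under that assumption; `RatableNode` and its methods
  are hypothetical names.

```python
from typing import List


class RatableNode:
    """All behaviors live in one class; a flag enables or disables them."""

    def __init__(self) -> None:
        self.rated = False
        self.evaluations: List[float] = []

    def set_rated(self, rated: bool) -> None:
        # Once any evaluation exists, the node can no longer be unrated.
        if not rated and self.evaluations:
            raise ValueError("node already has evaluations; cannot unrate")
        self.rated = rated

    def add_evaluation(self, score: float) -> None:
        if not self.rated:
            raise ValueError("evaluations can only be added to rated nodes")
        self.evaluations.append(score)


node = RatableNode()
node.set_rated(True)
node.set_rated(False)  # allowed: no evaluations yet
node.set_rated(True)
node.add_evaluation(4)
try:
    node.set_rated(False)
except ValueError as err:
    print(err)  # node already has evaluations; cannot unrate
```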

Extensibility
-------------
  If the behaviors are defined in a single class, then the system
  is extended by subclassing that class. To put the extended class
  into effect, the system must be designed to create nodes using a
  "factory method". To create a new information object, then,
  you don't merely use an existing class to make one. Instead,
  you ask the factory method to create one and hand it back to
  you. Runtime parameters can then configure the "factory" to
  tell it which class to use. So, if the BasicNode class is
  the standard class for an information node, and if you create
  an ExtendedNode class, then the factory would be instructed
  (via a command line switch or configuration file) to use the
  ExtendedNode class when constructing an information object.
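  A minimal sketch of the factory arrangement, assuming a
  command-line switch named `--node-class` (a hypothetical flag)
  parsed with Python's standard `argparse` module. Callers ask
  the factory for a node and never name a class themselves.

```python
import argparse
from typing import Dict, Optional, Sequence, Type


class BasicNode:
    def __init__(self, text: str) -> None:
        self.text = text


class ExtendedNode(BasicNode):
    """Adds behavior by extending the standard class."""

    def word_count(self) -> int:
        return len(self.text.split())


class NodeFactory:
    _classes: Dict[str, Type[BasicNode]] = {
        "BasicNode": BasicNode,
        "ExtendedNode": ExtendedNode,
    }

    def __init__(self, class_name: str = "BasicNode") -> None:
        self.node_class = self._classes[class_name]

    def create(self, text: str) -> BasicNode:
        # The configured class, not the caller, decides what gets built.
        return self.node_class(text)


def factory_from_args(argv: Optional[Sequence[str]] = None) -> NodeFactory:
    # Hypothetical switch: --node-class ExtendedNode
    parser = argparse.ArgumentParser()
    parser.add_argument("--node-class", default="BasicNode",
                        choices=sorted(NodeFactory._classes))
    args = parser.parse_args(argv)
    return NodeFactory(args.node_class)


factory = factory_from_args(["--node-class", "ExtendedNode"])
node = factory.create("one idea per paragraph")
print(type(node).__name__, node.word_count())  # ExtendedNode 4
```

  Because the class choice is read at runtime, a configuration
  file could supply the same name instead of the command line.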

Versioning
----------
  The basic structure for a node, then, is that it contains a
  pointer to a previous version of itself. For versioning to
  be useful, however, it must be possible for old links to
  resolve to the newest version. That requires an indirect link
  -- a "virtual node". At a minimum, then, the data structure
  must allow for two atomic types: The virtual node that points
  to the latest version, and the node itself.
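  The two atomic types can be sketched as follows, assuming each
  revision creates a new `Node` that points back to its
  predecessor. Old links hold the `VirtualNode`, so they always
  reach the newest version. Method names are hypothetical.

```python
from typing import List, Optional


class Node:
    """One version of a node's content."""

    def __init__(self, text: str, prev_version: Optional["Node"] = None) -> None:
        self.text = text
        self.prev_version = prev_version  # pointer to a previous version of itself


class VirtualNode:
    """Indirect link: always resolves to the most recent version."""

    def __init__(self, first: Node) -> None:
        self.latest = first

    def revise(self, new_text: str) -> Node:
        self.latest = Node(new_text, prev_version=self.latest)
        return self.latest

    def history(self) -> List[str]:
        versions, node = [], self.latest
        while node is not None:
            versions.append(node.text)
            node = node.prev_version
        return versions


link = VirtualNode(Node("draft"))  # other nodes link to this, not to a Node
link.revise("second draft")
print(link.latest.text)  # second draft
print(link.history())    # ['second draft', 'draft']
```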

  Some actions like edits, rearranging sublist items, or deleting
  those items would produce a new version. It should be possible
  to perform multiple operations of that kind without assigning
  each one a separate (persistent) version number. When "published", the
  node would have the next sequential version number, regardless
  of the number of changes. (For "undo", however, multiple
  non-persistent "revisions" would be kept, so that changes can
  be backed out. When published, all but the last revision would
  be removed.)
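  The revision scheme can be sketched like this, assuming the
  revisions are simply successive copies of the text. Edits and
  undo work on non-persistent revisions; publishing bumps the
  version number once and discards all but the last revision.
  The class and method names are hypothetical.

```python
from typing import List


class EditableNode:
    def __init__(self, text: str) -> None:
        self.version = 1                    # persistent, published version number
        self.revisions: List[str] = [text]  # non-persistent, kept for undo

    @property
    def text(self) -> str:
        return self.revisions[-1]

    def edit(self, new_text: str) -> None:
        # Each edit adds a revision, not a new published version.
        self.revisions.append(new_text)

    def undo(self) -> None:
        if len(self.revisions) > 1:
            self.revisions.pop()

    def publish(self) -> int:
        # One version increment no matter how many edits were made.
        if len(self.revisions) > 1:
            self.version += 1
            self.revisions = [self.revisions[-1]]
        return self.version


node = EditableNode("original text")
node.edit("typo fixed")
node.edit("items rearranged")
node.undo()
print(node.text)       # typo fixed
print(node.publish())  # 2
print(node.revisions)  # ['typo fixed']
```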

Data and Sublists
-----------------
  The node must contain, at a minimum, sublists (or subtrees)
  for content (text nodes), for comments (text nodes), and for
  evaluations (text nodes with a rating). It may need to keep
  an author list (people who are authorized to perform direct
  edits). It also needs a list of the categories to which it
  belongs. (Implementing categories as lists provides fast
  searching, as demonstrated by the Traction system.) And it
  needs a substructure list. (For a heading, for example,
  the "content" would be the text of the heading while the
  "substructure" would be subheadings and paragraphs.)

  A node therefore contains a variety of sublists, and at least
  some data items. The data items include a rating slot (for
  rated nodes), a version identifier, and a pointer to the
  previous version. (Alternatively, one of the sublists could
  be a version list.) A reference count would also be a good
  idea, in case nodes wind up with no links at all, so they
  can be removed. A pointer to the previous revision (during
  editing) would also be needed, until the node is published.
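  A sketch of reference counting, assuming nodes live in a
  central store keyed by an identifier. When the last link to a
  node is removed, the node is deleted. `NodeStore` and its
  methods are hypothetical.

```python
from typing import Dict


class NodeStore:
    """Keeps nodes only while something links to them."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}    # node id -> content
        self.refcount: Dict[str, int] = {}

    def add(self, node_id: str, content: str) -> None:
        self.nodes[node_id] = content
        self.refcount[node_id] = 0

    def link(self, node_id: str) -> None:
        self.refcount[node_id] += 1

    def unlink(self, node_id: str) -> None:
        self.refcount[node_id] -= 1
        if self.refcount[node_id] == 0:
            # No links remain, so the node can be removed.
            del self.nodes[node_id]
            del self.refcount[node_id]


store = NodeStore()
store.add("n1", "A paragraph")
store.link("n1")
store.link("n1")
store.unlink("n1")
print("n1" in store.nodes)  # True
store.unlink("n1")
print("n1" in store.nodes)  # False
```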

  To make the structures extensible, the data items may well
  be kept in a tuple, where the nature of the tuple depends
  on the type of the node. (A text node, for example, would
  have a text string and an author identifier.)
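  One way to sketch type-dependent data tuples, assuming a
  registry that maps each node type to a `typing.NamedTuple`
  layout. The registry and the `Evaluation` layout with a
  numeric score are assumptions; new node types only need a new
  tuple class and a registry entry.

```python
from typing import NamedTuple


class TextData(NamedTuple):
    text: str
    author: str


class EvaluationData(NamedTuple):
    text: str
    author: str
    score: int  # assumption: ratings are numeric


# Hypothetical registry: node type -> expected tuple layout.
DATA_LAYOUTS = {"Text": TextData, "Evaluation": EvaluationData}


class Node:
    def __init__(self, node_type: str, data: tuple) -> None:
        expected = DATA_LAYOUTS[node_type]
        if not isinstance(data, expected):
            raise TypeError(f"{node_type} nodes need {expected.__name__} data")
        self.type = node_type
        self.data = data  # tuple layout depends on the node type


text = Node("Text", TextData("One idea per paragraph.", "eric"))
review = Node("Evaluation", EvaluationData("Agreed.", "rod", 5))
print(text.data.author, review.data.score)  # eric 5
```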

Types
-----
  To make an "atomic" data structure, it would ideally be
  possible to construct a node that contains a list of
  subtrees. Each list would be identified as, for
  example: content, structure, comment, evaluation, categories,
  and authors. Every such node would be capable of having
  its own list of sublists. Even if only one sublist were
  present, the result would be a tree.

  It should also be possible to add lists dynamically. That
  allows a "question" node to have a sublist of "alternatives",
  for example, and for each alternative to have a sublist of
  "arguments" and/or "evaluations".
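  Dynamically added lists can be sketched with a dictionary
  keyed by list type, where a list is created the first time an
  entry of that type is added. The dictionary representation is
  an assumption, not the author's stated design.

```python
from typing import Dict, List


class Node:
    def __init__(self, text: str) -> None:
        self.text = text
        self.sublists: Dict[str, List["Node"]] = {}  # list type -> entries

    def add(self, list_type: str, child: "Node") -> "Node":
        # Lists are created on demand, so a new list type needs no new class.
        self.sublists.setdefault(list_type, []).append(child)
        return child


question = Node("Which storage format?")
xml = question.add("alternatives", Node("XML"))
question.add("alternatives", Node("Relational tables"))
xml.add("arguments", Node("Human-readable"))
xml.add("evaluations", Node("Worth a prototype"))
print(list(question.sublists))  # ['alternatives']
print(list(xml.sublists))       # ['arguments', 'evaluations']
```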

  The question is: What is the best way to represent those
  types? They could be kept as a value in the node. (Then
  the list of sublists would contain pairs: type, list.)
  Another possibility is to keep them in a list header for
  each list. A third possibility is to link to them, the
  same way that a node's category sublist entries link to
  individual categories.

  Another question: Do nodes need types, as well? For example,
  the list of arguments could have arguments for and arguments
  against (pro/con, plus/minus). Or possibly the arguments
  should be kept in separate lists? But that would make
  reconstruction of the original chronological sequence
  difficult, although it would expedite plus/minus summaries.
  [Overall, it seems desirable to add "type" as a data
  item in a node.]
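  Keeping a "type" item in each argument node lets a single list
  preserve the chronological order while still supporting a
  plus/minus summary. A minimal sketch, assuming the types are
  the strings "pro" and "con":

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Argument:
    type: str  # "pro" or "con": the node-level type item
    text: str


def summarize(arguments: List[Argument]) -> Tuple[int, int]:
    """Plus/minus summary computed from one chronological list."""
    pros = sum(1 for a in arguments if a.type == "pro")
    cons = sum(1 for a in arguments if a.type == "con")
    return pros, cons


arguments = [
    Argument("pro", "Fast to search"),
    Argument("con", "Harder to edit by hand"),
    Argument("pro", "Easy to extend"),
]
print([a.text for a in arguments])  # original order is preserved
print(summarize(arguments))         # (2, 1)
```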

  If a node contains a type, and a node contains a list of
  sublists, then any node can be a list header. It only needs
  a type value that identifies it as a content list, or a
  structure list, etc.

Atomic Structure
----------------
  The basic atomic structures, then, might look like this:

     VirtualNode
        pointer to Node (most recent version)

     Node
        type
        prevVersion
        data (tuple)
        sublists
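  The outline above translates fairly directly into Python
  dataclasses. The field names follow the outline; the Python
  types and the `update` method are assumptions added for
  illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Node:
    type: str                              # e.g. "Info", "Content", "Text"
    prev_version: Optional["Node"] = None  # previous version of this node
    data: Tuple[Any, ...] = ()             # type-dependent tuple
    sublists: List["Node"] = field(default_factory=list)


@dataclass
class VirtualNode:
    node: Node  # most recent version

    def update(self, new_version: Node) -> None:
        new_version.prev_version = self.node
        self.node = new_version


v = VirtualNode(Node("Text", data=("first wording", "eric")))
v.update(Node("Text", data=("second wording", "eric")))
print(v.node.data[0], "/", v.node.prev_version.data[0])  # second wording / first wording
```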

  A node of type "Info" would have multiple sublists,
  including content, structure, etc. A node with one of
  those types, on the other hand, would have only one sublist.
  So a "Content" node would have a single list containing
  "Text" (zero sublists) or "Markup" nodes (one sublist with
  zero or more entries of type Text or Markup).
