Re: [unrev-II] Towards an Atomic Data Structure

From: Sandy Klausner (klausner@cubicon.com)
Date: Mon Apr 24 2000 - 11:22:21 PDT

  • Next message: Eric Armstrong: "[unrev-II] MySQL License"

    >Eric Armstrong wrote:
    > Background: Object-Oriented (O-O) Terminology
    > ---------------------------------------------
    > A "class" is a template you use to construct an "object".
    > The class defines the behaviors that members of the class
    > are capable of, and defines the data items that distinguish
    > one member of the class from another. Individual objects
    > use the behaviors defined by the class, and have specific
    > data values that make them (usually unique) "individuals".
    > The Car class, for example, might define behaviors for
    > start and stop. It also might include data types for
    > model, color, and acceleration characteristics. So a Red
    > Ferrari would have one set of data items, while a Green
    > Edsel would have another set. But both would have
    > start and stop behaviors.
    >
    Traditional object-oriented languages do not make a clear distinction
    between objects and types. Both 'template' forms include the notion of
    encapsulated data structure and method behavior, but they deal with two
    distinct levels of abstraction. An object structure may consist of molecular
    data represented by types. Molecular data is different from atomic data
    which is limited to logic, number, and symbolic information. Atomic data is
    accessed directly from method operations, whereas molecular data can only be
    accessed through the public interface of the type. A molecular type may
    contain a flat data structure, i.e. time/date. A molecular type may
    alternatively contain a composite data structure consisting of linked nested
    nodes. Each node can consist of a collection of instances of either typed
    data structures or untyped text. These complex abstractions could represent
    everything from markup text to rich multimedia representations.
    >
    > Text Nodes
    > ----------
    > The fundamental unit of a DKR is an item of information.
    > Since the ideal in writing is to have "one idea per
    > paragraph", an "information node" can be thought of as
    > a paragraph of text. Headings stand apart from other
    > text, as well, so a heading is a special (short)
    > paragraph, or information node.
    >
    > Node behaviors are defined in a class (object template).
    > Every text node must contain an attribution -- a pointer
    > to the author, or an identifying string. A copy of that
    > node may be edited, which suggests the need for a split
    > operation, for example. After node is split into one or
    > more fragments, and edit operation could replace some
    > fragments or insert new ones that have a different author.
    > Some of the operations appropriate to a node might therefore
    > include split, delete, replace, and insert.
    >
    > Note that when the node is split, two objects exist where
    > one did before. Every node must therefore be capable of
    > being the root of a subtree. Although it may start out life
    > as a simple node that contains or points to an item of text,
    > it must also be capable of pointing to a list of text
    > elements. (That list might also include markup elements,
    > like HTML bold tags: <b>.) Since each item in that list may
    > itself point to a list of subitems, the resulting structure
    > is a tree.
    >
    The SGML community has a long history of developing ways to markup
    documents to capture semantic knowledge embedded in strings. As processing
    requirements become more sophisticated, new ways of managing this complexity
    need to be developed. One possible solution is to move to a clear document
    model. This molecular data type model separates concerns by parsing the
    clear text from the markup information. The clear text is parsed into a base
    collection of linked character nodes, while one or more composite structure
    processors maintain position and range links into the base collection. Each
    processor may have specialized behavior to analyze and hold semantic
    information on format, organization, navigation, narrative, reference,
    graphic control, publication, and filters. The model must be able to allow
    clear text editing while automatically maintaining the processor links into
    the base collection.
    >
    > Categories
    > ----------
    > Since every node will contain some kind of content (text
    > initially, but eventually media objects), it is relatively
    > clear that the fundamental class should provide operations
    > for split, delete, replace, and insert, and that it should
    > allow for a tree-shaped substructure.
    >
    > However, nodes may also acquire other subelements, such as
    > comments. So a node needs the ability to serve as the root
    > of multiple subtrees.
    >
    > In addition, some categories require special behaviors and
    > data structures. For example, if a node is "rated" then it
    > needs the ability to acquire a subtree consisting of
    > evaluations (ratings). It also needs the ability to "average"
    > the evaluations it has received for presentations.
    >
    A method within a composite molecular type is declared statically and should
    contains five forms of operations; search, edit, read, write, and transfer:
    SEARCH - A search operation navigates the forward/backward structure node
    links within a base collection or processor composite structure. Navigation
    may be constrained with upper/lower limits declared as offset and position
    control values. A Scan search action generates a series of structure node
    positions that enable subsequent expression and transformation operations on
    the atomic values. A Find search action generates either a position or a
    range of positions based upon evaluation criteria. A criterion consists of a
    logical, relational, or pattern expression that evaluates atomic values or
    string.
    EDIT - Once a position or range is determined through a search action,
    subsequent edit actions should be able to dynamically move or copy structure
    nodes between collections of the same type. Edit actions should consist of;
    Insert, Append, Replace, Overwrite, Swap, Invert, Reorder, Delete and
    Start-up (returning an existing structure to neutral values). An insert
    action should also be based upon cloning a new structure based upon start-up
    values. These operations manipulate the links between the structure nodes
    and are therefore should be extremely efficient.
    READ/WRITE - A position pointer enables access to specific instance
    structure data to read and write values.
    TRANSFER - A structure should be able to be moved or copied between two
    process composites. These generic behaviors would enable the creation of
    intelligent layers for such things such as DKR rating system.
    >
    > Dynamic Behavior
    > ----------------
    > Following classic O-O principles, it is tempting to construct
    > separate classes for evaluations, comments, and information
    > nodes. But that process moves us further away from identifying
    > an "atomic" structure. More importantly, it runs into the
    > problem that node-categorization (classification) is a dynamic
    > process.
    >
    > If node is not "rated", then it should not be possible to add
    > evaluations to it. If it is "rated", it should be possible to
    > add evaluations, and to have the average of those evaluations
    > summarized as the node's rating. Once a rating has been added,
    > the node can no longer be unrated. But until then, the node
    > could be switched at will back and forth from "unrated" to
    > "rated".
    >
    > That dynamic classification poses problems for a static object
    > oriented class structure. Unless some language system exists
    > that allows behaviors to be added on and taken away in a
    > dynamic classification process, the only alternative is to build
    > all of the behaviors into the fundamental class -- making the
    > identification of an "atomic" data structure all the more
    > important.
    >
    > Extensibility
    > -------------
    > If the behaviors are defined in a single class, then adds to
    > the system by extending that class. To put that class into
    > effect, the system must be designed to create nodes using a
    > "factory method". To create a new information object, then,
    > you don't merely use an existing class to make one. Instead,
    > you ask the factory method to create one and hand it back to
    > you. Runtime parameters can then configure the "factory" to
    > tell it which class to use. So, if the BasicNode class is
    > the standard class for an information node, and if you create
    > an ExtendedNode class, then the factory would be instructed
    > (via a command line switch or configuration file) to use the
    > ExtendedNode class when constructing an information object.
    >
    > Versioning
    > ----------
    > The basic structure for a node, then, is that it contains a
    > pointer to a previous version of itself. For versioning to
    > be useful, however, it must be possible for old links to
    > acquire the newest version. That requires an indirect link
    > -- a "virtual node". At a minimum, then, the data structure
    > must allow for two atomic types: The virtual node that points
    > to the latest version, and the node itself.
    >
    > Some actions like edits, rearranging sublist items, or deleting
    > those items would produce a new version. It should be possible
    > to perform multiple operations of that kind without having a
    > separate (persistent) version number. When "published", the
    > node would have the next sequential version number, regardless
    > of the number of changes. (For "undo", however, multiple
    > non-persistent "revisions" would be kept, so that changes can
    > be backed out. When published, all but the last revision would
    > be removed.)
    >
    The issues of Dynamic Behavior, Extensibility, and Versioning can be recast
    in a different light. The concept of monolithic 'application' programs must
    give way to 'anatomy' programs to achieve the DKR objectives. An anatomy
    would provide pure structure and behavior that drives one or more
    collaborating content systems. A system driven by an anatomy would possess
    four unique properties. A system should remains morphable, immutable,
    updatable, and communicable throughout the lifecycle of a software asset.
    Morphable - A developer should be is free to adapt data structure and
    operational behavior of ancestor templates while relying upon a new language
    medium to maintain descendent template integrity. Such an 'object-composed'
    language would solve the 'fragile class problem' of object-oriented
    languages.
    Immutable - Template design schema should be separate but related to actual
    value structures. This means that adapting anatomies would continue to
    support existing operational system versions.
    Updatable - Existing deployed systems should be automatically updated to the
    current anatomy version without human intervention. This update capability
    would be performed locally or through the Internet to any number of
    distributed systems.
    Communicable - Any two systems should be able to communicate in an ad hoc
    fashion through a messaging mechanism without the development of one-on-one
    APIs.
    These inherent characteristics would redefine the very basis of rapid
    prototyping where an existing executing system never becomes spaghetti code.
    Instead, a set of collaborating components would morph with changing domain
    requirements in a relentlessly consistent manner.
    >
    > Data and Sublists
    > -----------------
    > The node must contain, at a minimum, sublists (or subtrees)
    > for content (text nodes), for comments (text nodes), and
    > evaluations (text nodes with a rating). It may need to keep
    > an author-list (people who are authorized to perform direct
    > edits). It also needs a list of the categories to which it
    > belongs. (Implementing categories as lists provides fast
    > searching, as demonstrated by the Traction system). And it
    > needs a substructure list. (For a heading, for example,
    > the "content" would be the text of the heading while the
    > "substructure" would be subheadings and paragraphs.)
    >
    > A node therefore contains a variety of sublists, and at least
    > some data items. The data items include a rating slot (for
    > rated nodes), a version identifier, and a pointer to the
    > previous version. (Alternatively, one of the sublists could
    > be a version list.) A reference count would also be a good
    > idea, in case nodes wind up with no links at all, so they
    > can be removed. A pointer to the previous revision (during
    > editing) would also be needed, until the node is published.
    >
    > To make the structures extensible, the data items may well
    > be kept in a tuple, where the nature of the tuple depends
    > on the type of the node. (A text node, for example, would
    > have a text string and an author identifier.)
    >
    Root/branch/leaf declaration of linked blocks should be performed statically
    at design time. Each block should have an associated collection that links
    one or more instance structures. Two or more blocks could have the identical
    structure type. A structure contains atomic values. These values may be
    declared as constant length textString, digitString and dataString forms as
    well. Each block is declared with a content-based cardinality; always one,
    zero or more, one or more, specified or zero or more. Cardinality is
    automatically maintained as structures are dynamically cloned, copied or
    moved between block collections. Each structure node can have a branch that
    connects to one or more nested block collections. More than one blocks may
    be declared with the same structure type on different branches. A block can
    also be declared as a variable length String.
    This general systems cognitive model would be able to handle the
    requirements outlined by Eric and much more ...
    >
    Sandy Klausner, CTO
    Cubicon Technologies Corporation
    klausner@cubicon.com
    (408) 867-1100



    This archive was generated by hypermail 2b29 : Mon Apr 24 2000 - 11:36:12 PDT