Re: [unrev-II] Towards an Atomic Data Structure

From: Sandy Klausner (klausner@cubicon.com)
Date: Mon Apr 24 2000 - 11:22:21 PDT

Next message: Eric Armstrong: "[unrev-II] MySQL License"

Previous message: Jack Park: "Re: [unrev-II] Upcoming Agenda Items"
Maybe in reply to: Rod Welch: "[unrev-II] Towards an Atomic Data Structure"
Next in thread: Jeff Miller: "Re: [unrev-II] Towards an Atomic Data Structure"
Next in thread: Eric Armstrong: "Re: [unrev-II] Towards an Atomic Data Structure"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Mail actions: [ respond to this message ] [ mail a new topic ]

>Eric Armstrong wrote:
> Background: Object-Oriented (O-O) Terminology
> ---------------------------------------------
> A "class" is a template you use to construct an "object".
> The class defines the behaviors that members of the class
> are capable of, and defines the data items that distinguish
> one member of the class from another. Individual objects
> use the behaviors defined by the class, and have specific
> data values that make them (usually unique) "individuals".
> The Car class, for example, might define behaviors for
> start and stop. It also might include data types for
> model, color, and acceleration characteristics. So a Red
> Ferrari would have one set of data items, while a Green
> Edsel would have another set. But both would have
> start and stop behaviors.
>
Traditional object-oriented languages do not make a clear distinction
between objects and types. Both 'template' forms include the notion of
encapsulated data structure and method behavior, but they deal with two
distinct levels of abstraction. An object structure may consist of molecular
data represented by types. Molecular data is different from atomic data
which is limited to logic, number, and symbolic information. Atomic data is
accessed directly from method operations, whereas molecular data can only be
accessed through the public interface of the type. A molecular type may
contain a flat data structure, i.e. time/date. A molecular type may
alternatively contain a composite data structure consisting of linked nested
nodes. Each node can consist of a collection of instances of either typed
data structures or untyped text. These complex abstractions could represent
everything from markup text to rich multimedia representations.
>
> Text Nodes
> ----------
> The fundamental unit of a DKR is an item of information.
> Since the ideal in writing is to have "one idea per
> paragraph", an "information node" can be thought of as
> a paragraph of text. Headings stand apart from other
> text, as well, so a heading is a special (short)
> paragraph, or information node.
>
> Node behaviors are defined in a class (object template).
> Every text node must contain an attribution -- a pointer
> to the author, or an identifying string. A copy of that
> node may be edited, which suggests the need for a split
> operation, for example. After node is split into one or
> more fragments, and edit operation could replace some
> fragments or insert new ones that have a different author.
> Some of the operations appropriate to a node might therefore
> include split, delete, replace, and insert.
>
> Note that when the node is split, two objects exist where
> one did before. Every node must therefore be capable of
> being the root of a subtree. Although it may start out life
> as a simple node that contains or points to an item of text,
> it must also be capable of pointing to a list of text
> elements. (That list might also include markup elements,
> like HTML bold tags: <b>.) Since each item in that list may
> itself point to a list of subitems, the resulting structure
> is a tree.
>
The SGML community has a long history of developing ways to markup
documents to capture semantic knowledge embedded in strings. As processing
requirements become more sophisticated, new ways of managing this complexity
need to be developed. One possible solution is to move to a clear document
model. This molecular data type model separates concerns by parsing the
clear text from the markup information. The clear text is parsed into a base
collection of linked character nodes, while one or more composite structure
processors maintain position and range links into the base collection. Each
processor may have specialized behavior to analyze and hold semantic
information on format, organization, navigation, narrative, reference,
graphic control, publication, and filters. The model must be able to allow
clear text editing while automatically maintaining the processor links into
the base collection.
>
> Categories
> ----------
> Since every node will contain some kind of content (text
> initially, but eventually media objects), it is relatively
> clear that the fundamental class should provide operations
> for split, delete, replace, and insert, and that it should
> allow for a tree-shaped substructure.
>
> However, nodes may also acquire other subelements, such as
> comments. So a node needs the ability to serve as the root
> of multiple subtrees.
>
> In addition, some categories require special behaviors and
> data structures. For example, if a node is "rated" then it
> needs the ability to acquire a subtree consisting of
> evaluations (ratings). It also needs the ability to "average"
> the evaluations it has received for presentations.
>
A method within a composite molecular type is declared statically and should
contains five forms of operations; search, edit, read, write, and transfer:
SEARCH - A search operation navigates the forward/backward structure node
links within a base collection or processor composite structure. Navigation
may be constrained with upper/lower limits declared as offset and position
control values. A Scan search action generates a series of structure node
positions that enable subsequent expression and transformation operations on
the atomic values. A Find search action generates either a position or a
range of positions based upon evaluation criteria. A criterion consists of a
logical, relational, or pattern expression that evaluates atomic values or
string.
EDIT - Once a position or range is determined through a search action,
subsequent edit actions should be able to dynamically move or copy structure
nodes between collections of the same type. Edit actions should consist of;
Insert, Append, Replace, Overwrite, Swap, Invert, Reorder, Delete and
Start-up (returning an existing structure to neutral values). An insert
action should also be based upon cloning a new structure based upon start-up
values. These operations manipulate the links between the structure nodes
and are therefore should be extremely efficient.
READ/WRITE - A position pointer enables access to specific instance
structure data to read and write values.
TRANSFER - A structure should be able to be moved or copied between two
process composites. These generic behaviors would enable the creation of
intelligent layers for such things such as DKR rating system.
>
> Dynamic Behavior
> ----------------
> Following classic O-O principles, it is tempting to construct
> separate classes for evaluations, comments, and information
> nodes. But that process moves us further away from identifying
> an "atomic" structure. More importantly, it runs into the
> problem that node-categorization (classification) is a dynamic
> process.
>
> If node is not "rated", then it should not be possible to add
> evaluations to it. If it is "rated", it should be possible to
> add evaluations, and to have the average of those evaluations
> summarized as the node's rating. Once a rating has been added,
> the node can no longer be unrated. But until then, the node
> could be switched at will back and forth from "unrated" to
> "rated".
>
> That dynamic classification poses problems for a static object
> oriented class structure. Unless some language system exists
> that allows behaviors to be added on and taken away in a
> dynamic classification process, the only alternative is to build
> all of the behaviors into the fundamental class -- making the
> identification of an "atomic" data structure all the more
> important.
>
> Extensibility
> -------------
> If the behaviors are defined in a single class, then adds to
> the system by extending that class. To put that class into
> effect, the system must be designed to create nodes using a
> "factory method". To create a new information object, then,
> you don't merely use an existing class to make one. Instead,
> you ask the factory method to create one and hand it back to
> you. Runtime parameters can then configure the "factory" to
> tell it which class to use. So, if the BasicNode class is
> the standard class for an information node, and if you create
> an ExtendedNode class, then the factory would be instructed
> (via a command line switch or configuration file) to use the
> ExtendedNode class when constructing an information object.
>
> Versioning
> ----------
> The basic structure for a node, then, is that it contains a
> pointer to a previous version of itself. For versioning to
> be useful, however, it must be possible for old links to
> acquire the newest version. That requires an indirect link
> -- a "virtual node". At a minimum, then, the data structure
> must allow for two atomic types: The virtual node that points
> to the latest version, and the node itself.
>
> Some actions like edits, rearranging sublist items, or deleting
> those items would produce a new version. It should be possible
> to perform multiple operations of that kind without having a
> separate (persistent) version number. When "published", the
> node would have the next sequential version number, regardless
> of the number of changes. (For "undo", however, multiple
> non-persistent "revisions" would be kept, so that changes can
> be backed out. When published, all but the last revision would
> be removed.)
>
The issues of Dynamic Behavior, Extensibility, and Versioning can be recast
in a different light. The concept of monolithic 'application' programs must
give way to 'anatomy' programs to achieve the DKR objectives. An anatomy
would provide pure structure and behavior that drives one or more
collaborating content systems. A system driven by an anatomy would possess
four unique properties. A system should remains morphable, immutable,
updatable, and communicable throughout the lifecycle of a software asset.
Morphable - A developer should be is free to adapt data structure and
operational behavior of ancestor templates while relying upon a new language
medium to maintain descendent template integrity. Such an 'object-composed'
language would solve the 'fragile class problem' of object-oriented
languages.
Immutable - Template design schema should be separate but related to actual
value structures. This means that adapting anatomies would continue to
support existing operational system versions.
Updatable - Existing deployed systems should be automatically updated to the
current anatomy version without human intervention. This update capability
would be performed locally or through the Internet to any number of
distributed systems.
Communicable - Any two systems should be able to communicate in an ad hoc
fashion through a messaging mechanism without the development of one-on-one
APIs.
These inherent characteristics would redefine the very basis of rapid
prototyping where an existing executing system never becomes spaghetti code.
Instead, a set of collaborating components would morph with changing domain
requirements in a relentlessly consistent manner.
>
> Data and Sublists
> -----------------
> The node must contain, at a minimum, sublists (or subtrees)
> for content (text nodes), for comments (text nodes), and
> evaluations (text nodes with a rating). It may need to keep
> an author-list (people who are authorized to perform direct
> edits). It also needs a list of the categories to which it
> belongs. (Implementing categories as lists provides fast
> searching, as demonstrated by the Traction system). And it
> needs a substructure list. (For a heading, for example,
> the "content" would be the text of the heading while the
> "substructure" would be subheadings and paragraphs.)
>
> A node therefore contains a variety of sublists, and at least
> some data items. The data items include a rating slot (for
> rated nodes), a version identifier, and a pointer to the
> previous version. (Alternatively, one of the sublists could
> be a version list.) A reference count would also be a good
> idea, in case nodes wind up with no links at all, so they
> can be removed. A pointer to the previous revision (during
> editing) would also be needed, until the node is published.
>
> To make the structures extensible, the data items may well
> be kept in a tuple, where the nature of the tuple depends
> on the type of the node. (A text node, for example, would
> have a text string and an author identifier.)
>
Root/branch/leaf declaration of linked blocks should be performed statically
at design time. Each block should have an associated collection that links
one or more instance structures. Two or more blocks could have the identical
structure type. A structure contains atomic values. These values may be
declared as constant length textString, digitString and dataString forms as
well. Each block is declared with a content-based cardinality; always one,
zero or more, one or more, specified or zero or more. Cardinality is
automatically maintained as structures are dynamically cloned, copied or
moved between block collections. Each structure node can have a branch that
connects to one or more nested block collections. More than one blocks may
be declared with the same structure type on different branches. A block can
also be declared as a variable length String.
This general systems cognitive model would be able to handle the
requirements outlined by Eric and much more ...
>
Sandy Klausner, CTO
Cubicon Technologies Corporation
klausner@cubicon.com
(408) 867-1100

Next message: Eric Armstrong: "[unrev-II] MySQL License"
Previous message: Jack Park: "Re: [unrev-II] Upcoming Agenda Items"
Maybe in reply to: Rod Welch: "[unrev-II] Towards an Atomic Data Structure"
Next in thread: Jeff Miller: "Re: [unrev-II] Towards an Atomic Data Structure"
Next in thread: Eric Armstrong: "Re: [unrev-II] Towards an Atomic Data Structure"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Mail actions: [ respond to this message ] [ mail a new topic ]

This archive was generated by hypermail 2b29 : Mon Apr 24 2000 - 11:36:12 PDT