[unrev-II] Refactoring and information annealing

From: Garold L. Johnson (dynalt@dynalt.com)
Date: Fri Dec 22 2000 - 10:38:01 PST

    In an earlier email I pointed out:

    Studies show that people resist formalisms when capturing information, even
    when observation shows that they already follow such conventions:
    http://www.csdl.tamu.edu/~shipman/
    http://www.csdl.tamu.edu/~shipman/formality-paper/harmful.html
    http://www.csdl.tamu.edu/~shipman/aiedam/aiedam.html

    Given this, we need ways to capture information as it arises, and connect it
    in various ways later.

    Refactoring is the term used in Object Oriented design to refer to the
    necessity for reorganizing class and other code structures as a project
    progresses and better abstractions are discovered.

    Neil Larson uses the term “information annealing” to refer to a similar
    process in group hypertext systems, and believes that at least some of it
    must be done by a knowledge expert in order to be effective. I like the
    information annealing concept. Annealing in materials refers to the process
    of heating a material until it is possible for molecules in the material to
    move slightly without distorting the material, and then allowing the
    material to cool slowly. This process removes local strains from the system
    as the molecules adjust their positions.

    The short form of the arguments presented in the references is that we
    resist formalisms that categorize things before we are comfortable with the
    categories that are available:
    * We leave files out where we can get at them until we have some
    place to put them or things get too messy. This is true of paper, disk
    directories, and email systems. It appears to be true in all information
    systems maintained by people.
    * When we evolve categories, there is concern about getting it
    wrong, so we delay.

    I found it fascinating that people hesitate to follow formalisms such as
    IBIS even when study of their actual behavior indicates that the formalism
    is just what they do. I think this is related to the similar problem of word
    processing where we have to look at issues of structure and appearance
    before we are really certain of the content.

    Extreme Programming uses continual refactoring as a major tool. If changes
    to the code in an area indicate the abstraction could be better and cleaner,
    it is reorganized at once.

    I used MaxThink, an outline based information tool for DOS, for years and
    use an outliner today. With MaxThink I found the following to be extremely
    useful:
    * Write down a series of statements just as they came to mind. Use
    just enough explanation to remind me later of what I had in mind.
    * When I ran out of things to put on the list, move things around to
    get some similar items together.
    * When I could identify a few topics, add them at the top of the
    list and then do a “bin sort” which allowed me to move all the topics that
    were children of the new topics under the new topics.
    * Iterate on this until relatively little motion occurs.
    * Pick a topic and repeat.
    * If a couple of topics look like they aren’t working well, flatten
    their children to the next level up and continue the process.
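
    The bin sort and flatten steps above can be sketched in a few lines. This
    is only an illustration of the procedure as described, not MaxThink's
    actual implementation; the names Topic, bin_sort, and flatten, and the
    sample notes, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One outline entry: a line of text plus its child topics."""
    text: str
    children: list = field(default_factory=list)


def bin_sort(parent, bins, assign):
    """Add new bin topics at the top of parent, then move each existing child
    under the bin named by assign(child). Children for which assign returns
    anything else stay where they are, below the bins."""
    new_bins = {name: Topic(name) for name in bins}
    unsorted = []
    for child in parent.children:
        target = assign(child)
        if target in new_bins:
            new_bins[target].children.append(child)
        else:
            unsorted.append(child)
    parent.children = list(new_bins.values()) + unsorted


def flatten(parent, topic):
    """Dissolve a topic that is not working: lift its children one level up."""
    i = next(k for k, c in enumerate(parent.children) if c is topic)
    parent.children[i:i + 1] = topic.children


# Statements written down as they came to mind (hypothetical content).
notes = Topic("notes", [
    Topic("word processor loses the outline"),
    Topic("clone a topic into two places"),
    Topic("export loses the formatting"),
])

# Two topics emerge, so bin the statements under them.
bin_sort(notes, ["problems", "features"],
         lambda t: "problems" if "loses" in t.text else "features")
```

    Iterating means calling bin_sort again on any child topic, and calling
    flatten on a topic whose grouping turns out not to help.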

    Using this approach I could generate a hierarchical document that was very
    coherent, and do it very quickly. This applies formalisms largely after the
    fact. Once there was a framework, some things had logical places to go, and
    I applied the formalisms to some new content as the content was generated.

    Once I moved the information into a word processor to get the nice
    formatting, I was dead. There was no good way to manipulate the content
    again in the word processor. If the organization got too messy I would export
    the word processing file back into MaxThink and work on the organization.
    This of course lost all the formatting, so that had to be done all over when
    I brought it back into the word processor.

    This of course still didn’t support linking at the time. Cloning of topics
    and other advanced operations were available but somewhat expensive.

    I conclude that it is necessary to be able to refactor information at any
    stage. It would be nice if the base material were still available also as a
    check on the refactoring, but the refactoring is essential. Human knowledge,
    particularly in the hard sciences, is subject to this sort of refactoring in
    a largely uncontrolled way. We no longer use the alchemists' theory of the
    universe as part of our current knowledge base – it has been refactored out.
    We do, however, have some degree of access to the historical documents that
    reflect that refactoring process.

    Hierarchical organization is generally superior to linear presentation. The
    topic context provides all sorts of support for things that don’t have to be
    said. The extensive quoting that we do in emails is indicative of the lack
    of context that we have in an essentially linear system (threads are linear
    internally).

    One of the problems with hierarchy is that it implicitly assumes that any
    piece of information has exactly one place to go. This is essentially saying
    that the universe of discourse can be represented as disjoint sets so that
    each element belongs in exactly one parent set. It doesn’t take long working
    with this assumption before its invalidity is apparent.

    The first response to this in a hierarchical system is the ability to clone
    a topic so that it can exist in multiple places in the hierarchy and still
    be a single topic – edit it anywhere and it changes everywhere.
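
    A minimal sketch of cloning, assuming topics are held by reference so that
    both locations point at the same object. The Topic class and the sample
    outline are hypothetical.

```python
class Topic:
    def __init__(self, text):
        self.text = text
        self.children = []

    def add(self, child):
        """Attach child under this topic and return it. Attaching a topic that
        already sits under another parent makes it a clone, not a copy."""
        self.children.append(child)
        return child


design = Topic("Design")
testing = Topic("Testing")

refactoring = design.add(Topic("refactoring"))
testing.add(refactoring)  # clone: the same object in a second location

# Edit the topic through one location; the other location sees the change.
testing.children[0].text = "continual refactoring"
```

    The cost of this approach is that a topic no longer has a single parent,
    so any operation that walks "up" the outline has to be told which
    location it started from.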

    This now assumes that a single hierarchy will handle the information
    organization. That notion is soon dispelled. It becomes clear with just a
    little observation and introspection that humans organize information using
    multiple hierarchies, at least. The hierarchy used depends on the context of
    the information and the attributes that are considered relevant in that
    context. This leads to the observation that hierarchy needs to be separate
    from the content – it should be possible to supply as many hierarchies,
    indices, or other similar organizing schemes as desired to the content
    without forcing assumptions on the content. This is recursive in that
    hierarchies become content which can in turn be referenced and reorganized.
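
    One way to picture hierarchy held apart from content is a flat content
    store plus any number of separate parent-to-children maps. The sketch
    below assumes each hierarchy is acyclic; the ids, texts, and the Hierarchy
    class are made up for illustration.

```python
from collections import defaultdict

# Content lives in one flat store, keyed by id. It knows nothing about
# the hierarchies that organize it.
content = {
    "root": "Everything",
    "software": "Software practice",
    "km": "Knowledge management",
    "n1": "Refactoring in OO design",
    "n2": "Information annealing",
    "n3": "MaxThink bin sort",
}


class Hierarchy:
    """An organizing scheme that refers to content by id only."""

    def __init__(self, name):
        self.name = name
        self.children = defaultdict(list)  # parent id -> ordered child ids

    def place(self, parent_id, child_id):
        self.children[parent_id].append(child_id)

    def outline(self, root_id, depth=0):
        """Return (depth, id) pairs in outline order, starting at root_id."""
        rows = [(depth, root_id)]
        for child in self.children.get(root_id, []):
            rows.extend(self.outline(child, depth + 1))
        return rows


# First view: organized by field.
by_field = Hierarchy("by field")
by_field.place("root", "software")
by_field.place("software", "n1")
by_field.place("root", "km")
by_field.place("km", "n2")
by_field.place("km", "n3")

# Second view over the same content: organized by technique.
by_technique = Hierarchy("by technique")
by_technique.place("root", "n2")
by_technique.place("n2", "n1")
by_technique.place("n2", "n3")

# Recursion: a hierarchy is stored as content and organized in turn.
content["h1"] = by_field
content["h2"] = by_technique
views = Hierarchy("index of views")
views.place("root", "h1")
views.place("root", "h2")
```

    Neither view changes the content store, and node n1 sits under a
    different parent in each.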

    This leads us to bi-directional, typed links, which may carry added attribute
    information about the link. A truly general linking mechanism (fine grained
    links) with the ability to copy (make a new instance for revision) and
    reference (clone if it appears to be a part of the content rather than an
    external reference) provides a logical representation in which it is
    possible to simulate any other relationship mechanism that we might choose.
    Note that it is often of value to hide some of these mechanisms in larger
    concepts – an outliner, an IBIS-like system or a semantic net. These
    representations provide for ease of use of commonly applied representational
    formalisms. We may introduce link types to support some of these formalisms
    more directly in the underlying representation.
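
    As a rough sketch, a bi-directional typed link store can index every link
    by both of its ends, so that a node can be asked what it points to and
    what points to it. The Link and LinkStore names, the link types, and the
    IBIS-like sample data are illustrative assumptions, not a description of
    any existing system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    kind: str          # the link type
    attrs: tuple = ()  # extra (name, value) pairs describing the link


class LinkStore:
    def __init__(self):
        self._out = defaultdict(list)  # source id -> links leaving it
        self._in = defaultdict(list)   # target id -> links arriving at it

    def add(self, source, target, kind, **attrs):
        link = Link(source, target, kind, tuple(sorted(attrs.items())))
        self._out[source].append(link)
        self._in[target].append(link)
        return link

    def outgoing(self, node, kind=None):
        return [l for l in self._out.get(node, []) if kind in (None, l.kind)]

    def incoming(self, node, kind=None):
        return [l for l in self._in.get(node, []) if kind in (None, l.kind)]


# Simulating an IBIS-like discussion with nothing but typed links.
store = LinkStore()
store.add("position-1", "issue-1", "responds-to", author="gary")
store.add("argument-1", "position-1", "supports")
store.add("argument-2", "position-1", "objects-to")

# Following links backwards from the position finds the arguments about it.
supporters = [l.source for l in store.incoming("position-1", "supports")]
```

    An outliner could be simulated the same way with a single "child-of" link
    type, which is the sense in which the general mechanism can stand in for
    the more specific ones.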

    The power of KM is the observation that the problem of representing
    knowledge has at least some elements which are independent of the knowledge
    being represented.

    Consider the evolution of software development approaches:

    * Spaghetti code – we can link anything to anything any time, access
    any information we want and go anywhere we like. We quickly discovered that
    humans cannot function effectively at that level of generality as the size
    of the information increases.
    * Structured code – every construct has a single entrance and a
    single exit. It is possible to construct any program with sequence,
    alternation, and repetition. Powerful. Now we have some degree of
    discipline. Code is easier to read, easier to write, and (somewhat) easier
    to get right.
    * Structured analysis – any problem can be decomposed into a
    hierarchy of functions. This helped too, but ran into problems.
    * Data Structured Development – problems are properly characterized
    by the nature and structure of the data that represents them. For problems on
    which this worked, it worked spectacularly.
    * Object Oriented development – both data and behavior must be
    considered when representing objects. Objects belong to classes, which
    describe the (hierarchical) set of data and behaviors that the object
    possesses.
    o There is disagreement as to whether there can be multiple
    hierarchies or only one.
    o Class membership is generally fixed. There are some systems and
    some techniques to allow an object to change its class dynamically.
    o The set of behaviors and data structure that represent an object is
    fixed.
    o Some say that the data and behavior are inadequate, that we have to
    understand and enforce the logical contract that represents the invariant
    promise of a function's behavior.
    * Frame based systems – every object is unique although we can
    organize them into class hierarchies for description. Such systems allow the
    dynamic addition of data and/or behaviors to objects and sometimes classes.
    We are now at a level of representation that has been proposed for general
    knowledge representation.
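
    To make the last step concrete, here is a small sketch of a frame-style
    object, assuming slots are looked up on the object first and then on its
    parent frames, and that both data and behaviors can be added at any time.
    The Frame class and the vehicle example are hypothetical and do not follow
    any particular frame system.

```python
class Frame:
    """A unique object with named slots and an optional parent frame."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Look the slot up here, then in each ancestor frame in turn."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)

    def set(self, slot, value):
        """Add a new slot or change an existing one, at any time."""
        self.slots[slot] = value

    def call(self, slot, *args):
        """Run a behavior stored in a slot, passing this frame to it."""
        return self.get(slot)(self, *args)


vehicle = Frame("vehicle", wheels=4)
car = Frame("my car", parent=vehicle, color="red")

# Data added to one object after it was created.
car.set("color", "blue")

# Behavior added to the class-like frame after the object already exists.
vehicle.set("describe",
            lambda f: f"{f.name}: {f.get('wheels')} wheels, {f.get('color')}")

description = car.call("describe")
```

    Contrast this with the fixed class membership and fixed set of data and
    behaviors described for most Object Oriented systems above.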

    The entire evolution of software development methodologies has been one of
    improved knowledge representation as we work out ways of translating these
    higher level representations into executable formalisms. Note that the
    underlying executable system is not constrained, in principle, by the nature
    of the formalisms that it implements. At the level of machine instructions
    we generally have the ability to use spaghetti code at will – there are
    branches that are essentially arbitrary links, and it is potentially
    possible to reference any point in the entire memory space of the program.
    The translators add the constraints, but many of them do not exist at the
    hardware level. As we progress and it becomes clearer that certain
    approaches work in all cases, machine instruction sets are being refactored
    to enforce some of the higher level constructs – access is restricted based
    on program data, there are special loop constructs, etc.

    I submit that we can use the knowledge of the path we followed in the
    development of software development techniques to do a better job of
    evolving KM. Note also that software development itself could benefit
    greatly from even a relatively rudimentary KM solution that didn’t get too
    much in the way. Note that most software tools manage only a part of the
    information and very little of the knowledge – most of them are simply
    editors for some representation of the code, and really exhibit no
    understanding of what it is they are manipulating. They are word processors
    for diagrams, with no understanding of the structure of the knowledge
    represented by the diagram.

    By refactoring the path we have already followed, we should be able to get a
    set of requirements for an eventual KM system. Not being constrained by even
    reasonable amounts of caution, let’s try it.

    * Every space time event is unique. It also has an effectively
    infinite number of things that can be said about it – a potentially infinite
    attribute space.
    o When we represent an object, we have only a collection of
    representations of a selected set of attributes -- we don’t have the object.
    o We know objects only in terms of their attributes. The set of values
    for the set of attributes is the state vector for the object.
    o We recognize changes in an object in terms of changes in the values
    of its attributes.
    o Behaviors are functions that can change the attributes of an
    object.
    o We restrict the set of behaviors that can be applied to an object
    to some set that has meaning (produces what we consider meaningful results
    for that object).
    * We attempt to understand and organize this chaos by identifying
    commonalities amongst attributes. This results in layered abstractions in
    the first approximation.
    * As part of this process, we focus at any time on a specific set of
    attributes of interest that participate in the organizing effort. The
    attributes of interest for an object may be different depending on how the
    objects are being organized.
    * This implies (among other things) that:
    o It must be possible to identify an object uniquely. This is the
    equivalent of an object ID in an OO system or a URL on the web.
    * We will consider that an object reference represents and
    ultimately translates to the unique object referenced.
    * It must be possible to reference the state of an object as of a
    specific point in time – we must be able to refer to versions of an object.
    * It must be possible to construct a new object by copying the
    content of a specific object with another instance identifier. We may want
    to track the fact that this object started as a copy of a specific object.
    o It must be possible to modify the attributes of an object. Things
    change as a result of actions taken – paint it blue and the color changes.
    o It must be possible to add attributes to an object. We aren’t going
    to get all of them at once (there are an infinite number) so we need to be
    able to add attributes as we become interested in them. Note that this
    implies the ability to add attributes to the equivalent of classes and to
    create other classes that may specialize on the new attributes.
    o Having already found this out, links should be bi-directional so
    that we can link to sources, etc.
    o All of this combines into a set of requirements that, if realized,
    results in the lowest level of chaos referred to as spaghetti code in
    software. We are close to being able to represent arbitrary relationships
    and essentially arbitrary and extendible operations on information nodes.
    Now we consider adding structure to this mess.
    * We add the following structuring concepts:
    o Reference – point to another object
    * In place – the object is displayed inline with an external
    reference. It cannot be edited. Used for quoting.
    * External – provides a link to the referenced object.
    o Copy – a node is copied in place. The contents can be edited. It
    retains an external link.
    o Hierarchy – specific references organize a set of nodes and node
    references into a hierarchy. Note that this allows the hierarchy to contain
    or reference the external information at the choice of the author, and for
    the nodes in the hierarchy or the entire hierarchy to be treated as a node
    in its own right. This also supports the use of multiple hierarchies on the
    same underlying information.
    o Versioning of nodes. This includes versioning and archiving of
    documents, which establish points in time where a given node can no longer be
    edited.
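
    Several of the requirements above (unique identity, open-ended attributes,
    versions, and copies that remember their source) fit in a short sketch.
    The Node class, its method names, and the wall example are assumptions
    made for illustration; a real reference engine would also need
    persistence and the link machinery.

```python
import itertools
from copy import deepcopy


class Node:
    _ids = itertools.count(1)

    def __init__(self, attrs=None, copied_from=None):
        self.id = f"node-{next(Node._ids)}"  # unique instance identifier
        self.attrs = dict(attrs or {})       # the current state vector
        self.versions = []                   # frozen snapshots, oldest first
        self.copied_from = copied_from       # (node id, version) or None

    def set(self, name, value):
        """Modify an existing attribute or add a new one."""
        self.attrs[name] = value

    def commit(self):
        """Freeze the current state as a new version and return its number."""
        self.versions.append(deepcopy(self.attrs))
        return len(self.versions)

    def at(self, version):
        """Return the state as of a version, without exposing the snapshot."""
        return deepcopy(self.versions[version - 1])

    def copy(self, version):
        """Make a new, editable node from a version, recording its origin."""
        return Node(self.at(version), copied_from=(self.id, version))


wall = Node({"color": "white"})
v1 = wall.commit()

wall.set("color", "blue")    # paint it blue and the color changes
wall.set("height_m", 2.4)    # an attribute we only became interested in later
v2 = wall.commit()

draft = wall.copy(v1)        # new identity, same content as version 1
draft.set("color", "green")  # editing the copy leaves the original alone
```

    Committed versions are never edited in place, which is the property the
    archiving requirement asks for.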

    At this point we can begin to add any sort of operations that we can dream
    up and experiment with. Want an annotation system? Add a button to generate
    a comment, typed or not, request whatever information is needed for link
    type information, and have the associated action create the information node
    and the links. If the link information is small enough, use multiple
    buttons. Add a section to the node data to control the use of this node in
    this abstraction.
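
    The annotation button might do something like the following, assuming a
    plain in-memory store of nodes and typed links. Every name here
    (new_node, annotate, annotations, the link types) is hypothetical.

```python
import itertools
from functools import partial

nodes = {}  # node id -> dict of node data
links = []  # (source id, link type, target id)
_next_id = itertools.count(1)


def new_node(text, **attrs):
    node_id = f"n{next(_next_id)}"
    nodes[node_id] = {"text": text, **attrs}
    return node_id


def annotate(target_id, text, link_type="comment", **attrs):
    """The action behind an 'annotate' button: create the comment node and
    the typed link from it to the node being annotated."""
    if target_id not in nodes:
        raise KeyError(target_id)
    comment_id = new_node(text, **attrs)
    links.append((comment_id, link_type, target_id))
    return comment_id


def annotations(target_id, link_type=None):
    """Ids of the nodes that annotate target_id, optionally of one type."""
    return [src for src, kind, dst in links
            if dst == target_id and link_type in (None, kind)]


# With only a few link types, each one can be its own button.
ask = partial(annotate, link_type="question")
object_to = partial(annotate, link_type="objection")

post = new_node("Hierarchy should be separate from content")
q1 = ask(post, "How does cloning fit in?", author="reader-1")
o1 = object_to(post, "Multiple hierarchies are costly to maintain")
```

    The comment is an ordinary node, so it can itself be annotated, placed in
    a hierarchy, or versioned like any other content.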

    The basic reference engine is a real bear, but the initial requirements seem
    to be a relatively small set, and I think achievable. Certainly the designs
    of existing systems can be run against these to validate or modify the
    requirements.

    Add the scalability requirements from several emails back (individual to
    group to merged knowledge) and I would be delighted with a system that
    implemented the requirements in a reasonable fashion.

    Thanks,

    Gary
