[unrev-II] Knowledge Repositories and Design Discussions

From: Eric Armstrong (eric.armstrong@eng.sun.com)
Date: Fri Mar 02 2001 - 21:03:13 PST

    Sorry... This one got away from me. It's long.
    Look for it to be instantiated in HTML at a
    venue near you....

    Keywords (partial ordering):
      [abstraction sequences, complex abstractions,
       simple abstractions],
      [CKC, content, context, design discussions,
       granularity, IBIS, knowledge, knowledge nuggets,
       knowledge products, nuggets],
      [concrete ratings, ratings, situational ratings]

    Recent events have got me closer to thinking about
    knowledge repositories, and how they might actually
    work, and how they might interact with a mechanism
    for carrying on an online design discussion.

    To keep things manageable, let's focus on a concrete
    issue in system development: Implementing a linked
    list. (The same kind of thinking undoubtedly applies
    in other areas, but focusing on software development
    has the greatest "bootstrap" effect.)

      Lee Iverson has made a point of distinguishing
      the "content" (documents), "knowledge", and "context"
      aspects of a repository-based system. This paper
      represents insights gained from investigating what
      really resides in each of those bubbles (or, possibly,
      layers) -- especially in the "knowledge" arena.

    In the knowledge layer, the subject "Linked List" has
    several useful subcomponents:

      Linked List
        --what is a... (verbal description)
        --how does one look... (diagrams, animations)
        --how does one use... (examples, explanations)
        *-how good is... (situational ratings)
        *-how does one implement... (recipe or template)

      * Of course, there are actually multiple different
        kinds of linked lists: singly-linked, doubly-linked,
        straight lists, circular lists, null-pointer
        terminated, and null-value or special-end-node
        terminated. But for now, let's keep things simple
        and assume that "linked list" means a singly-linked
        list.
      * In a recent conversation, Art Freidman reminded me
        that there are many possible choices for the API to such
        a list, as well. I'm just going to pretend I didn't
        hear that...

    The first three are the kind of things you would expect
    in a good tutorial. They represent the first thing you
    would go to if you got the answer, "What you need is a
    linked list" in response to a question you asked on some
    interactive forum. In fact, they might well simply
    point to a section in one of Donald Knuth's books, for
    the clearest possible illustrations and explanations.
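
    As a rough sketch of how such a nugget might be stored
    (the class and field names here are my own assumptions,
    not part of any existing system), the five subcomponents
    could be fields on a single record, with the first three
    grouped as the tutorial material:

```python
from dataclasses import dataclass, field


@dataclass
class Nugget:
    """One subject in the knowledge layer (hypothetical structure)."""
    subject: str
    description: str = ""                                # what is a...
    illustrations: list = field(default_factory=list)    # how does one look...
    usage_examples: list = field(default_factory=list)   # how does one use...
    ratings: list = field(default_factory=list)          # how good is...
    template: list = field(default_factory=list)         # how does one implement...

    def tutorial_parts(self):
        """Return the first three subcomponents, i.e. the tutorial material."""
        return {
            "what": self.description,
            "look": self.illustrations,
            "use": self.usage_examples,
        }


linked_list = Nugget(
    subject="Linked List",
    description="A sequence of nodes, each holding a value and a reference to the next node.",
    template=["visit the next item"],
)
```

    Any of the fields could just as well hold pointers into
    existing documents rather than text of their own.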

    The last two subcomponents, situational ratings and
    templates, have some interesting characteristics and
    implications that are worth exploring.

    Timeout: What is Knowledge?
    "Knowledge" in such a system can take several forms.
    The following list probably is not comprehensive,
    but it's enough to get started:

      * Categorization
        For example, a case study of building the Aswan
        Dam can be categorized under Egypt, earth-moving
        equipment, and construction, among other things.
        Each category that is applied to the study
        allows greater intelligence to be applied in
        future searches of the repository.

      * Simple Abstractions
        For example, "Assigning a value to a variable"
        is a simple abstraction that has a single mapping
        in most procedural languages.

      * Complex Abstractions and Abstraction-Sequences
        For a language like Lisp, APL, or Forth, the
        mechanism for assigning a value to a variable might
        be rather complex, and it might start with an admonition
        not to do that if you can help it!
        A recipe or template for a linked list, on the
        other hand, would consist of an ordered sequence
        of abstractions.
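
    To make the first of those forms concrete, here is a
    minimal sketch of categorization as an index from
    category to documents. The class name and methods are
    assumptions made up for illustration; the point is only
    that every category applied to a document gives a later
    search one more handle on it.

```python
from collections import defaultdict


class CategoryIndex:
    """Maps each category to the documents filed under it (hypothetical)."""

    def __init__(self):
        self._by_category = defaultdict(set)

    def categorize(self, document, *categories):
        """File a document under one or more categories."""
        for category in categories:
            self._by_category[category.lower()].add(document)

    def search(self, *categories):
        """Return the documents filed under all of the given categories."""
        sets = [self._by_category.get(c.lower(), set()) for c in categories]
        return set.intersection(*sets) if sets else set()


index = CategoryIndex()
index.categorize("Aswan Dam case study",
                 "Egypt", "earth-moving equipment", "construction")
# A second, made-up document so that the search has something to exclude.
index.categorize("some other case study (hypothetical)", "construction")

hits = index.search("construction", "Egypt")
```

    Here the search on two categories narrows the result to
    the Aswan Dam study alone.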

    Returning now to your regularly scheduled programming,
    we'll get back to the topic of situational ratings
    and take a deeper look at templates...

    Situational Ratings
    At the knowledge layer, ratings are necessarily provisional.
    Thus, a linked list is good if you can afford the extra
    space, don't mind a little extra overhead when you're looping,
    and you need to do a lot of inserts and deletes. On the
    other hand, for a fixed list, an array is typically going
    to carry less overhead. (But that evaluation, too, is
    situational -- in Lisp, a list is the *only* way to go.)
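
    One way to picture a provisional rating is as a function
    of the situation rather than a fixed number. The sketch
    below is only that: the field names and the weights are
    invented for illustration, and a real repository would
    presumably store such ratings as data, not code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """The facts about a project that the rating depends on (hypothetical)."""
    frequent_inserts_and_deletes: bool
    space_is_tight: bool
    language: str = "C"


def rate_linked_list(situation):
    """Return a score for using a linked list; higher is better.

    The numbers are arbitrary. Only their direction follows the text.
    """
    if situation.language.lower() == "lisp":
        return 10  # in Lisp, a list is the only way to go
    score = 5
    if situation.frequent_inserts_and_deletes:
        score += 3  # lots of inserts and deletes favor a linked list
    else:
        score -= 2  # for a fixed list, an array typically carries less overhead
    if situation.space_is_tight:
        score -= 2  # the extra space per node counts against it
    return score


busy_list = rate_linked_list(Situation(True, False))
fixed_list = rate_linked_list(Situation(False, True))
in_lisp = rate_linked_list(Situation(False, True, language="Lisp"))
```

    The same structure gets three different scores, depending
    entirely on the situation it is rated in.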

    But if at the knowledge layer ratings are provisional, for
    any specific project they are fairly concrete. One might
    argue, based on the characteristics of the project one is
    working on, that a linked list is appropriate. To give that
    argument some weight, one would reference the "knowledge
    nugget" that was stored on the subject of Linked Lists.

    In other words, the design discussion (presumably an
    IBIS-style discussion carried on within the scope of the
    repository) defines the *context* within which the
    knowledge is used. (The knowledge, meanwhile, rests on
    a foundation of content. More on that in a bit.)
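
    A minimal sketch of that relationship, assuming the usual
    IBIS elements (issues, positions, and arguments for or
    against a position). The class names and the "layer:path"
    reference strings are my own invention; the point is that
    an argument in the discussion carries citations into the
    repository.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    text: str
    supports: bool                               # True = pro, False = con
    cites: list = field(default_factory=list)    # references into the repository


@dataclass
class Position:
    text: str
    arguments: list = field(default_factory=list)


@dataclass
class Issue:
    question: str
    positions: list = field(default_factory=list)


def cited_nuggets(issue):
    """Collect every knowledge-layer reference used anywhere under an issue."""
    return sorted({
        ref
        for position in issue.positions
        for argument in position.arguments
        for ref in argument.cites
        if ref.startswith("knowledge:")
    })


issue = Issue("Which data structure should hold the items?")
position = Position("Use a linked list")
position.arguments.append(Argument(
    text="This project needs to do a lot of inserts and deletes",
    supports=True,
    # Hypothetical reference format: "<layer>:<path>".
    cites=["content:project-specs", "knowledge:Linked List/ratings"],
))
issue.positions.append(position)

nuggets = cited_nuggets(issue)
```

    Anyone reading the discussion can follow the citation
    back to the nugget, and from there to the content it
    rests on.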

    So, in a design discussion, it would be possible to
    say, "I think we should use a linked list" and cite
    as rationale the user specs that say items will
    constantly be added and deleted, along with the
    "knowledge nugget" that gives Linked Lists a high
    rating for such purposes. The recipient of your
    wisdom (who may never have heard the term) can
    then get a tutorial on the subject from the knowledge

    Recipes & Templates
    Now consider a "recipe" as a collection of steps, or
    an ordered set. The recipe, or template, for a linked
    list algorithm will (to qualify as knowledge) be
    very abstract. If an actual implementation is available,
    for example, in legacy assembler code, then the
    recipe might well contain a link to that implementation.
    But the recipe itself would look something like the
    implementation comments extracted from the source code.

      Initial stabs at the recipe are liable to be very
      language specific. So a step might read "use the
      address stored in the Next variable to access the
      next item". However, better and more useful abstractions
      will be more abstract and less specific, e.g. "visit
      the next item". That kind of generality is hard to
      get right on the first try. So the system should make
      it possible to refine the generalizations as more
      implementations are "covered" by the template.
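
    Here is one hedged sketch of what "refinable" could mean
    in practice: each step keeps its earlier wordings, and
    the template records which implementations it has been
    checked against. All names below are assumptions made
    for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    wording: str
    history: list = field(default_factory=list)   # earlier, less general wordings

    def refine(self, new_wording):
        """Replace the wording with a more general one, keeping the old one."""
        self.history.append(self.wording)
        self.wording = new_wording


@dataclass
class Template:
    subject: str
    steps: list
    covered: set = field(default_factory=set)     # implementations it describes

    def cover(self, implementation):
        """Record that the template has been checked against an implementation."""
        self.covered.add(implementation)


first_stab = "use the address stored in the Next variable to access the next item"
step = Step(first_stab)
template = Template("Linked List", [step])
template.cover("legacy assembler implementation")

# A second implementation shows the wording was too language specific.
step.refine("visit the next item")
template.cover("implementation in some other language (hypothetical)")
```

    Keeping the history means a refinement can be reviewed,
    or backed out, if the generalization turns out to be
    wrong.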

    Now, a full implementation of a concept like Linked List
    is great, if one exists. The question, "How do I implement
    a linked list in language X?" can be answered with a
    pointer to the implementation. But what if you are working
    on a project in a new language? (As we'll see, idioms
    hold the key.)

    Idioms and Granularity
    It now makes sense to introduce the concept of an "idiom".
    An *idiom* is the syntactic mechanism for achieving a
    task in a specific language. For example, the "loop idiom"
    breaks down into for-loops, while-loops, and until-loops,
    each with situational ratings. For each loop, the specific
    syntax used in the C language constitutes the idiom for
    that concept in that language.

      A primitive idiom is a simple abstraction that has a
      one-to-one mapping with the language. For example, the
      idiom for assigning a value to a variable is a
      single-step process in most any procedural language.
      A complex idiom may have multiple steps, like a recipe
      -- but the steps are language specific. (The abstract
      template for those steps may well exist as a knowledge
      nugget, but the language-specific steps constitute an
      idiom.)
    Now, given a template T, consisting of an ordered set
    {s1, s2, ...} of steps, and I, a collection {i1, i2, ...}
    of idioms that can be expressed in the language,
    the expression
      T x I == {s1, s2, ...} x {i1, i2, ...}

    produces a *knowledge product* -- literally, the product
    of two different kinds of knowledge stored in the
    repository. In this case, the knowledge product
      KP = T x I

    can define the implementation for a linked list in a
    brand new language -- as long as the template and the
    specific idioms exist in the repository.
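
    One possible reading of T x I, sketched below, is a
    step-by-step lookup: for each step of the template, find
    the idiom that expresses it in the target language. That
    reading is my assumption, as are the function name, the
    step wordings, and the (step, language) keying. The C
    fragments are ordinary C, included only as sample idioms.

```python
def knowledge_product(template, idioms, language):
    """Pair each abstract step with the idiom that expresses it in `language`.

    Raises LookupError if the repository lacks an idiom for any step.
    """
    missing = [step for step in template if (step, language) not in idioms]
    if missing:
        raise LookupError(f"no {language} idiom stored for: {missing}")
    return [(step, idioms[(step, language)]) for step in template]


# T: an ordered set of abstract steps (here, for walking a list).
template = [
    "start at the first item",
    "test for the end of the list",
    "visit the next item",
]

# I: idioms keyed by (abstract step, language).
idioms = {
    ("start at the first item", "C"): "node = head;",
    ("test for the end of the list", "C"): "node == NULL",
    ("visit the next item", "C"): "node = node->next;",
}

kp = knowledge_product(template, idioms, "C")
```

    Asking for a language with no stored idioms raises an
    error, which is the "as long as" condition above: the
    product exists only when both the template and the
    idioms are in the repository.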

    It is here that the need for "highly granular" documents
    becomes apparent. There are dozens of papers and books
    that tell how you do things in a given language. But
    until the idioms and simple abstractions contained in
    those documents can be pinpointed -- i.e. individually
    referenced -- there is no hope of automatically
    generating a knowledge product like the example above.

    To return again to the Content-Knowledge-Context (CKC)
    picture, nuggets in the knowledge layer must (or should)
    have fine-grained links to items in the content layer.
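
    A small sketch of what a fine-grained link might look
    like, assuming each document in the content layer has
    been split into individually addressable pieces. The
    document name, anchors, and fragments are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentRef:
    document: str   # which document in the content layer
    anchor: str     # which individually addressable piece of it


# Hypothetical content layer: each document maps anchors to fragments.
content = {
    "c-tutorial": {
        "assignment": "x = 5;",
        "while-loop": "while (condition) { ... }",
    },
}


def resolve(ref, content_layer):
    """Return the exact fragment a nugget points at, or None if it is not addressable."""
    return content_layer.get(ref.document, {}).get(ref.anchor)


found = resolve(ContentRef("c-tutorial", "while-loop"), content)
not_found = resolve(ContentRef("c-tutorial", "linked-list"), content)
```

    A link to the document as a whole would be of little
    use here; it is the anchor that makes the idiom
    retrievable on its own.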

    Related Knowledge is Important
    Recall that the template for a linked list is only
    one nugget of knowledge stored for that concept.
    Related kinds of information are frequently important
    to producing an answer.

    For example, should I ask "How do I implement a linked
    list in Java?", the Linked List topic *header* should
    link directly to the Java idiom:
       new java.util.LinkedList().
    In this case, no template-instantiation is needed!

    For a language like C, on the other hand, in which
    many implementations exist with different APIs and
    performance characteristics, multiple responses could
    be returned, with situational ratings for each.
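
    The two cases above suggest an order of preference when
    answering "How do I implement X in language Y?". The
    sketch below assumes a repository laid out as nested
    dictionaries; that layout, the implementation names, and
    the ratings are all hypothetical. Only the Java idiom is
    taken from the text.

```python
def answer(subject, language, repo):
    """Return (kind, results) for 'How do I implement subject in language?'."""
    entry = repo[subject]

    # 1. A direct idiom in the language: no template instantiation needed.
    direct = entry["direct_idioms"].get(language)
    if direct:
        return ("idiom", [direct])

    # 2. Existing implementations: return them all, best rating first.
    implementations = entry["implementations"].get(language, [])
    if implementations:
        ranked = sorted(implementations, key=lambda i: i["rating"], reverse=True)
        return ("implementations", [i["name"] for i in ranked])

    # 3. Otherwise fall back to the abstract template.
    return ("template", entry["template"])


repo = {
    "Linked List": {
        "direct_idioms": {"Java": "new java.util.LinkedList()"},
        "implementations": {
            "C": [
                {"name": "hypothetical-impl-a", "rating": 3},
                {"name": "hypothetical-impl-b", "rating": 5},
            ],
        },
        "template": ["visit the next item"],
    },
}

java_answer = answer("Linked List", "Java", repo)
c_answer = answer("Linked List", "C", repo)
cobol_answer = answer("Linked List", "Cobol", repo)
```

    In a fuller version the ratings would themselves be
    situational, so the ranking in the second case would
    depend on who is asking and why.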

    As another example, should you ask "How do I implement a
    linked list in Cobol?", a human developer might very
    well respond with the questions, "Are you sure you
    want to do that?", "What is it you're trying to do?",
    "Is Cobol the right language for the job?" "Do you
    absolutely have to use Cobol and, if so, could you
    consider using another structure that would be more
    suitable for that language?"

    These are questions that a knowledge repository will
    probably not be smart enough to ask any time soon.
    If it *were* able to do that, so much the better!

    On the other hand, the knowledge repository *should*
    make it possible to deduce the implementation in
    Cobol, and put it forward in an online design
    discussion. Other members of the discussion should
    then have access to reference material in the
    repository to support arguments for why the performance
    would suck, why another data structure should be
    used, why a different language should be used, etc.

    In this sense, the knowledge base is a partner to
    and support for the discussion, which is carried on
    at the higher level, in the context of the discussion.

    This archive was generated by hypermail 2b29 : Fri Mar 02 2001 - 21:14:48 PST