[unrev-II] Knowledge Repositories and Design Discussions

From: Eric Armstrong (eric.armstrong@eng.sun.com)
Date: Fri Mar 02 2001 - 21:03:13 PST

Next message: Henry van Eyken: "Re: [unrev-II] Knowledge Repositories and Design Discussions"

Previous message: Eric Armstrong: "Re: [unrev-II] Peer to Peer and Email..."
Next in thread: Henry van Eyken: "Re: [unrev-II] Knowledge Repositories and Design Discussions"
Reply: Henry van Eyken: "Re: [unrev-II] Knowledge Repositories and Design Discussions"

Sorry... This one got away from me. It's long.
Look for it to be instantiated in HTML at a
venue near you....
-------------------------------

Keywords (partial ordering):
  [abstraction sequences, complex abstractions,
   simple abstractions],
  [CKC, content, context, design discussions,
   granularity, IBIS, knowledge, knowledge nuggets,
   knowledge products, nuggets],
  [concrete ratings, ratings, situational ratings]

Recent events have got me closer to thinking about
knowledge repositories, and how they might actually
work, and how they might interact with a mechanism
for carrying on an online design discussion.

To keep things manageable, let's focus on a concrete
issue in system development: Implementing a linked
list. (The same kind of thinking undoubtedly applies
in other areas, but focusing on software development
has the greatest "bootstrap" effect.)

  Note:
  Lee Iverson has made a point of distinguishing
  the "content" (documents), "knowledge", and "context"
  aspects of a repository-based system. This paper
  represents insights gained from investigating what
  really resides in each of those bubbles (or, possibly,
  layers) -- especially in the "knowledge" arena.

In the knowledge layer, the subject "Linked List" has
several useful subcomponents:

  Linked List
    --what is a... (verbal description)
    --how does one look... (diagrams, animations)
    --how does one use... (examples, explanations)
    *-how good is... (situational ratings)
    *-how does one implement... (recipe or template)

  Notes:
  * Of course, there are actually multiple different
    kinds of linked lists: singly-linked, doubly-linked,
    straight lists, circular lists, null-pointer
    terminated, and null-value or special-end-node
    terminated. But for now, let's keep things simple
    and assume that "linked list" means a singly-linked
    list...

  * In a recent conversation, Art Freidman reminded me
    that there are many possible choices for the API to such
    a list, as well. I'm just going to pretend I didn't
    hear that...
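
To pin down the simple case, here is a minimal sketch of a
singly-linked, null-terminated list in Python. The class and
method names (Node, LinkedList, push_front, remove) are
illustrative choices only; as noted above, many other APIs
are possible.

```python
class Node:
    """One cell of the list: a value plus a reference to the next cell."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next  # None marks the end of the list


class LinkedList:
    """Singly-linked list terminated by a None reference."""

    def __init__(self):
        self.head = None

    def push_front(self, value):
        # Insert at the head: no existing items need to be moved.
        self.head = Node(value, self.head)

    def remove(self, value):
        # Unlink the first node holding `value`; return True if one was found.
        prev, node = None, self.head
        while node is not None:
            if node.value == value:
                if prev is None:
                    self.head = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False

    def __iter__(self):
        # Follow the `next` references until the None terminator.
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


items = LinkedList()
for v in (3, 2, 1):
    items.push_front(v)
items.remove(2)
print(list(items))
```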

The first three are the kind of things you would expect
in a good tutorial. They represent the first thing you
would go to if you got the answer, "What you need is a
linked list" in response to a question you asked on some
interactive forum. In fact, they might well simply
point to a section in one of Donald Knuth's books, for
the clearest possible illustrations and explanations.

The last two subcomponents, situational ratings and
templates, have some interesting characteristics and
implications that are worth exploring.
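
One way to picture the five subcomponents is as fields of a
single record. The sketch below is only an assumption about
how such a "knowledge nugget" might be stored; the class
name, field names, and sample values are invented for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeNugget:
    """Hypothetical record for one subject in the knowledge layer."""

    subject: str
    description: str = ""                                # what is a...
    illustrations: list = field(default_factory=list)   # how does one look...
    usage_examples: list = field(default_factory=list)  # how does one use...
    ratings: dict = field(default_factory=dict)         # how good is... (per situation)
    template: list = field(default_factory=list)        # how does one implement...


linked_list = KnowledgeNugget(
    subject="Linked List",
    description="A sequence of items, each holding a reference to the next.",
    ratings={"frequent inserts and deletes": "good"},
    template=["start at the first item", "visit the next item"],
)
print(linked_list.subject, sorted(linked_list.ratings))
```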

Timeout: What is Knowledge?
--------------------------
"Knowledge" in such a system can take several forms.
The following list probably is not comprehensive,
but it's enough to get started:

  * Categorization
    For example, a case study of building the Aswan
    Dam can be categorized under Egypt, earth-moving
    equipment, and construction, among other things.
    Each category that is applied to the study
    allows greater intelligence to be applied in
    future searches of the repository.

  * Simple Abstractions
    For example, "Assigning a value to a variable"
    is a simple abstraction that has a single mapping
    in most procedural languages.

  * Complex Abstractions and Abstraction-Sequences
    For a language like Lisp, APL, or Forth, the
    mechanism for assigning a value to a variable might
    be rather complex, and it might start with an admonition
    not to do that if you can help it!

    A recipe or template for a linked list, on the
    other hand, would consist of an ordered sequence
    of abstractions.
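
As a rough illustration of the categorization case, the
sketch below files documents under categories and then
searches on several categories at once. The function names
and the second sample document are made up for the example.

```python
from collections import defaultdict

# Hypothetical category index: category name -> set of document titles.
index = defaultdict(set)


def categorize(title, *categories):
    """File a document under each of the given categories."""
    for category in categories:
        index[category.lower()].add(title)


def search(*categories):
    """Return the documents filed under every one of the given categories."""
    sets = [index.get(c.lower(), set()) for c in categories]
    return set.intersection(*sets) if sets else set()


categorize("Building the Aswan Dam",
           "Egypt", "earth-moving equipment", "construction")
categorize("Paving a parking lot", "construction")  # made-up second entry

print(search("construction", "Egypt"))
```

Each extra category narrows the result, which is one simple
reading of "greater intelligence" in later searches.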

Returning now to your regularly scheduled programming,
we'll get back to the topic of situational ratings
and take a deeper look at templates...

Situational Ratings
-------------------
At the knowledge layer, ratings are necessarily provisional.
Thus, a linked list is good if you can afford the extra
space, don't mind a little extra overhead when you're looping,
and you need to do a lot of inserts and deletes. On the
other hand, for a fixed list, an array is typically going
to carry less overhead. (But that evaluation, too, is
situational -- in Lisp, a list is the *only* way to go.)
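
A provisional rating of this kind can be thought of as a
small function of the situation. The sketch below is just
that, a sketch: the flag names and the two-level scale
("good" / "fair") are assumptions, and the rules merely
paraphrase the trade-offs in the paragraph above.

```python
def rate_linked_list(situation):
    """Rate a linked list for a situation described as a dict of flags.

    The flag names and the rating scale are invented for this sketch.
    """
    if situation.get("language") == "Lisp":
        return "good"  # lists are the native structure there
    if situation.get("fixed_contents"):
        return "fair"  # an array typically carries less overhead
    if situation.get("many_inserts_and_deletes"):
        # Cheap inserts and deletes are paid for with extra space per item.
        if situation.get("can_afford_extra_space", True):
            return "good"
        return "fair"
    return "fair"


print(rate_linked_list({"many_inserts_and_deletes": True}))
print(rate_linked_list({"fixed_contents": True}))
print(rate_linked_list({"fixed_contents": True, "language": "Lisp"}))
```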

But if at the knowledge layer ratings are provisional, for
any specific project they are fairly concrete. One might
argue, based on the characteristics of the project one is
working on, that a linked list is appropriate. To give that
argument some weight, one would reference the "knowledge
nugget" that was stored on the subject of Linked Lists.

In other words, the design discussion (presumably an
IBIS-style discussion carried on within the scope of the
repository) defines the *context* within which the
knowledge is used. (The knowledge, meanwhile, rests on
a foundation of content. More on that in a bit.)

So, in a design discussion, it would be possible to
say, "I think we should use a linked list" and cite
as rationale the user specs that say items will
constantly be added and deleted, along with the
"knowledge nugget" that gives Linked Lists a high
rating for such purposes. The recipient of your
wisdom (who may never have heard the term) can
then get a tutorial on the subject from the knowledge
base.
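
The exchange just described might be recorded along the
following lines. This is a minimal sketch that assumes the
basic IBIS node types (issue, position, argument); the class
name, the field names, and the citation strings are
placeholders, not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class DiscussionNode:
    """One entry in a hypothetical IBIS-style design discussion."""

    kind: str  # "issue", "position", or "argument"
    text: str
    cites: list = field(default_factory=list)      # references into the repository
    responses: list = field(default_factory=list)  # replies to this node

    def respond(self, kind, text, cites=()):
        # Attach a reply to this node and hand it back for further replies.
        child = DiscussionNode(kind, text, list(cites))
        self.responses.append(child)
        return child


issue = DiscussionNode("issue", "Which data structure should hold the items?")
position = issue.respond("position", "I think we should use a linked list")
argument = position.respond(
    "argument",
    "Items will constantly be added and deleted",
    cites=["user specs", "knowledge nugget: Linked List (ratings)"],
)
print(argument.kind, argument.cites)
```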

Recipes & Templates
-------------------
Now consider a "recipe" as a collection of steps, or
an ordered set. The recipe, or template, for a linked
list algorithm will (to qualify as knowledge) be
very abstract. If an actual implementation is available,
for example, in legacy assembler code, then the
recipe might well contain a link to that implementation.
But the recipe itself would look something like the
implementation comments extracted from the source code.

  Note:
  Initial stabs at the recipe are liable to be very
  language specific. So a step might read "use the
  address stored in the Next variable to access the
  next item". However, better and more useful abstractions
  will be more abstract and less specific, e.g. "visit
  the next item". That kind of generality is hard to
  get right on the first try. So the system should make
  it possible to refine the generalizations as more
  implementations are "covered" by the template.

Now, a full implementation of a concept like Linked List
is great, if one exists. The question, "How do I implement
a linked list in language X?" can be answered with a
pointer to the implementation. But what if you are working
on a project in a new language? (As we'll see, idioms
hold the key.)

Idioms and Granularity
----------------------
It now makes sense to introduce the concept of an "idiom".
An *idiom* is the syntactic mechanism for achieving a
task in a specific language. For example, the "loop idiom"
breaks down into for-loops, while-loops, and until-loops,
each with situational ratings. For each loop, the specific
syntax used in the C language constitutes the idiom for
that concept in that language.

  Note:
  A primitive idiom is a simple abstraction that has a
  one-to-one mapping with the language. For example, the
  idiom for assigning a value to a variable is a
  single-step process in most any procedural language.
  A complex idiom may have multiple steps, like a recipe
  -- but the steps are language specific. (The abstract
  template for those steps may well exist as a knowledge
  nugget, but the language-specific steps constitute an
  idiom.)

Now, given a template T, consisting of an ordered set
{s1, s2, ...} of steps, and I, a collection {i1, i2, ...}
of idioms that can be expressed in the language,
the expression
T x I == {s1, s2, ...} x {i1, i2, ...}

produces a *knowledge product* -- literally, the product
of two different kinds of knowledge stored in the
repository. In this case, the knowledge product
KP = T x I

can define the implementation for a linked list in a
brand new language -- as long as the template and the
specific idioms exist in the repository.
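
Here is a toy version of that product, reading "x" loosely
as "pair each step of the template with the idiom that
expresses it". The step wording, the use of C as the target
language, and the names node, head, and use are all
assumptions made for the example.

```python
# Template T: ordered, language-neutral steps for visiting every item.
TEMPLATE = [
    "start at the first item",
    "repeat until the end of the list",
    "use the current item",
    "visit the next item",
    "end of repeat",
]

# Idioms I for one language, keyed by the step they express.
# C is only an example; `node`, `head`, and `use` are made-up names.
C_IDIOMS = {
    "start at the first item": "node = head;",
    "repeat until the end of the list": "while (node != NULL) {",
    "use the current item": "    use(node->value);",
    "visit the next item": "    node = node->next;",
    "end of repeat": "}",
}


def knowledge_product(template, idioms):
    """Combine a template with a language's idioms, step by step."""
    missing = [step for step in template if step not in idioms]
    if missing:
        # The product only exists if every step has a stored idiom.
        raise LookupError("no idiom stored for: " + ", ".join(missing))
    return [idioms[step] for step in template]


print("\n".join(knowledge_product(TEMPLATE, C_IDIOMS)))
```

A real system would need more structure than flat strings
(nesting, parameters, and so on), but the dependency is the
same: if any idiom is missing, the product cannot be formed.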

It is here that the need for "highly granular" documents
becomes apparent. There are dozens of papers and books
that tell how you do things in a given language. But
until the idioms and simple abstractions contained in
those documents can be pinpointed -- i.e. individually
referenced -- there is no hope of automatically
generating a knowledge product like the example above.

To return again to the Content-Knowledge-Context (CKC)
picture, nuggets in the knowledge layer must (or should)
have fine-grained links to items in the content layer.

Related Knowledge is Important
------------------------------
Recall that the template for a linked list is only
one nugget of knowledge stored for that concept.
Related kinds of information are frequently important
in producing an answer.

For example, should I ask "How do I implement a linked
list in Java?", the Linked List topic *header* should
link directly to the Java idiom:
new java.util.LinkedList().
In this case, no template-instantiation is needed!

For a language like C, on the other hand, in which
many implementations exist with different APIs and
performance characteristics, multiple responses could
be returned, with situational ratings for each.

As another example, should you ask "How do I implement a
linked list in Cobol?", a human developer might very
well respond with the questions, "Are you sure you
want to do that?", "What is it you're trying to do?",
"Is Cobol the right language for the job?" "Do you
absolutely have to use Cobol and, if so, could you
consider using another structure that would be more
suitable for that language?"

These are questions that a knowledge repository will
probably not be smart enough to ask any time soon.
If it *were* able to do that, so much the better!

On the other hand, the knowledge repository *should*
make it possible to deduce the implementation in
Cobol, and put it forward in an online design
discussion. Other members of the discussion should
then have access to reference material in the
repository to support arguments for why the performance
would suck, why another data structure should be
used, why a different language should be used, etc.

In this sense, the knowledge base is a partner to
and support for the discussion, which is carried on
at the higher level, in the context layer.
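
The Java, C, and Cobol cases above suggest a lookup order:
a built-in idiom first, then any stored implementations
(with their ratings), and only then the abstract template.
The sketch below assumes a repository shaped like a nested
dictionary; the C implementation names and ratings are
placeholders, not real libraries.

```python
# Hypothetical repository entry for one subject.
REPOSITORY = {
    "Linked List": {
        "idioms": {"Java": "new java.util.LinkedList()"},
        "implementations": {
            "C": [
                ("implementation A", "good for frequent inserts"),
                ("implementation B", "good when memory is tight"),
            ],
        },
        "template": ["start at the first item", "visit the next item"],
    }
}


def how_to_implement(subject, language, repository=REPOSITORY):
    """Answer "How do I implement <subject> in <language>?"."""
    entry = repository[subject]
    if language in entry["idioms"]:
        # The language already provides it: no instantiation needed.
        return "idiom", entry["idioms"][language]
    if language in entry["implementations"]:
        # Several known implementations: return them all, with ratings.
        return "implementations", entry["implementations"][language]
    # Nothing stored for this language: fall back to the abstract
    # template, to be combined with that language's idioms.
    return "template", entry["template"]


print(how_to_implement("Linked List", "Java"))
print(how_to_implement("Linked List", "C")[0])
print(how_to_implement("Linked List", "Cobol")[0])
```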

This archive was generated by hypermail 2b29 : Fri Mar 02 2001 - 21:14:48 PST