A) I would like to tender an opinion from the following vantage
points:
1) Since 1976 I've designed and/or written about 5 DBMSs
a) three of which were full-on network DBMSs
b) two of which were hierarchical DBMSs
2) I am stronger in parallel, visual, depth-first, and syntax
3) I am weaker in linear, auditory, breadth-first, and semantics
4) I see knowledge management as a compression problem
a) how to move entropy through the narrowest channel
b) Huffman codes are a good starting place
B) Some principles I would like to toss out (using myself as a use case)
1) we should support people who are strong in my strong areas (see
above)
a) which implies unlabeled nodes and arcs
2) we should support people who are strong in my weak areas (see above)
a) which implies labeled nodes and arcs
3) assume that we have unlimited resources to implement
a) so, assume that we are writing from scratch
b) assume that the system self improves
1) so a kludge of open code can evolve towards code written from
scratch
2) is this possible in the real world?
3) alternative is to choose real world compromises
a) starting from ideal design
5) design and deploy the simplest case first
a) add more complex case as second tier
b) allow a gentle learning curve
1) for future practitioners
2) for old practitioners who have low semantic persistence
6) human augmentation systems should map easily between human and system
a) take into account human limitations in assimilating information
1) such as GOMS and the 7±2 short-term memory constraint
b) software should map easily to wetware
1) information structures should map easily to brain structures
2) facilitate dialog between designers and researchers
a) such as software designers and brain researchers
7) try to keep a list of principles to 7 or fewer.
So, my proposal is to keep nodes and arcs pure. This yields about 5-10K of
source code that deals with issues like: how to attach nodes together, how
to represent an arc (as a node with a single predecessor and single
successor -- for those who enjoy paradox). This represents about 3-6 months
of work to implement, or a couple weeks of learning curve.
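To make the "pure nodes and arcs" idea a bit more concrete, here is a
rough sketch in Python -- the class and function names are mine, invented
on the spot, so treat it as illustration rather than design:

class Node:
    _next_id = 0

    def __init__(self, content=None):
        self.id = Node._next_id        # internal identity only
        Node._next_id += 1
        self.content = content         # optional payload; None means unlabeled
        self.preds = []                # nodes pointing at this one
        self.succs = []                # nodes this one points at

def attach(source, target):
    """Attach two nodes by creating an arc: a node with exactly one
    predecessor and one successor (the paradox mentioned above)."""
    arc = Node()                       # unlabeled by default
    source.succs.append(arc)
    arc.preds.append(source)
    arc.succs.append(target)
    target.preds.append(arc)
    return arc                         # give it content later to label it

# usage: one unlabeled link, one labeled link
quick, fox = Node("quick"), Node("fox")
plain = attach(quick, fox)             # pure, unlabeled arc
named = attach(quick, fox)
named.content = "modifies"             # same machinery, now labeled

The point is only that attach() plus arc-as-node covers most of what that
5-10K of source code has to do; labels stay strictly optional, which keeps
both camps in principles 1 and 2 happy.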
From the perspective of compression, language can be seen as a somewhat
static network of probabilities. At the lowest level (from an ASCII
standpoint) you have the probabilities of letters -- which are handled quite
nicely by Huffman codes. By adding conditional probabilities, you can even
improve on plain Huffman codes by about 30% -- you know: given the letter Q,
the most likely successor letter is U ... Words could be treated as the next
level up: as an aggregation of letter nodes that determines the probabilities
of the next aggregation of letter nodes. Ideas are the next step up from
that. So, the phrase "the quick brown ___" should yield "fox", for most
people.
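As a back-of-the-envelope illustration, here is how those two lowest levels
might look in Python -- the sample text and the resulting numbers are toys,
not real English statistics:

import heapq
from collections import Counter, defaultdict

def huffman_codes(freqs):
    """Build {symbol: bitstring} from a {symbol: count} table."""
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def successor_probs(text):
    """Estimate P(next letter | current letter) from a sample."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

sample = "the quick brown fox jumps over the lazy dog " * 10
codes = huffman_codes(Counter(sample))    # level 1: letter frequencies
cond = successor_probs(sample)            # level 2: successor probabilities
print(codes[" "], codes["x"])             # common letters get short codes
print(max(cond["q"], key=cond["q"].get))  # given 'q', expect 'u'

Words and ideas would simply repeat the same trick, one aggregation level
up.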
Why do I bring this up? Well, compression can be thought of as an operation
upon a set of nodes with probabilities on the arcs. Thus, language can be
thought of as a fairly static set of nodes with a commonly accepted range of
probabilities on the arcs. After hearing Jeff Rulifson talk about the
compiler-compiler in his part of the '68 Demo video, it would make sense to
support a future DKR of DKRs. In this case, a "Categorized" node would
represent a relationship between two network topologies: 1) a fairly static
language topology and 2) a fairly volatile dialogue topology. Does this
imply that we need to create a special arc type between topologies? The
answer is "no no no!" (heheh) All we need is to perform a query on arc-nodes
that have arcs to the "language topology" node. This being a very common
query, one would hope that the code for such would compress into a very
small instruction set -- a very nice use for the Transmeta code-morphing
paradigm -- but I digress once too often.
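For what it's worth, the query itself is tiny. A sketch in Python, under
the assumption that topologies are ordinary nodes and categorization is an
ordinary arc (all names invented):

# Every categorization is an ordinary arc (source, target); no special type.
arcs = []

def link(source, target):
    arcs.append((source, target))

# Two topologies, modeled as plain nodes, plus some categorized nodes.
link("categorized:fox-idea", "language topology")
link("categorized:fox-idea", "dialogue topology")
link("categorized:foo-mail", "language topology")

def linked_to(topology):
    """The very common query: which nodes have an arc to this topology?"""
    return [src for src, dst in arcs if dst == topology]

print(linked_to("language topology"))
# -> ['categorized:fox-idea', 'categorized:foo-mail']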
Popping these digressions back to the task at hand: I would suggest that we
treat nodes, arcs, and categories separately. For now, I would suggest a simple
keyword index to everything in the DKR. An email node with category "Foo"
could be structured along the lines of a
tree with the following nodes
"Category", "Email-ID", "Transaction"
"Foo", "ID103040", <link Foo Category Email-ID>
...
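In case that shorthand is too cryptic, here is roughly the keyword index I
have in mind, sketched in Python -- the field names follow the example
above; everything else is improvised:

from collections import defaultdict

index = defaultdict(set)            # keyword -> ids of nodes that mention it

def add_node(node_id, fields):
    """File every field value of a node under that value as a keyword."""
    for value in fields.values():
        index[value].add(node_id)

add_node("ID103040", {
    "Category": "Foo",
    "Email-ID": "ID103040",
    "Transaction": "<link Foo Category Email-ID>",
})

print(index["Foo"])                 # -> {'ID103040'}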
Yawn, well it's late and I'm making up syntax in my sleep -- my apologies.
At least I can spare you all from any more of the rant alluded to by the
above principles, which were merely listed without elucidation.
(Hey! Another thought: perhaps we could entice Joe to expand his glossary a
bit to -- say -- the English language and all its metrics? heheh)
Warren Stringer
"no node knows another like another knows node, no?"
-- nobody in particular
-----Original Message-----
From: Eric Armstrong [mailto:eric.armstrong@eng.sun.com]
Sent: Friday, June 23, 2000 7:00 PM
To: unrev-II@egroups.com
Subject: Re: [unrev-II] Augment + categories = OHS v0.1
Jack Park wrote:
> ... Gil uses the term "rigidify." That works for me, but there
> are other points of view as well. At issue is the fact that we
> all categorize the world in our own way. Production-line education
> tends to enforce standardization in that arena, but we are still
> individuals with our own non-linearities and so forth.
>
Ah... Now I understand the point that Gil was trying to make.
Yes, this is a system usage issue. The larger the system gets,
the more rigid the categories become -- to the degree that they
become standards. To the degree they don't, similar and redundant
categories are continually added to the system.
On the other hand, categories with various "shades of meaning"
might even be useful. If someone develops a formulation for
defining near-equivalences, of the form:
"hyper" = 90% match with "intense"
= 80% match with "over the top"
= xx% match with conceptX
....
Then some interesting fuzzy search capabilities begin to be
possible. I don't intend to work on that layer of the system,
but it is interesting that the foundation we are building may
just enable it.
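Just to make that concrete: a back-of-the-envelope sketch of such a lookup
might run along these lines (Python, with the match scores and document
tags invented for the example):

# Near-equivalence table: each category carries match scores against others.
near_equivalence = {
    "hyper": {"intense": 0.90, "over the top": 0.80},
}

documents = {
    "doc1": {"intense"},
    "doc2": {"over the top"},
    "doc3": {"calm"},
}

def fuzzy_search(category, threshold=0.75):
    """Find documents tagged with the category or a close-enough equivalent."""
    wanted = {category} | {other for other, score in
                           near_equivalence.get(category, {}).items()
                           if score >= threshold}
    return sorted(doc for doc, tags in documents.items() if tags & wanted)

print(fuzzy_search("hyper"))        # -> ['doc1', 'doc2']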
--As you point out, there is still the problem of mapping from
*my* concepts to some "shared" conceptual framework out there.
> The fundamental architecture being espoused within the meeting
> was that of an engine that mutates original documents by adding
> links to them. The fundamental approach taken in the architecture
> I present here is one in which absolutely no modifications are
> ever performed on original documents. All linkages are formed
> "above" the permanent record of human discourse and experience.
> I strongly believe that the extra effort required to avoid
> building a system that simply plays with original documents will
> prove to be of enormous value in the larger picture.
>
This idea deserves careful consideration. I have a suspicion you
may be right about that. Our talks about how to use Wiki effectively
have really centered on how we control modifications to underlying
documents. I haven't come at things from the perspective you
suggest. It's time to take a detailed look at that approach, I think.
Also: I'm delighted that we're not going for a full ontology in
version 1. Yay! But I am equally delighted that the system we seem to
be zeroing in on may help provide a basis for that work. Life should
be interesting.