Re: [unrev-II] XML limits

From: Eric Armstrong
Date: Wed Apr 05 2000 - 15:22:52 PDT

    First, some general notes:
     * At the moment, I'm focusing on in-memory data structures
       and the required manipulation mechanisms. I'm not focusing
       on representation of the information until after there is a
       sense of "completeness" in the internal data model. That's
       the point at which it will become clear whether or not XML
       will work as an external representation.

     * The major problem with XML that I see at the moment is
       data export. Given that I have a massively-interconnected
       graph of information nodes, any one "slice" of those nodes
       may constitute a document. That document can be represented
       in XML. Differences between documents and messages that
       transmit the differences can also be represented in XML.
       But if you wanted to export the repository so you could
       import it into another system, would XML be very useful
       for that?

     * Thanks for the new and repeated references. They're on my
       priority reading list. (Which is to say I'll probably be
       able to get to them by next month...)
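
    As a rough illustration of the export question in the second note, here
    is a minimal sketch, assuming the repository can be held as a dictionary
    of nodes with outgoing links. The `repository` contents, the element
    names (`repository`, `node`, `text`, `link`) and both function names are
    hypothetical. The idea is to write each node once in a flat list and
    express links as id references, so cycles survive a round trip:

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory repository: node id -> (text, ids of linked nodes).
# The links form a cycle (n1 -> n2 -> n3 -> n1), so this is not a tree.
repository = {
    "n1": ("XML limits", ["n2", "n3"]),
    "n2": ("Data export", ["n3"]),
    "n3": ("Internal data model", ["n1"]),
}

def export_repository(repo):
    """Write every node once, flat, and express links as id references."""
    root = ET.Element("repository")
    for node_id, (text, links) in repo.items():
        node = ET.SubElement(root, "node", id=node_id)
        ET.SubElement(node, "text").text = text
        for target in links:
            ET.SubElement(node, "link", ref=target)
    return ET.tostring(root, encoding="unicode")

def import_repository(xml_text):
    """Rebuild the same dictionary from the exported XML."""
    repo = {}
    for node in ET.fromstring(xml_text).findall("node"):
        links = [link.get("ref") for link in node.findall("link")]
        repo[node.get("id")] = (node.findtext("text"), links)
    return repo

exported = export_repository(repository)
restored = import_repository(exported)
```

    In this sketch the XML carries the graph but gains nothing from XML's
    nesting; whether that counts as "very useful" is the open question.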

    Paul Fernhout wrote:
    > My longer point is that the knowledge management / representation
    > problem is a deep one, and XML doesn't address it in a serious way,
    > and confuses the subject by the hype making it sound like XML does
    > address the topic of knowledge representation in a serious way.
    Hmmm. I never had that impression. I got that if I have data, I can
    represent it in XML -- especially if the data is structured. What I
    keep wrestling with is that any individual *view* of the data benefits
    from hierarchy -- it helps to organize the info and orient the reader.
    But the underlying data is a multi-connected graph, not a hierarchy.
    So maybe what's really needed is:

                                      +--> GUI operations
      ?repository? --> XML-based view +--> HTML representation
                                      +--> PDF representation

    Identifying the structure of the repository is my major quest at
    the moment.
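
    One way to picture a hierarchical view over a multi-connected graph is
    sketched below. It assumes the repository is an adjacency dictionary;
    the `graph` contents, the `node` and `ref` element names, and
    `build_view` are all hypothetical. A depth-first walk from a chosen
    root emits each node in full the first time it is reached and a
    `ref` placeholder on every later encounter:

```python
import xml.etree.ElementTree as ET

# Hypothetical repository graph: node id -> ids it links to.
graph = {
    "design": ["data-model", "export"],
    "data-model": ["export"],
    "export": ["design"],  # link back to the root: a cycle
}

def build_view(graph, root_id):
    """Return an XML tree rooted at root_id; repeated nodes become <ref>."""
    seen = set()

    def visit(node_id):
        if node_id in seen:
            # Already shown elsewhere in this view: point at it instead.
            return ET.Element("ref", to=node_id)
        seen.add(node_id)
        elem = ET.Element("node", id=node_id)
        for child in graph.get(node_id, []):
            elem.append(visit(child))
        return elem

    return visit(root_id)

view = build_view(graph, "design")
xml_view = ET.tostring(view, encoding="unicode")
```

    Choosing a different root gives a different hierarchy over the same
    data, which is the sense in which each view is only a slice.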

    > Squeak, Python, Common Lisp (less so) are interesting choices.
    > I'm starting to think Squeak might be the best choice for prototyping
    > (for me) given that it is completely cross-platform and open. Its
    > cross-platform GUI does the best job of addressing the DKR design
    > requirement of shareable screens.
    Can you tell me more about Squeak (again), and why I'm going to like it,
    and where to find it?

    > a talk last year by Marvin Minsky he went on at
    > length about the need for multiple representational strategies for
    > problem solving. He argued the human mind may perceive problems using
    > five or six strategies (ex. geometrical reasoning, formal logic,
    > heuristic rules of thumb, pattern recognition, semantic networks,
    > others) and continuously picks the best one at the time to progress in
    > thinking.
    This seems fundamental. Has he written this up anywhere, to your
    knowledge?

    > Maybe what we need is an overview of the AI and knowledge management
    > fields and how each area or major problem/topic would affect a
    > DKR/OHS.
    That strikes me as a profitable enumeration of issues.
    Any thoughts on how we should get started?
    > Also, what will evolve over time for an OHS/DKR project is a set of
    > useful code that can manipulate data structures that are related to
    > knowledge representation. We might also wish to have a survey of such
    > existing code.
    Yeah. I started the reference list with things like that in mind.
    I've fallen behind in keeping the list up to date, much less producing
    even preliminary evaluations of different papers. I've seen a lot of
    stuff that doesn't excite me. IBIS was a notable exception. This is an
    area where we desperately need even a preliminary DKR, so we can track
    evaluations of different papers, and start sorting them by relevance and
    other criteria (like readability and explanatory power).

    > time goes on, any restrictions will become obsolete.
    > One needs a representational system that can adapt to user needs.
    Can you give an example of that? Something simple will do. Maybe my
    sixth grade view of physics vs. my college-level view, for example.
    Does that make sense? (A specific adaptation would be even better.)

    > while XML, could be a part of that solution, the important issues go
    > beyond that -- to standards creation and revision and communication,
    > and to coin a phrase "data upgrading".
    I understand about standards creation. That's where the interesting work
    is going on even as we speak. I don't see how revision and communication
    go beyond XML. And I'm not sure what you mean by data upgrading. Can you
    explain?

    > The deeper issue is that rather than focus on ways to limit
    > representations (DTDs) we need to focus on ways to transform, extend,
    > and simplify representations as needed (sort of along the multi-level
    > approach I mentioned earlier).
    As I mentioned, DTDs only give you minimal validation. Like Lisp or
    Smalltalk apps, the "interesting" validation will probably occur within
    the context of the app -- as long as you are doing "interesting"

    However, I think the better strategy is to punt on that issue. I'm not
    interested in AI-level reasoning about statements like "Horses fly". I
    am totally uninterested in any sort of automatic verification for such
    statements.
    I am interested in one person having the ability to assert "Horses fly",
    another person to argue against it, and for individuals to estimate the
    value and usefulness of a document based on the assertions it contains.
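
    A minimal sketch of that assert-and-argue model, with no automatic
    verification anywhere in it, might look like the following. The
    `Assertion` class, the two stance labels, and the sample authors and
    replies are all hypothetical; the system only records who said what
    and leaves the judgment to the readers:

```python
from dataclasses import dataclass, field

STANCES = ("supports", "objects")  # assumed labels, not from any standard

@dataclass
class Assertion:
    author: str
    text: str
    # Each response is a (stance, Assertion) pair.
    responses: list = field(default_factory=list)

    def respond(self, stance, author, text):
        """Attach another person's supporting or opposing assertion."""
        if stance not in STANCES:
            raise ValueError(f"unknown stance: {stance}")
        reply = Assertion(author, text)
        self.responses.append((stance, reply))
        return reply

    def tally(self):
        """Count direct responses per stance; nothing is checked for truth."""
        counts = {stance: 0 for stance in STANCES}
        for stance, _ in self.responses:
            counts[stance] += 1
        return counts

claim = Assertion("alice", "Horses fly")
claim.respond("objects", "bob", "No horse has ever been observed flying.")
claim.respond("supports", "carol", "Pegasus does, in the myths.")
claim.respond("objects", "dave", "Horses have no wings.")
summary = claim.tally()
```

    Because each reply is itself an `Assertion`, it can be argued with in
    turn, and a reader can weigh a document by the state of its claims.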

    Here are two analogies:
      1) "Decorative" tags vs. "Structure" tags.
         In DocBook, these are called "inline" tags (like bold and italic)
         vs. "block" tags (like sect1 and sect2). One thing that XML does
         *not* give me is a good way to make a clean separation between
         those two. That distinction is important, too, for two major
         reasons:
           a) When displaying a document, I want to know which elements
              belong in the outline (table of contents, tree view) and
              which elements belong only in the content-display.

           b) For structure elements, the sub-structure should always
              consist of (1) content -- any combination of text and
              decorative elements -- *followed* by structure elements.
              In other words, any structure element can have one piece
              of content, followed by substructure elements, and there
              is never any overlap between them. XML gives me no such
              mechanism. (The DocBook solution is to define a <title>
              element for each <sectN>. That introduces two tags where
              only one is really needed, and complicates the processing.)

         The point of this analogy is that I frequently want to separate
         structure from content, so I can treat them separately.

      2) The second analogy is in the graphic representation of computer
         programs. In graphics, hierarchy is expressed by "diving in".
         You look inside a graphic object to see what it contains. Here
         again, I need a distinction between control elements and normal
         statements. The reason: graphical representation of a = b;
         does me no good whatsoever. It consumes space for the graphics
         that has no value whatsoever for understanding the program.
         Graphical representations of programs, therefore, need to stop
         at the control-flow statements. A graphical representation of
         all the if, for, and case statements in a program may be of use.
         In any one block, though, a simple listing of the normal
         statements is sufficient.
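
    The content-then-structure rule from the first analogy can at least be
    checked in application code, even if XML itself offers no way to
    declare it. The sketch below assumes a hypothetical `STRUCTURE_TAGS`
    set (only `sect1` and `sect2`, not DocBook's full list) and treats
    every other element as decorative; `follows_rule` is an illustrative
    name:

```python
import xml.etree.ElementTree as ET

# Assumed classification: these tags are structure, all others decorative.
STRUCTURE_TAGS = {"sect1", "sect2"}

def follows_rule(elem):
    """True if content always comes before substructure, at every level."""
    seen_structure = False
    for child in elem:
        if child.tag in STRUCTURE_TAGS:
            seen_structure = True
            if child.tail and child.tail.strip():
                return False  # text appears after a structure child
            if not follows_rule(child):
                return False
        elif seen_structure:
            return False  # decorative element after a structure child
    return True

good = ET.fromstring(
    "<sect1>Intro with <bold>emphasis</bold> text"
    "<sect2>Detail</sect2><sect2>More</sect2></sect1>"
)
bad = ET.fromstring("<sect1>Intro<sect2>Detail</sect2>trailing text</sect1>")

good_ok = follows_rule(good)
bad_ok = follows_rule(bad)
```

    The same tag set could also tell a display routine which elements
    belong in the outline and which belong only in the content pane.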

    I see the same issue with respect to knowledge representation.
    Attempting to solve the whole problem by representing "tree", "apple",
    "green", "red", etc. is just too hard. Let the human interpret the
    meaning of the words. But there is an underlying structure that it makes
    sense to automate. Perhaps it is Noam Chomsky's deep structure, or
    perhaps a logic model, or perhaps one of several representations as
    identified by Minsky.
    If we can construct a system within which we can model those
    relationships and reason about them, we can make a ton of progress
    without having to make a computer into a "thinking" machine.

    > ...Any DKR/OHS will need to be more
    > than a bunch of passive data in a database. It will need many programs
    > to do things to that data to make new data (search, format, summarize,
    > repackage, interpret, transfer, upgrade, etc.).
    > A more important issue than data transmission format (the one XML
    > tries to address) is to build a robust platform for doing those
    > algorithmic things.
    Oddly enough, I haven't seen that the knowledge repository needs a lot
    of functionality. I've been looking for it, but most of the operations
    you mention I see as either aspects of the UI (like searching) or
    operations best conducted by the user (summarizing).

    > ...As a deeper approach, one tries to represent the knowledge and
    > algorithms in an abstract enough way as to be ideally programming
    > language neutral or at least programming language retargetable
    > (generating whatever code in whatever language as needed).
    This would of course be ideal, assuming that the manipulations need to
    be part of the repository system. I am as yet unconvinced that they have
    to be, but I am open to argument on the subject.

    (thanks for another great, thought-provoking note.)

    This archive was generated by hypermail 2b29 : Wed Apr 05 2000 - 15:30:18 PDT