[unrev-II] OT: DOM, Grove, and Abstract Document Architecture, I: Where Have I Heard this Tune Before?

From: Dennis E. Hamilton (infonuovo@email.com)
Date: Tue Mar 13 2001 - 08:20:47 PST


    I was fascinated by the discussion of DOM and the struggle to relate to it
    as an abstracted interface, perhaps a place for intermediary documents, and
    also as a means of expressing documents through a standard interface for
    their access, navigation, presentation, and manipulation, possibly in a
    collaborative context. DOM is also appraised as an example of an interface
    badly done, object-oriented or not, and properly abstracted (or not). It
    did not escape me that the original request was simply for a definition of
    the term DOM to be added to a glossary.

    There was something that set off alarms for me, and I have been monitoring
    my mental radar to see what it was. I think it is the presumption of
    inter-representability. Here's how that shows up for me.

    1. I notice that there is some discussion here and in other circles about
    ontologies and adopting ontologies. I get nervous about that and the
    prospect for competing "standard" ontologies. [I have it as a thesis that a
    fixed number of ontologies is never "enough" and, more than that, the notion
    of fixing an ontology may be at best wishful thinking, like asserting that
    dictionaries standardize language.]

    2. My experience with a fixed conceptual model never being enough comes from
    my work, over the last dozen years or so, on document processing and
    document-management systems. That's where DOM comes into this discussion.

    3. There *is* a way that DOM is related to the notion of intermediary
    documents, based on my perusal of
    http://www.bootstrap.org/a2h/BI/2120.html#2C1. What the DOM specifications
    and the description of intermediary documents for OHS have in common is the
    tacit assumption that any practical document architecture can be reconciled
    with the DOM/intermediary model, thus providing a way of embracing legacies,
    the new, and so on. There is a translatability assumption. The
    intermediary document has the advantage of not being specified yet (as far
    as I know), so the assumption is safe for now. However, there is an
    explicit claim that this is the intention. For DOM, the same claim was also
    made, with the caveat that HTML and XML were the first cases that would be
    implemented. I have run into people who want to adopt DOM for a great
    variety of purposes solely on the desire to believe that the translatability
    claim is based on something real.

    4. A cool thing about intermediary documents (and about XML for that
    matter), is that one might present such an abstraction or transformation of
    a given concrete document from a variety of viewpoints and thus have this
    rich way of exploring different materials once they are somehow accurately
    brought to a common intermediate form. I am using achievement of an
    intermediate form as the same as mapping to an abstract but useful document
    architecture. That is close enough for now. See 7, below. By the way, I
    think the ability to construe document structures in a wide variety of ways
    is extremely valuable. The flag I want to put on this play is over any
    appeal to an underlying assumption about the ease of translating everything
    else to such a cool structure.

    5. I just realized that there is a tacit version of the UNCOL claim in our
    willingness to presume translatability among representations. That is, the
    notion that we can reduce our problems to a linear solution space by having
    an intermediate something that everything goes into and comes out of. And
    the process is expected to be lossless for everything that matters! I
    wouldn't be surprised were ontology efforts predicated on that same

    6. I am not prepared to concede that there isn't a way to accomplish the
    UNCOL claim (in any of its current guises). I even have ideas about how
    that might be accomplished, as, I am quite sure, do many of you. Yet the
    practical experience is that these efforts have, so far, always failed.
    They may deliver something and be useful to a degree, but their most
    glorious ambition is left unattained. I suppose they also provide lessons,
    were we able to dig past the claims of success and discover why few (i.e.,
    none?) of these efforts are still around.
            I haven't been able to put my finger on in what way we think we are smarter
    than our predecessors were when they thought they were smarter than theirs,
    or whether this is the usual case of doing the same thing over again
    expecting a different result, the common thread being denial that we are
    doing the same thing over again.
            It gives me pause.

    7. Back to DOM and other intermediate architectures as a case in point.
            7.1 Let's call a DOCUMENT STRUCTURE the formal encoding of the material of
    a document -- the raw bits and the organizational information that gives us
    what the presentation is, what and where there are dependencies on other
    things outside the structure (e.g., in other separated structures or to be
    obtained by computation). This is a concrete encoding so it also has a
    serial nature. (For interchange on media, etc.)
            7.2 Let's call a DOCUMENT ARCHITECTURE an abstraction of a class of
    documents that is usually friendly to a set of DOCUMENT STRUCTURES. It is
    fairly easy to see the SGML specification as providing a document
    architecture as well as a document structure. Ditto for XML. The ODA (Open
    Document Architecture) started with an architectural specification and then
    came up with document structures. More or less. None of these are purely
    orthogonal achievements, and the architectures and document structures
    although separable, appear to be joined at the hip.
            7.3 Now, whether we use an API model or an object model or we use another
    abstraction, there is a model. When we want to slide document structures
    from one document architecture to another, we have to face the problem of
    expressing objects conveyed in one model through the interface or objects or
    structures of another model. The question of model reconcilability comes to
    the fore.
            7.4 One killer is that document structures are largely semantics-free. Just
    like computer programs. What is needed in this situation is some form of
    reverse engineering that abstracts the document as it is conveyed in one
    structure and reconveys it in another structure (or model) so that the
    semantics and presentability of the original document are preserved. This
    is not easy and it is rarely very successful as an automated task. Part of
    the problem is establishing what is essential to be preserved as far as the
    authors and the users of the work are concerned.
            7.5 The other difficulty is that mapping to an abstraction that has a
    natural navigation markedly different than the serial structure of the
    document structure to be "mapped" can be very costly. Although the big
    limitation is the reconcilability of the document architectures, there are
    large pragmatic considerations related to the difficulty of translating a
    document structure into an architectural model and navigation model (e.g.,
    API) that is not a natural fit.

    8. Punting
            8.1 There are some typical ways that reality sets in. One is the
    equivalent of burying everything that is difficult into some kind of CTYPE
    element that is left for someone else to figure out. Or to create
    document-architecture-specific flavors of things that are passed through so
    that the user of the intermediate / interface now gets to deal with them (or
    not). Or punt the problem into an out-of-band assumption. In doing this,
    the UNCOL math breaks down. Making an intermediate that is the union of all
    the architectures to be contended with does not reduce an n-squared problem
    to a linear solution set in any way but through rapidly-shifting mirrors.
            8.2 Another approach is to actually fail to have a document architecture
    that is anything more than another way of presenting a rigid set of document
    structures, along with whatever punting is already wired into the structural
    system. (This has been asserted to be a quality -- that is, a defect -- of
    DOM.) This is the
    heck-with-it-lets-pick-a-structure-and-make-everyone-use-it approach. Since
    we are all acolytes to innovation, this leads to a community preoccupied
    with figuring out how to pick winners.
            8.3 And then there is doing the parts that do match up and work pretty
    easily, punting anything else. Sometimes called the proof-of-concept
    maneuver. (Failing to detect when someone is employing a text graphic is a
    potent example.) Check with the organizations that have made the effort to
    allow a community of people to use their favorite desktop applications on
    shared work-in-progress documents and see how happy they are with the
    translations that different word processors make to/from the formats of
    other word processors.
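
    [A minimal sketch of the arithmetic in 8.1 and the pass-through punt,
    offered as an illustration only. The format names (wp_a, wp_b), the
    feature names, and the "common core" of the hub are all hypothetical;
    nothing here models DOM, groves, or any real word processor.]

```python
# Hypothetical common core that the intermediate form understands.
HUB_FEATURES = {"text", "heading"}


def translator_counts(n):
    """Translators needed for n formats: pairwise versus through one hub."""
    return {"pairwise": n * (n - 1), "hub": 2 * n}


def to_hub(source_format, doc):
    """Map a document into the hub; unknown features are passed through opaquely."""
    core = {k: v for k, v in doc.items() if k in HUB_FEATURES}
    leftover = {k: v for k, v in doc.items() if k not in HUB_FEATURES}
    opaque = {source_format: leftover} if leftover else {}
    return {"core": core, "opaque": opaque}


def from_hub(target_format, hub_doc):
    """Map out of the hub; a target only understands its own opaque flavor."""
    doc = dict(hub_doc["core"])
    doc.update(hub_doc["opaque"].get(target_format, {}))
    return doc


wp_a_doc = {"text": "Quarterly report", "heading": "Q1",
            "text_graphic": "ascii-art table"}
hub_doc = to_hub("wp_a", wp_a_doc)
back_to_a = from_hub("wp_a", hub_doc)   # round trip to the same format
over_to_b = from_hub("wp_b", hub_doc)   # translation to a different format

print(translator_counts(5))
print("round trip lossless:", back_to_a == wp_a_doc)
print("text graphic survives in wp_b:", "text_graphic" in over_to_b)
```

    The hub count is linear only as long as nothing lands in the opaque
    pass-through. Once wp_b has to learn wp_a's private flavors, the
    pairwise work is back, just hidden inside the intermediate.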

    9. The recent comment about Groves reminded me that I have never sent this
    note, waiting for some final inspiration to polish it off. I want to do
    that with some brash claims:

            a. It may be useful to demonstrate that one has a canonical data
    representation, though I am concerned that it is a red herring to require
    that. To have a usable proof that something is a CDR, there must be an
    agreed set of conditions that a CDR must satisfy, and that are verifiable by
    "proof." Having an existence proof of the satisfiability of those
    conditions would also be confidence-building. I don't know that anyone has
    done any of that formal work.

            b. One way to have a canonical data representation is to demonstrate that
    the structure of the alleged canonical data representation and its
    interpreter is computationally complete. (That is, the processors of these
    CDRs are equivalent to universal Turing machines.) If it can be represented
    by computer, it can be represented here. (As a practical matter, this
    consequence can be pretty useless, though.)

            c. Although it may be useful to have established that one has a CDR, it may
    not actually accomplish what having a canonical data representation is
    tacitly assumed to provide. In particular, it does not establish that one can
    automatically translate from any other CDR to the one of interest. It does
    establish that there is a translation, in some arcane sense, but it does not
    establish that it can be found and that it is a function that can be

            d. It is then useful to ponder the implications of expecting a canonical
    data representation to support something that perhaps can't be accomplished
    by computer in the first place.

    -- Dennis

    -----Original Message-----
    From: Eric Armstrong [mailto:eric.armstrong@eng.sun.com]
    Sent: Monday, March 12, 2001 14:31
    To: unrev2
    Subject: [unrev-II] For Howard...Grove Proofs

    Howard Liu, are you there?
    This post is mostly intended for you, but it may
    be interesting to others, as well.

    A reading of the Groves documents shows that it
    was intended to be a canonical data-representation
    mechanism. However, they can't quite claim that,
    for lack of the necessary proofs.

    So, should you be interested, here are some
    interesting, useful (and needed) proofs:

    1. Prove that for any data representation,
       a usable Grove representation can be
       constructed.

    2. Prove that for any data representation,
       a usable sGrove (xGrove?) representation
       can be constructed.

       --where sGrove (xGrove?) is the *simplified*
         version of groves, or the "xml-ified"
         version that Lee is constructing. It
         leaves out integer types and various
         other data types, to create a simpler,
         easier to understand, and more easily
         used standard.

    -----Original Message-----
    From: Eric Armstrong [mailto:eric.armstrong@eng.sun.com]
    Sent: Tuesday, January 16, 2001 17:36
    To: unrev-II@egroups.com
    Subject: Re: [unrev-II] Seeking definition for "DOM"

    [ ... ]

    I guess the answer is that the objects in a DOM really are objects,
    in that they have behaviors and methods. It's just that they are
    so low-level -- intuitively, they're at the wrong level of

    To be fair, the model probably suffers from the
    inclusion of things like "processing instructions". For example,
    multiple processing instructions can occur under an element.

    Also (I never tire of pointing this out) the fact that text can
    occur virtually anywhere in a mixed-content element means that
    there is no "text" property, as one would intuitively expect
    for an "object" anchored at a given point in the hierarchy.

    The free intermixing of processing instructions, text, elements,
    and other low-level objects made true "object-ness" hard to
    capture. So the model is a tree of very low-level objects.
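
    [A small illustration of the point above, using Python's standard
    xml.dom.minidom as one available DOM implementation. The sample markup
    and the collect_text helper are made up for the example; the helper is
    not part of the DOM.]

```python
from xml.dom import Node, minidom

# Mixed content: text, an element, and a processing instruction as siblings.
doc = minidom.parseString("<p>Some <b>bold</b> text<?render hint?> here</p>")
p = doc.documentElement

# The element's children are a flat run of low-level nodes.
for child in p.childNodes:
    print(child.nodeType, child.nodeName)


def collect_text(node):
    """Gather character data by walking the subtree, since the text is
    spread over several Text nodes rather than held in one place."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(collect_text(c) for c in node.childNodes)


print(collect_text(p))
```

    The paragraph's text has to be reassembled from three separate Text
    nodes and one nested element, while the processing instruction sits
    among them as just another sibling.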

    [ ... ]

    -----Original Message-----
    From: Eric Armstrong [mailto:eric.armstrong@eng.sun.com]
    Sent: Thursday, January 04, 2001 20:26
    To: unrev-II@egroups.com
    Subject: Re: [unrev-II] Seeking definition for "DOM"

    In simple terms, a DOM is "Document OBJECT model".
    It's a tree-structured hierarchy of objects that
    comprise a document.

    That's probably all you want for the glossary.
    On the other hand...

    [ ... ]

    So my qualms are not with
    the structure, but with calling it a Document
    OBJECT Model. "Document Structure" would have
    been a more fitting and appropriate name, in
    my book.

    -----Original Message-----
    From: G. Ken Holman [mailto:gkholman@cranesoftwrights.com]
    Sent: Thursday, January 04, 2001 11:28
    To: unrev-II@egroups.com; unrev-II@egroups.com
    Subject: Re: [unrev-II] Seeking definition for "DOM"

    At 01/01/04 10:44 -0800, Eugene Eric Kim wrote:
    >On Wed, 3 Jan 2001, N. C a r r o l l wrote:
    >[DOM description deleted]
    > > Is that similar to what Doug is describing as an "intermediary
    > > document"?
    > > http://www.bootstrap.org/a2h/BI/2120.html#2C1
    >Mmm, weighty question. Short answer is yes. It's similar in that a
    >DOM-like interface could indeed be used to manipulate this "intermediary

    Perhaps ... but the implementation of this intermediary is entirely opaque
    to the user; the *only* access to the intermediary is *entirely* through
    the abstract interface.

    The abstract interface is reified as an API for each particular programming
    language (direct access) or communication technology (remote access).

    [ ... ]

    By no means is the opaque internal implementation a document interchange
    format. When the opaque implementation of a document is "exported" into a
    transparent format, then the agreed-upon document interchange format would
    be used, but we should not give any indication that an implementation is
    required to support the document interchange format internally ... it must
    have the flexibility to accept the document interchange format with any
    possible internal implementation scheme it wishes.
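
    [A hedged sketch of the distinction drawn above, in Python. The class
    names, the two storage schemes, and the <para> interchange markup are
    all invented for illustration and are not taken from the DOM or OHS
    specifications.]

```python
from abc import ABC, abstractmethod
from typing import List
from xml.sax.saxutils import escape


class DocumentInterface(ABC):
    """The abstract interface: the only access a user has to a document."""

    @abstractmethod
    def append_paragraph(self, text: str) -> None: ...

    @abstractmethod
    def paragraphs(self) -> List[str]: ...


class ListBackedDocument(DocumentInterface):
    """One opaque implementation: paragraphs kept in a list."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def append_paragraph(self, text: str) -> None:
        self._items.append(text)

    def paragraphs(self) -> List[str]:
        return list(self._items)


class BlobBackedDocument(DocumentInterface):
    """Another opaque implementation: one string with record separators."""

    _SEP = "\x1e"

    def __init__(self) -> None:
        self._blob = ""

    def append_paragraph(self, text: str) -> None:
        self._blob += text + self._SEP

    def paragraphs(self) -> List[str]:
        return self._blob.split(self._SEP)[:-1]


def export_interchange(doc: DocumentInterface) -> str:
    """Export through the interface only; internals never leak out."""
    return "\n".join(f"<para>{escape(t)}</para>" for t in doc.paragraphs())


docs = [ListBackedDocument(), BlobBackedDocument()]
for d in docs:
    d.append_paragraph("Intermediary documents")
    d.append_paragraph("DOM & groves")
exports = [export_interchange(d) for d in docs]
print(exports[0])
```

    Both implementations export the same interchange text even though
    neither stores anything resembling it internally.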

    I think the distinction is critically important.

    I hope this helps.

    ...................... Ken

    -----Original Message-----
    From: N. C a r r o l l [mailto:ncarroll@inreach.com]
    Sent: Wednesday, January 03, 2001 19:31
    To: unrev-II@egroups.com
    Subject: Re: [unrev-II] Seeking definition for "DOM"



    > >The definition of a DOM escapes me.
    > According to http://www.w3.org/DOM/:
    > The Document Object Model is a platform- and language-neutral interface
    > that will allow programs and scripts to dynamically access and update the
    > content, structure and style of documents. The document can be further
    > processed and the results of that processing can be incorporated back into
    > the presented page.

    Thanks, I'll use that for the moment. (Journalists will be reading

    > In my own words, I say something along the lines of:
    > Document Object Model
    > An interface abstraction defining processes and information, both
    > to a user and thus required to be supported and exposed by a system, that
    > can be used to act on the components of a document abstraction. Actions
    > include building document information and reading document
    > information. The interface abstraction is reified for a given programming
    > language as an API. The document abstraction is reified by the supplier
    > of the API and is typically hidden from the user. The user is obliged to
    > manipulate and access the document entirely through the API.

    Is that similar to what Doug is describing as an "intermediary
    document"?
    http://www.bootstrap.org/a2h/BI/2120.html#2C1

    > I hope this helps.




    Nicholas Carroll Email: ncarroll@inreach.com Alternate: ncarroll@iname.com

    Community email addresses: Post message: unrev-II@onelist.com Subscribe: unrev-II-subscribe@onelist.com Unsubscribe: unrev-II-unsubscribe@onelist.com List owner: unrev-II-owner@onelist.com

    Shortcut URL to this page: http://www.onelist.com/community/unrev-II


    This archive was generated by hypermail 2b29 : Tue Mar 13 2001 - 08:30:27 PST