[unrev-II] OT: DOM, Grove, and Abstract Document Architecture, I: Where Have I Heard this Tune Before?

From: Dennis E. Hamilton (infonuovo@email.com)
Date: Tue Mar 13 2001 - 08:20:47 PST

Next message: Bernard Vatant: "[unrev-II] Announcement : Semantopic Map Project taking off"

Previous message: Simon Buckingham Shum: "[unrev-II] PhD Studentships, Oct.2001 (KMi, Open Univ, UK)"
In reply to: N. C a r r o l l: "Re: [unrev-II] Seeking definition for "DOM""
Next in thread: Eric Armstrong: "Re: [unrev-II] Seeking definition for "DOM""

I was fascinated by the discussion of DOM and the struggle to relate to it
as an abstracted interface, perhaps a place for intermediary documents, and
also as a means of expressing documents through a standard interface for
their access, navigation, presentation, and manipulation, possibly in a
collaborative context. DOM is also appraised as an example of an interface
badly done, object-oriented or not, and properly abstracted (or not). It
did not escape me that the original request was simply for a definition of
the term DOM to be added to a glossary.

There was something that set off alarms for me, and I have been monitoring
my mental radar to see what it was. I think it is the presumption of
inter-representability. Here's how that shows up for me.

1. I notice that there is some discussion here and in other circles about
ontologies and adopting ontologies. I get nervous about that and the
prospect for competing "standard" ontologies. [I have it as a thesis that a
fixed number of ontologies is never "enough" and, more than that, the notion
of fixing an ontology may be at best wishful thinking, like asserting that
dictionaries standardize language.]

2. My experience with a fixed conceptual model never being enough comes from
my work, over the last dozen years or so, on document processing and
document-management systems. That's where DOM comes into this discussion.

3. There *is* a way that DOM is related to the notion of intermediary
documents, based on my perusal of
http://www.bootstrap.org/a2h/BI/2120.html#2C1. What the DOM specifications
and the description of intermediary documents for OHS have in common is the
tacit assumption that any practical document architecture can be reconciled
with the DOM/intermediary model, thus providing a way of embracing legacies,
the new, and so on. There is a translatability assumption. The
intermediary document has the advantage of not being specified yet (as far
as I know), so the assumption is safe for now. However, there is an
explicit claim that this is the intention. For DOM, the same claim was also
made, with the caveat that HTML and XML were the first cases that would be
implemented. I have run into people who want to adopt DOM for a great
variety of purposes solely on the desire to believe that the translatability
claim is based on something real.

4. A cool thing about intermediary documents (and about XML, for that
matter) is that one might present such an abstraction or transformation of
a given concrete document from a variety of viewpoints and thus have this
rich way of exploring different materials once they are somehow accurately
brought to a common intermediate form. I am treating achievement of an
intermediate form as equivalent to mapping to an abstract but useful
document architecture. That is close enough for now. See 7, below. By the way, I
think the ability to construe document structures in a wide variety of ways
is extremely valuable. The flag I want to put on this play is over any
appeal to an underlying assumption about the ease of translating everything
else to such a cool structure.

5. I just realized that there is a tacit version of the UNCOL claim (UNCOL
being the universal intermediate language once proposed for compilers) in
our willingness to presume translatability among representations. That is, the
notion that we can reduce our problems to a linear solution space by having
an intermediate something that everything goes into and comes out of. And
the process is expected to be lossless for everything that matters! I
wouldn't be surprised were ontology efforts predicated on that same
assumption.
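
The arithmetic behind the UNCOL claim in point 5 can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, each translator is assumed to be one-directional, and the hub count assumes that every translation through the intermediate is lossless, which is exactly the assumption being questioned.

```python
def pairwise_translators(n: int) -> int:
    """Direct one-way translators needed so each of n formats reaches every other."""
    return n * (n - 1)


def hub_translators(n: int) -> int:
    """One-way translators needed when everything goes into and out of one intermediate."""
    return 2 * n


# Compare the two counts as the number of formats grows.
for n in (3, 5, 10, 20):
    print(f"{n:>2} formats: pairwise={pairwise_translators(n):>3}  via hub={hub_translators(n):>2}")
```

The attraction is plain from the numbers; whether the intermediate can really carry everything that matters is the open question.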

6. I am not prepared to concede that there isn't a way to accomplish the
UNCOL claim (in any of its current guises). I even have ideas about how
that might be accomplished, as, I am quite sure, do many of you. Yet the
practical experience is that these efforts have, so far, always failed.
They may deliver something and be useful to a degree, but their most
glorious ambition is left unattained. I suppose they also provide lessons,
were we able to dig past the claims of success and discover why few (i.e.,
none?) of these efforts are still around.
I haven't been able to put my finger on the way in which we think we are
smarter than our predecessors were when they thought they were smarter than
theirs, or whether this is the usual case of doing the same thing over again
expecting a different result, the common thread being denial that we are
doing the same thing over again.
It gives me pause.

7. Back to DOM and other intermediate architectures as a case in point.
        7.1 Let's call a DOCUMENT STRUCTURE the formal encoding of the material of
a document -- the raw bits and the organizational information that tells us
what the presentation is and what and where the dependencies are on other
things outside the structure (e.g., in other separated structures or to be
obtained by computation). This is a concrete encoding, so it also has a
serial nature (for interchange on media, etc.).
        7.2 Let's call a DOCUMENT ARCHITECTURE an abstraction of a class of
documents that is usually friendly to a set of DOCUMENT STRUCTURES. It is
fairly easy to see the SGML specification as providing a document
architecture as well as a document structure. Ditto for XML. The ODA (Open
Document Architecture) started with an architectural specification and then
came up with document structures. More or less. None of these are purely
orthogonal achievements, and the architectures and document structures,
although separable, appear to be joined at the hip.
        7.3 Now, whether we use an API model or an object model or we use another
abstraction, there is a model. When we want to slide document structures
from one document architecture to another, we have to face the problem of
expressing objects conveyed in one model through the interface or objects or
structures of another model. The question of model reconcilability comes to
the fore.
        7.4 One killer is that document structures are largely semantics-free, just
like computer programs. What is needed in this situation is some form of
reverse engineering that abstracts the document as it is conveyed in one
structure and reconveys it in another structure (or model) so that the
semantics and presentability of the original document are preserved. This
is not easy and it is rarely very successful as an automated task. Part of
the problem is establishing what is essential to be preserved as far as the
authors and the users of the work are concerned.
        7.5 The other difficulty is that mapping to an abstraction that has a
natural navigation markedly different than the serial structure of the
document structure to be "mapped" can be very costly. Although the big
limitation is the reconcilability of the document architectures, there are
large pragmatic considerations related to the difficulty of translating a
document structure into an architectural model and navigation model (e.g.,
API) that is not a natural fit.

8. Punting
        8.1 There are some typical ways that reality sets in. One is the
equivalent of burying everything that is difficult into some kind of CTYPE
element that is left for someone else to figure out. Another is to create
document-architecture-specific flavors of things that are passed through so
that the user of the intermediate / interface now gets to deal with them (or
not). A third is to punt the problem into an out-of-band assumption.
In doing this,
the UNCOL math breaks down. Making an intermediate that is the union of all
the architectures to be contended with does not reduce an n-squared problem
to a linear solution set in any way but through rapidly-shifting mirrors.
        8.2 Another approach is to actually fail to have a document architecture
that is anything more than another way of presenting a rigid set of document
structures, along with whatever punting is already wired into the structural
system. (This has been asserted to be a quality -- that is, a defect -- of
DOM.) This is the
heck-with-it-let's-pick-a-structure-and-make-everyone-use-it approach. Since
we are all acolytes to innovation, this leads to a community preoccupied
with figuring out how to pick winners.
        8.3 And then there is doing the parts that do match up and work pretty
easily, punting anything else. Sometimes called the proof-of-concept
maneuver. (Failing to detect when someone is employing a text graphic is a
potent example.) Check with the organizations that have made the effort to
allow a community of people to use their favorite desktop applications on
shared work-in-progress documents and see how happy they are with the
translations that different word processors make to/from the formats of
other word processors.
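
The punt described in 8.1 can be made concrete with a small Python sketch. Everything in it is hypothetical: the `Node` type, the two formats "A" and "B", and the "textgraphic" item are invented for illustration, and no real document format is being modeled. It shows an importer burying what it does not understand in an opaque pass-through node, an exporter that can only drop such nodes, and the count of special cases that returns once every writer has to understand every other format's pass-through flavor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    kind: str         # "para" is in the shared model; "opaque" is a punt
    payload: str
    origin: str = ""  # source format that produced an opaque payload


def import_from_a(items: List[Tuple[str, str]]) -> List[Node]:
    """Hypothetical reader for format A: anything but a paragraph is punted."""
    nodes = []
    for kind, payload in items:
        if kind == "para":
            nodes.append(Node("para", payload))
        else:
            nodes.append(Node("opaque", payload, origin="A"))
    return nodes


def export_to_b(nodes: List[Node]) -> Tuple[List[str], List[Node]]:
    """Hypothetical writer for format B: it writes only what it understands."""
    written, dropped = [], []
    for node in nodes:
        if node.kind == "para":
            written.append(node.payload)
        else:
            dropped.append(node)  # B has no idea what A's payload means
    return written, dropped


def special_cases(n: int) -> int:
    """Cases needed if each of n writers must handle the other n - 1 flavors."""
    return n * (n - 1)


source = [("para", "Hello"), ("textgraphic", "+--+\n|  |\n+--+")]
written, dropped = export_to_b(import_from_a(source))
print("written:", written)
print("dropped from:", [node.origin for node in dropped])
print("special cases for 4 formats:", special_cases(4))
```

Under these assumptions the intermediate form has not removed the pairwise problem; it has only moved it into the handling of the opaque nodes.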

9. The recent comment about Groves reminded me that I have never sent this
note, waiting for some final inspiration to polish it off. I want to do
that with some brash claims:

a. It may be useful to demonstrate that one has a canonical data
representation, though I am concerned that it is a red herring to require
that. To have a usable proof that something is a CDR, there must be an
agreed set of conditions that a CDR must satisfy, and that are verifiable by
"proof." Having an existence proof of the satisfiability of those
conditions would also be confidence-building. I don't know that anyone has
done any of that formal work.

b. One way to have a canonical data representation is to demonstrate that
the structure of the alleged canonical data representation and its
interpreter is computationally complete. (That is, the processors of these
CDRs are equivalent to universal Turing machines.) If it can be represented
by computer, it can be represented here. (As a practical matter, this
consequence can be pretty useless, though.)

c. Although it may be useful to have established that one has a CDR, doing
so may not actually accomplish what having a canonical data representation
is tacitly assumed to provide. In particular, it does not establish that one can
automatically translate from any other CDR to the one of interest. It does
establish that there is a translation, in some arcane sense, but it does not
establish that it can be found and that it is a function that can be
computed.

d. It is then useful to ponder the implications of expecting a canonical
data representation to support something that perhaps can't be accomplished
by computer in the first place.

-- Dennis

-----Original Message-----
From: Eric Armstrong [mailto:eric.armstrong@eng.sun.com]
Sent: Monday, March 12, 2001 14:31
To: unrev2
Subject: [unrev-II] For Howard...Grove Proofs

Howard Liu, are you there?
This post is mostly intended for you, but it may
be interesting to others, as well.

A reading of the Groves documents shows that Groves
were intended to be a canonical data-representation
mechanism. However, the documents can't quite claim that,
for lack of the necessary proofs.

So, should you be interested, here are some
interesting, useful (and needed) proofs:

1. Prove that for any data representation,
a usable Grove representation can be
constructed.

2. Prove that for any data representation,
a usable sGrove (xGrove?) representation
can be constructed.

   --where sGrove (xGrove?) is the *simplified*
     version of groves, or the "xml-ified"
     version that Lee is constructing. It
     leaves out integer types and various
     other data types, to create a simpler,
     easier to understand, and more easily
     used standard.

-----Original Message-----
From: Eric Armstrong [mailto:eric.armstrong@eng.sun.com]
Sent: Tuesday, January 16, 2001 17:36
To: unrev-II@egroups.com
Subject: Re: [unrev-II] Seeking definition for "DOM"

[ ... ]

I guess the answer is that the objects in a DOM really are objects,
in that they have behaviors and methods. It's just that they are
so low-level -- intuitively, they're at the wrong level of
abstraction.

To be fair, the model probably suffers from the
inclusion of things like "processing instructions". For example,
multiple processing instructions can occur under an element.

Also (I never tire of pointing this out) the fact that text can
occur virtually anywhere in a mixed-content element means that
there is no "text" property, as one would intuitively expect
for an "object" anchored at a given point in the hierarchy.

The free intermixing of processing instructions, text, elements,
and other low-level objects made true "object-ness" hard to
capture. So the model is a tree of very low-level objects.
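
The intermixing described above can be seen with Python's standard xml.dom.minidom module, used here only as one convenient DOM implementation. The sample markup and the two helper functions are made up for illustration. The sketch lists the kinds of child nodes under a single mixed-content element and shows that recovering "the text" of that element means walking the subtree and gathering text nodes.

```python
from xml.dom import Node, minidom

# Hypothetical mixed-content sample: text, an element, and a processing
# instruction all sit side by side under <p>.
SAMPLE = "<p>Hello <b>bold</b> world<?note check?> end</p>"

KIND_NAMES = {
    Node.TEXT_NODE: "text",
    Node.ELEMENT_NODE: "element",
    Node.PROCESSING_INSTRUCTION_NODE: "pi",
}


def describe_children(element):
    """Return the kind of each direct child, in document order."""
    return [KIND_NAMES.get(child.nodeType, "other") for child in element.childNodes]


def collect_text(node):
    """Gather character data by walking the subtree and joining text nodes."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(collect_text(child) for child in node.childNodes)


paragraph = minidom.parseString(SAMPLE).documentElement
print(describe_children(paragraph))
print(repr(collect_text(paragraph)))
```

The element's text is scattered across several low-level sibling nodes, which is the point being made about the level of abstraction.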

[ ... ]

-----Original Message-----
From: Eric Armstrong [mailto:eric.armstrong@eng.sun.com]
Sent: Thursday, January 04, 2001 20:26
To: unrev-II@egroups.com
Subject: Re: [unrev-II] Seeking definition for "DOM"

In simple terms, a DOM is "Document OBJECT model".
It's a tree-structured hierarchy of objects that
comprise a document.

That's probably all you want for the glossary.
On the other hand...

[ ... ]

So my qualms are not with
the structure, but with calling it a Document
OBJECT Model. "Document Structure" would have
been a more fitting and appropriate name, in
my book.

-----Original Message-----
From: G. Ken Holman [mailto:gkholman@cranesoftwrights.com]
Sent: Thursday, January 04, 2001 11:28
To: unrev-II@egroups.com; unrev-II@egroups.com
Subject: Re: [unrev-II] Seeking definition for "DOM"

At 01/01/04 10:44 -0800, Eugene Eric Kim wrote:
>On Wed, 3 Jan 2001, N. C a r r o l l wrote:
>
>[DOM description deleted]
>
> > Is that similar to what Doug is describing as an "intermediary
> > document"?
> > http://www.bootstrap.org/a2h/BI/2120.html#2C1
>
>Mmm, weighty question. Short answer is yes. It's similar in that a
>DOM-like interface could indeed be used to manipulate this "intermediary
>document."

Perhaps ... but the implementation of this intermediary is entirely opaque
to the user; the *only* access to the intermediary is *entirely* through
the abstract interface.

The abstract interface is reified as an API for each particular programming
language (direct access) or communication technology (remote access).

[ ... ]

By no means is the opaque internal implementation a document interchange
format. When the opaque implementation of a document is "exported" into a
transparent format, then the agreed-upon document interchange format would
be used, but we should not give any indication that an implementation is
required to support the document interchange format internally ... it must
have the flexibility to accept the document interchange format with any
possible internal implementation scheme it wishes.

I think the distinction is critically important.
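
One way to picture the distinction is the following Python sketch. It is not DOM and not any proposed OHS interface: the class names, the two-method interface, and the one-paragraph-per-line "interchange format" are all assumptions made for illustration. Two implementations keep their documents in quite different internal forms, the caller touches them only through the abstract interface, and both export the same interchange text.

```python
from abc import ABC, abstractmethod
from typing import List


class DocumentInterface(ABC):
    """Hypothetical abstract interface: the only way a user reaches a document."""

    @abstractmethod
    def append_paragraph(self, text: str) -> None: ...

    @abstractmethod
    def paragraphs(self) -> List[str]: ...

    def export(self) -> str:
        """Write the (assumed) interchange form: one paragraph per line."""
        return "\n".join(self.paragraphs())


class ListBacked(DocumentInterface):
    """Keeps paragraphs in a Python list internally."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def append_paragraph(self, text: str) -> None:
        self._items.append(text)

    def paragraphs(self) -> List[str]:
        return list(self._items)


class BlobBacked(DocumentInterface):
    """Keeps everything in one string with a private separator internally."""

    _SEP = "\x1e"

    def __init__(self) -> None:
        self._blob = ""
        self._count = 0

    def append_paragraph(self, text: str) -> None:
        self._blob = text if self._count == 0 else self._blob + self._SEP + text
        self._count += 1

    def paragraphs(self) -> List[str]:
        return self._blob.split(self._SEP) if self._count else []


docs = [ListBacked(), BlobBacked()]
for doc in docs:
    doc.append_paragraph("one")
    doc.append_paragraph("two")
print([doc.export() for doc in docs])
```

Neither internal form is the interchange format, and neither needs to be.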

I hope this helps.

...................... Ken

-----Original Message-----
From: N. C a r r o l l [mailto:ncarroll@inreach.com]
Sent: Wednesday, January 03, 2001 19:31
To: unrev-II@egroups.com
Subject: Re: [unrev-II] Seeking definition for "DOM"

Ken:

[snip]

> >The definition of a DOM escapes me.
>
> According to http://www.w3.org/DOM/:
>
> The Document Object Model is a platform- and language-neutral interface
> that will allow programs and scripts to dynamically access and update the
> content, structure and style of documents. The document can be further
> processed and the results of that processing can be incorporated back into
> the presented page.

Thanks, I'll use that for the moment. (Journalists will be reading
these...)

[snip]
>
> In my own words, I say something along the lines of:
>
> Document Object Model
>
> An interface abstraction defining processes and information, both available
> to a user and thus required to be supported and exposed by a system, that
> can be used to act on the components of a document abstraction. Actions
> include building document information and reading document
> information. The interface abstraction is reified for a given programming
> language as an API. The document abstraction is reified by the supplier of
> the API and is typically hidden from the user. The user is obliged to
> manipulate and access the document entirely through the API.

Is that similar to what Doug is describing as an "intermediary
document"?
http://www.bootstrap.org/a2h/BI/2120.html#2C1

> I hope this helps.

Yep!

Nicholas

--
______________________________________________________

Nicholas Carroll
Email: ncarroll@inreach.com
Alternate: ncarroll@iname.com
______________________________________________________

Community email addresses:
  Post message: unrev-II@onelist.com
  Subscribe: unrev-II-subscribe@onelist.com
  Unsubscribe: unrev-II-unsubscribe@onelist.com
  List owner: unrev-II-owner@onelist.com

Shortcut URL to this page: http://www.onelist.com/community/unrev-II



This archive was generated by hypermail 2b29 : Tue Mar 13 2001 - 08:30:27 PST