From: Eric Armstrong <eric.armstrong@eng.sun.com>
altintdev@webtv.net wrote:
> Also, the modern web browser, using the current HTML DOM, gives
> capabilities required for advanced hyperdocument processing.
Alas, all is not as copacetic as one would have hoped. I had the exact
same hope before a gang of ugly facts intruded.
Adam Cheyer and Jack Park have been saying that XML will be necessary
for the foundation of the DKR, and that HTML simply won't work. I was
unwilling to accept that until Jon Bosak pointed to concrete cases that
mess things up big time.
The problem with HTML is twofold:
1) There is no restriction against using "structure" elements as
   "display" format controls. So one could write <p>...<h3>...</h3>...,
   where the real intent is to change the font characteristics within
   the paragraph rather than to start a new section.
2) There is no requirement to supply end-tags, so an HTML document
   need not be well-formed.
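To make item 2 concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree as a stand-in for any XML parser. The helper name is_well_formed and the two sample fragments are hypothetical. An XML parser rejects the legal-HTML form with the missing </p>, but accepts the fully closed XHTML-style form:

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if markup parses as XML (all tags closed and nested)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Legal HTML: </p> is omitted, so the <p> is still open when </body> arrives.
html_fragment = "<body><p>Some text<h3>Heading</h3>more text</body>"
# XHTML-style: every element is explicitly closed.
xhtml_fragment = "<body><p>Some text</p><h3>Heading</h3><p>more text</p></body>"

print(is_well_formed(html_fragment))   # False
print(is_well_formed(xhtml_fragment))  # True
```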
The second issue makes it extremely hard to determine algorithmically
what the original intention was. Since </p> is not required in HTML, in
many cases the author's intent simply cannot be recovered. With </p>
supplied, it is relatively clear that <p>...<h3>...</h3>...</p> uses
<h3> for its text style, while <p>...</p><h3>...</h3>... uses <h3> for
its structuring.
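Once the end-tags are present, the distinction above becomes a simple tree query. The following is a sketch under stated assumptions: the input is already well-formed, and the hypothetical helper h3_roles treats an <h3> nested anywhere inside a <p> as a style use and any other <h3> as a structural use.

```python
import xml.etree.ElementTree as ET

def h3_roles(markup: str) -> list[str]:
    """Hypothetical helper: label each <h3> as 'style' (inside a <p>) or 'structure'."""
    root = ET.fromstring(markup)
    # Identities of every <h3> found somewhere beneath a <p>.
    nested = {id(h) for p in root.iter("p") for h in p.iter("h3")}
    return ["style" if id(h) in nested else "structure" for h in root.iter("h3")]

styled = "<body><p>Intro <h3>big words</h3> more</p></body>"
structural = "<body><p>Intro</p><h3>Section</h3><p>Body</p></body>"

print(h3_roles(styled))      # ['style']
print(h3_roles(structural))  # ['structure']
```

Without the </p>, the same query has nothing reliable to work on, which is the point being made about HTML.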
The problems caused by missing end-tags are, I think, nearly
insurmountable, because several unclosed tags can be open at the same
time, which multiplies the possible readings of the intended structure.
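A rough illustration of the multiple un-ended tags problem, assuming Python's standard html.parser module: the hypothetical OpenTagTracker class below only records which start tags are never explicitly closed. For input like <p>One<p>Two<h3>Three</h3>Four it reports two open <p> elements, and nothing in the token stream says whether the second <p> ends the first or whether the <h3> belongs inside either of them. The VOID set is an assumed, partial list of elements that never take end-tags.

```python
from html.parser import HTMLParser

# Assumed partial list of HTML elements that never have end-tags.
VOID = frozenset({"br", "hr", "img", "input", "meta", "link"})

class OpenTagTracker(HTMLParser):
    """Hypothetical helper: track start tags that are never explicitly closed."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Naive guess: an end-tag closes the most recent matching start tag
        # and everything opened after it.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

def unclosed_tags(markup):
    tracker = OpenTagTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.stack

print(unclosed_tags("<p>One<p>Two<h3>Three</h3>Four"))      # ['p', 'p']
print(unclosed_tags("<p>One</p><p>Two</p><h3>Three</h3>"))  # []
```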
Even line breaks in the file may be of little use: the <h3> could easily
start at the beginning of a line.
To use HTML directly, or even to convert it to XHTML, therefore requires
a highly intelligent, AI-like processing engine that figures out what
the intended structure is likely to be and reformats the code to make
the necessary distinction between structure tags and content tags.
Without that distinction, the "structure" displayed in the browser might
be far removed from the original intent.
There are other cases where structural controls are used for formatting.
One way to get indenting, for example, is to use <ul><ul> rather than
<blockquote>. Rather than implying structure, the <ul> tags in this
example serve purely as format controls.
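The sort of guessing such an engine would have to do can be sketched with a single heuristic. This is an assumption for illustration, not a proposed algorithm: a <ul> with no <li> children of its own is probably there for indentation rather than as a list. The helper name indent_only_lists is hypothetical, and it only works on input that is already well-formed.

```python
import xml.etree.ElementTree as ET

def indent_only_lists(markup):
    """Hypothetical heuristic: return <ul> elements that have no direct <li> child."""
    root = ET.fromstring(markup)
    return [ul for ul in root.iter("ul")
            if not any(child.tag == "li" for child in ul)]

indented = "<body><ul><ul><p>Indented text</p></ul></ul></body>"
real_list = "<body><ul><li>Item</li></ul></body>"

print(len(indent_only_lists(indented)))   # 2
print(len(indent_only_lists(real_list)))  # 0
```

Each such rule catches one idiom; covering every formatting trick authors have used is what makes the general engine expensive.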
The question then resolves to an engineering issue: one *could* write a
complex processing engine that made reasonable guesses about the
intended structure of a document, but doing so would take a great deal
of time. Given the growing ubiquity of XML, does it make more sense to
focus on an XML-based solution instead?
I suspect that the answer is yes. That makes me wonder: what would an
email version of DocBook look like? (I'm thinking that Xmail would be a
good name for such a thing, but I can't believe that no one else is
working on that problem!)
Community email addresses:
Post message: unrev-II@onelist.com
Subscribe: unrev-II-subscribe@onelist.com
Unsubscribe: unrev-II-unsubscribe@onelist.com
List owner: unrev-II-owner@onelist.com
Shortcut URL to this page:
http://www.onelist.com/community/unrev-II
This archive was generated by hypermail 2b29 : Mon Mar 13 2000 - 13:36:08 PST