[unrev-II] Re: HtmlDOM -- XML -- Xmail

From: Eric Armstrong (eric.armstrong@eng.sun.com)
Date: Mon Mar 13 2000 - 13:29:03 PST

    altintdev@webtv.net wrote:

    > Also, the modern web browser, using the current HTML DOM, gives
    > capabilities required for advanced hyperdocument processing.

    Alas, all is not as copasetic as one would have hoped. I had the exact
    same hope before a gang of ugly facts intruded.

    Adam Cheyer and Jack Park have been saying that XML will be necessary
    for the foundation of the DKR -- that HTML simply won't work. I was
    unwilling to accept that, until Jon Bosak pointed to concrete cases
    that mess things up big time.

    The problem with HTML is twofold:
       1) There is no restriction against using "structure" elements as
          "display" format controls. So one could write
          <p>...<h3>...</h3>..., where the real intent is to change the
          font characteristics within the paragraph, rather than start a
          new section.

       2) There is no requirement for supplying end-tags in order to make
          a document well-formed.

    The second issue makes it extremely hard to determine algorithmically
    what the original intention was. Since </p> is not required in HTML,
    it is in many cases impossible to tell what the author intended. With
    </p> supplied, it is relatively clear that <p>...<h3>...</h3>...</p>
    uses <h3> for its text style, while <p>...</p><h3>...</h3>... uses
    <h3> for its structuring.
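
    To make the ambiguity concrete, here is a minimal sketch in Python
    using the standard html.parser module. The two sample documents and
    the helper names (TagEvents, events) are made up for illustration.
    It only shows that once the optional </p> is left out, the "style"
    document and the "structure" document produce the same stream of
    tags and text, so nothing remains for a program to tell them apart.

```python
from html.parser import HTMLParser


class TagEvents(HTMLParser):
    """Record start tags, end tags, and non-blank text in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


def events(html, drop_p_end=False):
    """Return the event stream, optionally as if </p> had been omitted."""
    parser = TagEvents()
    parser.feed(html)
    parser.close()
    if drop_p_end:
        return [e for e in parser.events if e != ("end", "p")]
    return parser.events


# Hypothetical samples: <h3> used for styling vs. <h3> used for structure.
doc_styled = "<p>intro <h3>big text</h3> more</p>"
doc_structural = "<p>intro </p><h3>big text</h3> more"

# With </p> present the two streams differ; without it they are identical.
differ_with_end_tags = events(doc_styled) != events(doc_structural)
same_without_end_tags = (
    events(doc_styled, drop_p_end=True) == events(doc_structural, drop_p_end=True)
)
```

    With the end-tags in place the two event streams differ; with </p>
    dropped they compare equal, which is the case described above.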

    The problems caused by lack of end-tags are pretty nearly
    insurmountable, I think, because multiple un-ended tags can exist,
    which raises the complexity of figuring out what the intended
    structure is.

    Even line breaks in the file may be of little use as a hint. An <h3>
    used purely for styling could just as easily start at the beginning
    of a line.

    To use HTML directly, or even to convert it to XHTML, therefore
    requires a highly intelligent, AI-like processing engine that figures
    out what the intended structure is likely to be and reformats the code
    to achieve the necessary distinction between structure tags and
    display tags. Without that distinction, the "structure" displayed in
    the browser might be far removed from the original intent.

    There are other cases where structural controls are used for
    formatting. One way to get indenting, for example, is to use <ul><ul>
    rather than <blockquote>. Rather than implying structure, the <ul>'s
    in this example are format control.
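
    As a rough illustration of the kind of guessing involved, the sketch
    below flags any <ul> that closes without having seen an <li> of its
    own, on the assumption that such a list is there for indentation
    only. The heuristic and the names (LayoutListFinder,
    count_layout_lists) are mine, it covers only this one case, and it
    relies on the </ul> end-tags being present.

```python
from html.parser import HTMLParser


class LayoutListFinder(HTMLParser):
    """Count <ul> elements that close without ever seeing an <li>."""

    def __init__(self):
        super().__init__()
        self.open_lists = []    # one flag per open <ul>: has it seen an <li>?
        self.layout_lists = 0   # <ul> elements that look like indentation

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.open_lists.append(False)
        elif tag == "li" and self.open_lists:
            # Credit the <li> to the innermost open <ul>.
            self.open_lists[-1] = True

    def handle_endtag(self, tag):
        if tag == "ul" and self.open_lists:
            saw_item = self.open_lists.pop()
            if not saw_item:
                self.layout_lists += 1


def count_layout_lists(html):
    """Return how many <ul> elements appear to be used for layout only."""
    finder = LayoutListFinder()
    finder.feed(html)
    finder.close()
    return finder.layout_lists


indent_trick = count_layout_lists("<ul><ul>Indented text</ul></ul>")
real_list = count_layout_lists("<ul><li>a</li><li>b</li></ul>")
```

    Here both <ul>'s in the indentation trick are flagged and the genuine
    list is not. Every other misuse of a structure tag would need its own
    rule of this sort, which is the engineering cost at issue.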

    The question then reduces to an engineering issue: Given that one
    *could* write a complex processing engine that made reasonable guesses
    about the intended structure of a document, but that doing so would
    take a great deal of time, does it make more sense to focus on an
    XML-based solution, given the growing ubiquity of that medium?

    I suspect that the answer is yes. Which makes me start thinking: What
    would an email version of DocBook look like? (I'm thinking that
    Xmail would be a good name for such a thing -- but I can't believe that
    no one else is working on that problem!)

    This archive was generated by hypermail 2b29 : Mon Mar 13 2000 - 13:36:08 PST