Re: [unrev-II] TopicMaps, Ted Nelson, Virtual Files, and everything

From: Eugene Eric Kim
Date: Tue Jun 05 2001 - 13:25:39 PDT

    On Tue, 5 Jun 2001 wrote:

    > I don't think I'm understanding you here. Which embedded structure are
    > you saying raw text contains? Do you mean punctuation, paragraph
    > breaks, what?

    Yes. We use carriage returns, spaces, tabs, and punctuation in raw text
    in the same way that XML uses tags, entities, and attributes. When we're
    communicating with humans, we can get away with having an arbitrary
    syntax, because humans are pretty good at figuring out intended semantics.

    For instance, in this e-mail, I distinguish paragraphs by separating them
    with blank lines. If I chose instead to distinguish them by indenting the
    first line of each paragraph, you, the human reader, would have no trouble
    recognizing that the two representations are semantically equivalent.
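
    As a minimal sketch of that equivalence (the function names and the
    sample text are made up for illustration, and real e-mail has more
    edge cases than this handles), both conventions can be parsed into
    the same sequence of paragraphs:

```python
import re


def paragraphs_from_blank_lines(text):
    """Split text whose paragraphs are separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse the internal line breaks and runs of spaces in each paragraph.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def paragraphs_from_indents(text):
    """Split text whose paragraphs each begin with an indented first line."""
    paragraphs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no meaning in this convention
        if line[0] in " \t" or not paragraphs:
            paragraphs.append(line.split())    # indented line starts a paragraph
        else:
            paragraphs[-1].extend(line.split())  # continuation line
    return [" ".join(words) for words in paragraphs]


# Hypothetical sample text written in each of the two conventions.
blank_style = "First paragraph,\nstill first.\n\nSecond paragraph.\n"
indent_style = "    First paragraph,\nstill first.\n    Second paragraph.\n"

same = paragraphs_from_blank_lines(blank_style) == paragraphs_from_indents(indent_style)
print(same)  # True
```

    The byte sequences differ, but the parsed paragraph lists compare
    equal, which is the sense of "semantically equivalent" used here.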

    So what's the best way of representing content in a document, where a
    document consists of a sequence of paragraphs? Xanadu represents all
    documents as a sequence of bytes at the content layer, but it seems to me
    advantageous to represent it as a sequence of paragraphs.

    Here's a real-world example. CVS diffs documents line by line. So if I
    have the source code:

        if (x > y) {

    and I change it to:

        if (x > y)
        {

    CVS tells me that these two documents are different. Well, that's true;
    not all lines in this document are the same. But semantically, these two
    excerpts of code are exactly the same. So do you really want your version
    control system saying that these chunks of code are different?

    I'm not sure what the answer is. Intuitively, I think that I'd like my
    version control systems to be smarter, so that if I run some code through
    lint, and I want to do a diff between a pre-lint version of a file and a
    post-lint version, I get something actually useful in return. However, at
    the same time, I don't want to ignore style completely, even if it is
    semantically redundant.
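
    One way to get both properties, sketched here under loud assumptions:
    this is not how CVS works, the tokenizer is a deliberately crude
    stand-in for a real C lexer (it ignores comments and string literals,
    and would treat "> =" and ">=" as the same), and swap() is a
    hypothetical function used only to fill out the example. The idea is
    to report a style-only change as its own category rather than hiding
    it or treating it like any other difference:

```python
import re

# Crude, hypothetical tokenizer for C-like code: a run of word characters,
# or any single non-space symbol. Whitespace and line breaks are discarded.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokens(source):
    """Return the token sequence of a source string, ignoring layout."""
    return TOKEN.findall(source)


def classify_change(old, new):
    """Classify a change as 'identical', 'style-only', or 'substantive'."""
    if old == new:
        return "identical"
    if tokens(old) == tokens(new):
        return "style-only"   # only whitespace or line layout moved
    return "substantive"


before = "if (x > y) {\n    swap(x, y);\n}\n"
restyled = "if (x > y)\n{\n    swap(x, y);\n}\n"   # brace moved to its own line
edited = "if (x >= y) {\n    swap(x, y);\n}\n"     # the comparison itself changed

print(classify_change(before, before))    # identical
print(classify_change(before, restyled))  # style-only
print(classify_change(before, edited))    # substantive
```

    A line-based diff would flag both the restyled and the edited version;
    here only the edited one counts as a real change, while the restyled
    one is still visible as a style change instead of being dropped.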

    > I do think that Nelson is on to something when he suggests that
    > structure must be a layer above the content. It's very hard, though,
    > to express exactly why.

    I also think this is valid. But it's clearly futile to completely
    separate content from structure. So the challenge is, at what
    granularity do we separate these layers?


    +=== Eugene Eric Kim ===== ===== ===+
    |       "Writer's block is a fancy term made up by whiners so they        |
    +=====  can have an excuse to drink alcohol."  --Steve Martin  ===========+

    This archive was generated by hypermail 2b29 : Tue Jun 05 2001 - 13:47:03 PDT