I've posted a very preliminary DTD for e-mail at:
http://www.eekim.com/ohs/devel/email.dtd
Comments and suggestions are definitely encouraged. I'm going to try to
whip up some code for converting RFC822 e-mail into this XML format before
our meeting today.
There are several issues -- some stylistic, some content-related -- that
arose from developing this:
1. E-mail is more of a container than a document format. The OHS should
be able to support any document type sent or to be sent via e-mail. This
raises the question: How does one mix XML from different document types
within a single document? I believe this question was raised earlier on
this list.
2. I developed a very simple schema for the body text of RFC822 e-mail
that should be useful for transcoding and views, but not much
else. There's basically one tag, <p>, whose xml:space attribute defaults
to "preserve", making it roughly equivalent to HTML's <pre> tag.
I defined this one tag rather than a multitude of structural tags because,
after analyzing a bunch of e-mails, I realized that any transcoding
algorithm that tries to be too smart -- no matter how sophisticated --
will never be 100 percent right. When it's wrong, you can lose
or misinterpret structural information, which is a bad thing. On the other
hand, raw ASCII text already carries a good deal of structural information,
usually expressed through whitespace.
I feel that it is better to add some additional structure -- mainly for
addressing and views -- and to retain the textual structure already
there. This is something we can discuss in more detail.
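
To make the idea concrete, here is a rough Python sketch of how body text
might be wrapped in <p> tags without touching its whitespace. The <body>
wrapper element and the function name are assumptions made for illustration,
not taken from the DTD, and the xml:space attribute is written out explicitly
here even though the DTD supplies it by default.

```python
import re
import xml.etree.ElementTree as ET

# Namespace that ElementTree serializes with the reserved "xml:" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def body_to_paragraphs(body: str) -> ET.Element:
    """Wrap each blank-line-separated chunk of text in a <p> element."""
    root = ET.Element("body")  # hypothetical wrapper element
    # Split on one or more blank (or whitespace-only) lines; indentation
    # and line breaks inside each chunk are left exactly as they were.
    chunks = re.split(r"\n(?:[ \t]*\n)+", body.strip("\n"))
    for chunk in chunks:
        p = ET.SubElement(root, "p")
        p.set(f"{{{XML_NS}}}space", "preserve")
        p.text = chunk
    return root


sample = "Hi all,\n\n    indented line\n    second line\n\nBye"
print(ET.tostring(body_to_paragraphs(sample), encoding="unicode"))
```

The only structure this adds is the paragraph boundary; everything else in
the text is passed through untouched.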
3. Automatically identifying and transcoding citations is going to be
interesting. I think it's okay not to be 100 percent perfect for this
type of thing, but regardless, it will still require some cleverness.
I created two simple links: <citation href> and <a href>. Both should
probably be extended links, but for the purpose of just getting something
usable out there, I made them simple for now.
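
As a starting point for discussion, here is a deliberately naive Python
sketch of citation detection. It assumes that "citation" means text quoted
from an earlier message and marked with a leading ">", which is only one
common convention; the function name is made up for illustration. It will
misfire on ordinary lines that happen to start with ">", which is the kind
of imperfection mentioned above.

```python
import re
from itertools import groupby

# A line counts as quoted if its first non-blank character is ">".
QUOTE_PREFIX = re.compile(r"^\s*>")


def split_quoted(body: str):
    """Return (is_quoted, text) pairs for consecutive runs of lines."""
    lines = body.splitlines()
    runs = []
    for is_quoted, group in groupby(
        lines, key=lambda line: bool(QUOTE_PREFIX.match(line))
    ):
        runs.append((is_quoted, "\n".join(group)))
    return runs


for is_quoted, text in split_quoted("> old text\n> more\nMy reply"):
    print("citation" if is_quoted else "new text", repr(text))
```

Each quoted run would be a candidate for a <citation> element; working out
what its href should point at is the part that needs the cleverness.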
4. One style issue that arose was whether or not to wrap metadata
information in <header> tags. I opted to do it primarily for readability,
but this is by no means set in stone. I'd be curious to hear people's
opinions on this matter.
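
For comparison, here is a small Python sketch of what the wrapped form could
look like when generated from an RFC822 message. The element names (<email>,
<header>, <body> and the lowercased header names) are placeholders chosen for
this example and may not match the posted DTD; only single-part text messages
are handled.

```python
import email
import xml.etree.ElementTree as ET


def message_to_xml(raw: str) -> ET.Element:
    """Convert a single-part RFC822 message into a small XML tree."""
    msg = email.message_from_string(raw)
    root = ET.Element("email")  # hypothetical root element
    # Metadata goes inside a <header> wrapper rather than directly under root.
    header = ET.SubElement(root, "header")
    for name in ("From", "To", "Subject", "Date"):
        value = msg.get(name)
        if value is not None:
            ET.SubElement(header, name.lower()).text = value
    body = ET.SubElement(root, "body")
    body.text = msg.get_payload()  # a plain string for non-multipart mail
    return root


raw = "From: a@example.com\nTo: b@example.com\nSubject: Test\n\nHello\n"
print(ET.tostring(message_to_xml(raw), encoding="unicode"))
```

Dropping the wrapper would just mean attaching the metadata elements to the
root directly, so switching between the two styles later should be cheap.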
5. Looking over the DTD, I realized I left out tags for categorizing
e-mail. I'll add those in the next rev.
-Eugene
--
+=== Eugene Eric Kim ===== eekim@eekim.com ===== http://www.eekim.com/ ===+
|       "Writer's block is a fancy term made up by whiners so they        |
+=====  can have an excuse to drink alcohol."  --Steve Martin  ===========+