Okay, I whipped up some Perl code to do some very preliminary transcoding
of RFC822 e-mail to the email.dtd schema. It's available at:
Normal disclaimers apply.
Couple of things. For the purposes of bootstrapping, the only really
important function is transcode_body. There are already a bunch of good
e-mail archiving packages that parse RFC822. What we want to do is create
a function that can be easily integrated into these packages so that all
e-mail archiving programs can support our transcoded e-mail. The two
packages I currently have in mind are MHonArc (Perl) and Mailman (Python).
With this in mind, I didn't bother implementing some of the header parsing,
specifically dates and miscellaneous headers. I also didn't implement
MIME attachment parsing. Both of these should be easy enough using
the appropriate CPAN module, but I was in a hurry.
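For a Python integration such as Mailman, the missing date parsing could probably come from the standard library instead of a CPAN module. The sketch below is only an illustration: the function name date_header_to_iso is made up, the sample header value is invented, and ISO 8601 output is an assumption, since the date format that email.dtd expects is not stated here.

```python
from email.utils import parsedate_to_datetime


def date_header_to_iso(value):
    """Convert an RFC822-style Date header value to an ISO 8601 string.

    Hypothetical helper: the target format is an assumption, not
    something taken from email.dtd.
    """
    # parsedate_to_datetime handles the "Tue, 21 Aug 2001 17:57:48 -0700"
    # style and keeps the UTC offset on the resulting datetime.
    return parsedate_to_datetime(value).isoformat()


if __name__ == "__main__":
    print(date_header_to_iso("Tue, 21 Aug 2001 17:57:48 -0700"))
```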
The transcode_body function is very basic. "Paragraphs" are separated by
newlines. Statement IDs (SID) start at 0 and increment by one. I didn't
include any citation parsing code; that's something we can discuss more
today at the meeting and on the list. (It'll also require some clever
interaction with existing archiving programs so we don't have to reinvent
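As a rough illustration of the transcode_body behavior described above (this is not the actual Perl code), here is a minimal Python sketch. It assumes that "separated by newlines" means one or more blank lines between paragraphs. The element name "p" and the attribute name "sid" are placeholders, since the real element names in email.dtd are not given here.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def transcode_body(body, tag="p", id_attr="sid"):
    """Split a plain-text body into paragraphs and number them from 0.

    The tag and id_attr defaults are hypothetical placeholders, not
    names taken from email.dtd.
    """
    # Normalise line endings and drop leading/trailing whitespace.
    text = body.replace("\r\n", "\n").strip()
    if not text:
        return ""
    # Assumption: a run of one or more blank lines separates paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    out = []
    # Statement IDs start at 0 and increment by one per paragraph.
    for sid, para in enumerate(paragraphs):
        out.append(
            "<%s %s=%s>%s</%s>"
            % (tag, id_attr, quoteattr(str(sid)), escape(para), tag)
        )
    return "\n".join(out)


if __name__ == "__main__":
    print(transcode_body("First paragraph,\nstill first.\n\nSecond: a < b\n"))
```

Citation parsing is left out here as well; quoted lines are treated like any other paragraph text.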
-- 
+=== Eugene Eric Kim ===== firstname.lastname@example.org ===== http://www.eekim.com/ ===+
|  "Writer's block is a fancy term made up by whiners so they  |
+=====  can have an excuse to drink alcohol."  --Steve Martin  ===========+
This archive was generated by hypermail 2.0.0 : Tue Aug 21 2001 - 17:57:48 PDT