Re: [ba-ohs-talk] bootstrap list message content & purple numbers
OK, (01)
Here's my initial stab at that. I think we have enough data gathered from
the messages themselves, and the backlink data, to enable us to create a
first attempt at an OHS that will solve this purple numbering issue in many
(most?) cases. (And we can perfect this as we go.) (02)
Stage One:
Put all messages that don't contain quoted passages into a database.
Messages are split into their constituent paragraphs. The tables relate each
message ID to a set of paragraph #s, and each paragraph # to its text.
The paragraph #s (whatever we decide they should be) may go to form purple
numbering (in whatever form that takes in the end), or maybe purple #s
should be kept separate and act as Lee Iverson suggests.
message ID <-> para #s
para# <-> text <-> purple #. (03)
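As a rough illustration of Stage One, here is a minimal sketch using Python's sqlite3 module. The table name, column names, and helper functions are all made up for this example, a single table stands in for the two relations above, and the purple # column is left empty because the numbering scheme is still undecided.

```python
import sqlite3

def split_paragraphs(body):
    """Split a message body into paragraphs separated by blank lines."""
    paras, current = [], []
    for line in body.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            paras.append(" ".join(current))
            current = []
    if current:
        paras.append(" ".join(current))
    return paras

def has_quotes(body):
    """True if any line starts with the usual '>' quote marker."""
    return any(line.lstrip().startswith(">") for line in body.splitlines())

def create_schema(conn):
    # Hypothetical schema: message ID <-> para # <-> text <-> purple #.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paragraphs ("
        " message_id TEXT NOT NULL,"
        " para_no INTEGER NOT NULL,"
        " text TEXT NOT NULL,"
        " purple_no TEXT,"  # left NULL: numbering scheme not decided yet
        " PRIMARY KEY (message_id, para_no))"
    )

def store_unquoted_message(conn, message_id, body):
    """Store a message only if it has no quoted passages; return rows added."""
    if has_quotes(body):
        return 0
    paras = split_paragraphs(body)
    for para_no, text in enumerate(paras, start=1):
        conn.execute(
            "INSERT INTO paragraphs (message_id, para_no, text, purple_no)"
            " VALUES (?, ?, ?, NULL)",
            (message_id, para_no, text),
        )
    return len(paras)
```

Keying each row on (message ID, para #) is what keeps a paragraph unique across the whole archive, even when every message starts its own numbering at 1.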
Stage Two:
Using the message metadata, work out the chains of messages that follow on
from a particular unquoted message and contain only one level of quoting.
Split these into paragraphs as per Stage One, but before entering them into
the database, test each quoted passage against the data in the DB for the root
message. If there's a match, insert a reference to the chunk from the
original message in the DB at the appropriate para #. (04)
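The Stage Two matching might look something like the sketch below. It is only an illustration: the function names are hypothetical, a plain dict stands in for the DB lookup, and a quoted passage counts as a match only if it equals a whole root paragraph after whitespace and case are normalized. Partial quotes, and quoted lines that a mail client has re-wrapped without the ">" marker, would need fuzzier matching.

```python
import re

def normalize(text):
    """Collapse whitespace and case so re-wrapped quotes still compare equal."""
    return " ".join(text.split()).lower()

def strip_quote(line):
    """Remove one leading '>' quote marker from a line."""
    return re.sub(r"^\s*>\s?", "", line)

def split_reply(body):
    """Split a reply into (is_quoted, text) chunks.

    A chunk ends at a blank line or where quoted and unquoted lines meet.
    """
    chunks, current, quoted = [], [], False
    for line in body.splitlines() + [""]:  # trailing "" flushes the last chunk
        is_q = line.lstrip().startswith(">")
        content = strip_quote(line).strip() if is_q else line.strip()
        if current and (not content or is_q != quoted):
            chunks.append((quoted, " ".join(current)))
            current = []
        if content:
            current.append(content)
            quoted = is_q
    return chunks

def link_quotes(root_id, root_paras, reply_body):
    """Replace quoted chunks that match a root paragraph with a reference.

    root_paras maps para # -> text for the root message (a stand-in for the DB).
    Returns a list of ("ref", root_id, para_no) or ("text", text) items.
    """
    index = {normalize(text): para_no for para_no, text in root_paras.items()}
    items = []
    for quoted, text in split_reply(reply_body):
        key = normalize(text)
        if quoted and key in index:
            items.append(("ref", root_id, index[key]))
        else:
            items.append(("text", text))
    return items
```

A quoted chunk that finds no match is kept as ordinary text, so nothing is lost when the matching fails.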
Stage Three:
Repeat Stage Two for messages that are further down the quoting chain, using
the DB data again to track the quotings. (05)
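To make the repetition concrete, here is a minimal sketch of walking a whole thread, again with hypothetical names and an in-memory dict in place of the DB. It assumes messages arrive with parents before replies, that quote depth is simply the number of leading ">" marks, and that a quoted chunk with no known origin (a forward from another list, say) is treated as originating in the message where it first appears.

```python
import re

def normalize(text):
    """Collapse whitespace and case so re-wrapped quotes still compare equal."""
    return " ".join(text.split()).lower()

def chunks(body):
    """Split a body into (depth, text) chunks; depth counts leading '>' marks."""
    out, current, depth = [], [], 0
    for line in body.splitlines() + [""]:  # trailing "" flushes the last chunk
        m = re.match(r"^\s*((?:>\s?)*)(.*)$", line)
        d = m.group(1).count(">")
        content = m.group(2).strip()
        if current and (not content or d != depth):
            out.append((depth, " ".join(current)))
            current = []
        if content:
            current.append(content)
            depth = d
    return out

def resolve_thread(messages):
    """messages: list of (message_id, body) pairs, parents before replies.

    Returns message_id -> list of ("ref", origin_id, para_no) or ("text", text).
    """
    origin = {}    # normalized text -> (message_id, para_no) of first appearance
    resolved = {}
    for message_id, body in messages:
        items = []
        for para_no, (depth, text) in enumerate(chunks(body), start=1):
            key = normalize(text)
            if depth > 0 and key in origin:
                items.append(("ref",) + origin[key])
            else:
                # New text, or a quote with no known source: it originates here.
                origin.setdefault(key, (message_id, para_no))
                items.append(("text", text))
        resolved[message_id] = items
    return resolved
```

For example, a third message that quotes the root at two levels and the first reply at one level ends up with two references, one pointing at each original paragraph, plus its own new text.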
This would give paragraphs uniqueness within the system. (06)
What think ye?
--
Cheers
Peter (07)
----- Original Message -----
From: "Eugene Eric Kim" <eekim@eekim.com>
To: <ba-ohs-talk@bootstrap.org>
Sent: Monday, December 17, 2001 9:13 AM
Subject: Re: [ba-ohs-talk] bootstrap list message content & purple numbers (08)
> On Fri, 14 Dec 2001, Peter Jones wrote:
>
> > The posts that Jack Park and others make that are forwarded from other
> > lists tend to appear as quoted excerpts with perhaps a small comment from
> > the forwarder. However, the bulk of any following discussion might often
> > revolve around quoting from the forwarded message (which tends to appear
> > as quoted).
>
> Okay, so perhaps this is justification to refine the algorithm for adding
> purple numbers to quoted text. The question then is, what's the right way
> to do this? The simplest thing to do would be to do what you suggested,
> which was to exclude ">" when processing. But, is that really the "right"
> way to do it from an OHS context?
>
> From an OHS context, quoted text is, or should be, a transclusion. So the
> purple numbers in the quoted text should be the purple numbers from the
> original text.
>
> The problem is, purple numbers are only unique within the context of the
> current document. IDs are reused all the time. For example, all of the
> e-mails archived from this list have an NID of "01". So if you
> transcluded the first paragraph of any e-mail, you would have ID-clash.
>
> No problem. Just make purple numbers unique across the entire site as
> opposed to individual documents, right? Well, sure, that would solve the
> problem. But it would also introduce a new usability problem. Do you
> really want to have visible purple numbers that read like "0245-af842-12"
> after every node? This raises a more general, philosophical question: How
> human readable should node IDs be?
>
> This is not some imaginary problem; the BA web site is faced with it right
> now. We want to build a news system, where the news can appear on any
> number of pages -- the home page, a news archive page, sites syndicating
> BA news, etc. All news items should have purple numbers. The question
> is, should those purple numbers be immutable? In other words, should a
> news item always have the same purple numbers attached to it regardless
> of the context, or is it acceptable for the purple numbers of the news
> item to vary depending on the page from which it is viewed?
>
> If the world had a real OHS, this wouldn't be a problem. But for now,
> we're trying to add OHS-like features on top of the Web, and so we're
> faced with issues like these.
>
> -Eugene
>
> --
> +=== Eugene Eric Kim ===== eekim@eekim.com ===== http://www.eekim.com/ ===+
> |       "Writer's block is a fancy term made up by whiners so they        |
> +=====  can have an excuse to drink alcohol."  --Steve Martin  ===========+
>
> (09)