Re: [ba-ohs-talk] Keyword Indexing
At 09:59 AM 4/30/02 +0100, Murray Altheim wrote:
>Alex Shapiro wrote:
>>Ok, Murray, come on now. I read the email spec, and I see that
>>separating headers from the email body might not be so trivial as to be
>>accomplishable by using grep (you can grep for the first occurrence of a
>>double newline, but then you have to look at the next line), but why
>>should we let that stop us?
>>So the parsing program is going to have to be a bit longer, but so
>>what. I think that the small bit of effort that it's going to take to
>>write a longer parser far outweighs the collective nuisance of having to
>>type "keys:" before every keyword section.
>>If the first line of a post starts with a "[" then I think that we can just
>>assume that it's the start of a keyword section. (A check for a closing
>>bracket would make this even more certain.) I do not recall any emails
>>that have started with an open square bracket. Maybe there will be a
>>slight bit of noise from some unusual posts, but I think that we can manually
>>delete any keywords generated in this way.
>
>
>If we're trying to develop technologies that are useful outside of
>this small group, they need to be robust. (01)
Are we developing technologies with the purpose that they be useful outside
of this group? Certainly, some more active groups, like this one
http://www.info-arch.org/lists/sigia-l/0201/ could benefit from this
technology. However, I think that we should start with the smaller goal of
making this list better, and then if the solution works we can think about
how to generalize it, and package it so that others could use it. (02)
> Can you guarantee that
>the square-bracketed email keywords will show up in column one of
>the first non-whitespace line in an email? What about replies that
>include the keywords of a previous message? What about the keywords
>after they have been processed into an email archive (HTML) message? (03)
I can guarantee that when someone sends out an e-mail for the first time,
and they want to put in keywords, they will put those keywords as the top
line. The instructions for putting keywords in the top line are pretty
simple to follow. (04)
I see where you are going with questioning the fact that a reply (or the
delivered message) may no longer have the keywords as the top line. One
could see this as a problem. However, my vision of this tool's use is that
only the initial key-worded posts need to be catalogued. The rest of the
posts in the thread that follows could be reached by clicking on the root
post, and then from there to the discussion archived by thread. (05)
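A minimal sketch of that cataloguing idea, assuming each post has already been reduced to a message id plus whatever keywords were found on its top line. The function name and the "id" and "keywords" fields are hypothetical, and lowercasing the keywords is my own assumption:

```python
from collections import defaultdict

def build_keyword_index(posts):
    """Map each keyword to the ids of the posts that carry it on their top line."""
    index = defaultdict(list)
    for post in posts:
        # Posts without keywords (most replies) add nothing to the index;
        # they are reached through the thread of a catalogued post instead.
        for keyword in post["keywords"]:
            index[keyword.lower()].append(post["id"])
    return dict(index)
```

A reply that repeats keywords at its top would be catalogued like any other key-worded post.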
>If adding the five characters "keys:" is really too much to ask
>to disambiguate keywords, then I don't think there's much chance
>that this idea will catch on. One simply can't have any square-
>bracketed content in plaintext be considered keywords without
>introducing more problems than are solved. (06)
I agree that one simply can't have any square bracketed content be
considered keywords, but one can do this for the first line. (07)
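A rough sketch of the first-line rule, assuming single-part plain-text messages and comma-separated keywords. Neither of those has been agreed on this list, and the function name is hypothetical. Python's standard email module takes care of separating the headers from the body:

```python
from email import message_from_string

def first_line_keywords(raw_message):
    """Return keywords from the first non-blank body line, or [] if there are none."""
    msg = message_from_string(raw_message)  # stdlib splits headers from body
    body = msg.get_payload()
    if not isinstance(body, str):
        return []  # multipart mail is not handled in this sketch
    for line in body.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        # Only the first non-blank line is ever considered.
        if stripped.startswith("[") and stripped.endswith("]"):
            inner = stripped[1:-1]
            # Assumption: keywords are separated by commas.
            return [k.strip() for k in inner.split(",") if k.strip()]
        return []
    return []
```

Bracketed text further down the body, or a top line that has picked up a ">" quote prefix in a reply, is ignored under this rule.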
> The use and reuse of
>that text will put it into many more contexts than that first non-
>whitespace line of an email message. If people were more regular in
>their use, this might work, but we can hardly advocate something
>that doesn't survive the introduction of ">>" or markup around it.
>Remember, we want the keys to survive multiple replies, so that
>once a thread is started people don't have to retype it, or
>manipulate quoted text (which is a no-no). (08)
Right, I address this issue above. We do not want keywords to survive
multiple replies, because threads tend to be subject to topic creep. If
the response contributes a significant new idea, then the author can go the
extra step and repeat the keywords at the top. Most of the follow-up
discussion in a thread would not have to be catalogued, however, because it
contains information that is expected to be updated with time. This
discussion for instance. Later on, the back and forth that went on in
coming up with a format will not be interesting, only the results will be. (09)
>The biggest problem I see with "[keys:" is that it's in English,
>but I hesitate to suggest a symbolic solution because it's harder
>to remember (uh, is it "{[" or "[{"?, etc.). I suppose we could
>come up with an i18n solution... *sigh* (010)
You mean that the biggest problem is that it's five letters. (I don't
think that French would be better :) (011)
... Well, what about double square brackets? [[ ... ]] That's not too hard
to remember, and they don't occur as frequently as single brackets. I
still say that if we only consider the first line of the post as a valid
location for keywords, then single square brackets are fine. (012)
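To illustrate the double-bracket variant, here is a small sketch that accepts a line only if the whole line is wrapped in [[ ... ]]. The function name and the comma separator are assumptions of mine:

```python
import re

# The whole (stripped) line must be wrapped in double square brackets.
DOUBLE_BRACKETS = re.compile(r"\[\[(.+)\]\]")

def double_bracket_keywords(first_line):
    """Return keywords from a '[[a, b]]' line, or [] if the line does not match."""
    match = DOUBLE_BRACKETS.fullmatch(first_line.strip())
    if match is None:
        return []
    # Assumption: keywords are separated by commas.
    return [k.strip() for k in match.group(1).split(",") if k.strip()]
```

A line such as "[ba-ohs-talk] Keyword Indexing" or "[single]" yields no keywords here, which is the reduction in false matches that double brackets would buy.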
Cheers,
--Alex (013)
>Murray
>
>......................................................................
>Murray Altheim <http://kmi.open.ac.uk/people/murray/>
>Knowledge Media Institute
>The Open University, Milton Keynes, Bucks, MK7 6AA, UK
>
> In the evening
> The rice leaves in the garden
> Rustle in the autumn wind
> That blows through my reed hut. -- Minamoto no Tsunenobu (014)