
Re: [ba-ohs-talk] Keyword Indexing


Alex Shapiro wrote:    (01)

> At 09:59 AM 4/30/02 +0100, Murray Altheim wrote:
[...]
>> If we're trying to develop technologies that are useful outside of
>> this small group, they need to be robust. 
> 
> Are we developing technologies with the purpose that they be useful 
> outside of this group?  Certainly, some more active groups, like this one 
> http://www.info-arch.org/lists/sigia-l/0201/ could benefit from this 
> technology.  However, I think that we should start with the smaller goal 
> of making this list better, and then if the solution works we can think 
> about how to generalize it, and package it so that others could use it.    (02)


If we're designing things at a syntax level right now, there's little
that can be done to "generalize" this if the syntax we use doesn't
satisfy our own requirements. You and I seem to disagree on the
requirements, that's all.    (03)


>>  Can you guarantee that
>> the square-bracketed email keywords will show up in column one of
>> the first non-whitespace line in an email? What about replies that
>> include the keywords of a previous message? What about the keywords
>> after they have been processed into an email archive (HTML) message?
> 
> I can guarantee that when someone sends out an e-mail for the first 
> time, and they want to put in keywords, they will put those keywords as 
> the top line.  The instructions of putting keywords in the top line are 
> pretty simple to follow.    (04)


Yes, and adding "keys:" to that is not significantly more work, and
those coming to either the list or the list archives will understand
what those square-bracketed thingies are, rather than guessing. Perhaps
we'll start a trend...    (05)
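
A minimal sketch of how a tool might read such a line, assuming the
"[keys: dogs, bones]" form shown later in this message and assuming we
want it to survive reply markers such as ">>". The names KEYS_LINE and
extract_keys are hypothetical, not part of any existing tool:

```python
import re
from typing import List

# Hypothetical pattern: optional reply markers (">", ">>", "> >"),
# then a bracketed list that starts with the literal prefix "keys:".
KEYS_LINE = re.compile(r"^\s*(?:>\s*)*\[keys:\s*([^\]]*)\]\s*$", re.IGNORECASE)


def extract_keys(message: str) -> List[str]:
    """Return the keywords on the first non-blank line, or [] if none."""
    for line in message.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        match = KEYS_LINE.match(line)
        if match is None:
            return []  # the first real line is not a keys line
        return [key.strip() for key in match.group(1).split(",") if key.strip()]
    return []  # the message was empty or all whitespace
```

With the "keys:" prefix required, a first line such as "[dogs] hello"
is not mistaken for keywords, while ">> [keys: dogs, bones]" still is
recognized.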


> I see where you are going with questioning the fact that a reply (or the 
> delivered message) may no longer have the keywords as the top line.  One 
> could see this as a problem.  However, my vision of this tool's use is 
> that only the initial key-worded posts need to be catalogued.  The rest 
> of the posts in the thread that follows could be accessed by clicking on 
> the root thread, and then from there to the discussion archived by thread.    (06)


That would be fine, except that any specific message may be the
target of a search/query, or result of a user navigational decision.
Coming into the middle of a thread that has keys at the very first
message would require that in order for either a computer or a
person to know the keys, they'd have to navigate to the beginning
of the thread, which might be inconvenient, difficult, or impossible.
Perhaps the email message isn't *on* the server, the user isn't online,
the message has been saved to hard disk, sent to a friend etc.  The
keys should be associated with each message unless one doesn't care
about the keywords of any except the first message.    (07)

Put it this way: how many times have you actually read a list archive
from the first message in a thread? *Not* from the first message in
a thread?    (08)


>> If adding the five characters "keys:" is really too much to ask
>> to disambiguate keywords, then I don't think there's much chance
>> that this idea will catch on. One simply can't have any square-
>> bracketed content in plaintext be considered keywords without
>> introducing more problems than are solved.
>
> I agree that one simply can't have any square bracketed content be 
> considered keywords, but one can do this for the first line.
> 
>>  The use and reuse of
>> that text will put it into many more contexts than that first non-
>> whitespace line of an email message. If people were more regular in
>> their use, this might work, but we can hardly advocate something
>> that doesn't survive the introduction of ">>" or markup around it.
>> Remember, we want the keys to survive multiple replies, so that
>> once a thread is started people don't have to retype it, or
>> manipulate quoted text (which is a no-no).
> 
> Right, I address this issue above.  We do not want keywords to survive 
> multiple threads, because threads tend to be subject to topic creep.    (09)


It's the responsibility of any author to start a new thread when there's
a new subject. Whatever we do won't change that.    (010)


> If the response contributes a significant new idea, then the author can 
> go the extra step and repeat the keywords at the top.  Most of the follow 
> up discussion in a thread would not have to be catalogued, however, 
> because it contains information that is expected to be updated with 
> time.  This discussion for instance.  Later on, the back and forth that 
> went on in coming up with a format will not be interesting, only the 
> results will be.    (012)


Huh? How is that? And where would one find the results? The whole
point of a DKR is to be able to go back and document the process
by which decisions are made, or look at the current state of a
discussion. Email threads don't always have a definitive beginning
or end, in terms of subject coherence. Sometimes right in the middle
of a long thread is where the "action" is.    (013)


>> The biggest problem I see with "[keys:" is that it's in English,
>> but I hesitate to suggest a symbolic solution because it's harder
>> to remember (uh, is it "{[" or "[{"?, etc.). I suppose we could
>> come up with an i18n solution... *sigh*
> 
> You mean that the biggest problem is that it's five letters.  (I don't 
> think that French would be better :)    (014)


No, I mean what I wrote. I don't think the number of letters is
remarkable at all, esp. if the word is small. I can understand people
not wanting to type twelve, but five is a short word. I've typed many
more five letter words in this thread than I care to count.    (015)


> ... Well, what about double square brackets? [[ ... ]] That's not too 
> hard to remember, and they don't occur as frequently as single 
> brackets.  I still say that if we only consider the first line of the 
> post as a valid location for keywords, then single square brackets are fine.    (016)


As I wrote earlier on, the presence of "]]" and ">" right next to
each other in XML is a real problem, as the sequence "]]>" is the
CDATA end marker, so XML-based processors would choke if they ran
across it. It's something easy to avoid, so I'd say avoid it. And
I find    (017)

    [keys: dogs, bones]    (018)

a lot more aesthetically appealing, less prone to mistyping, and more
clear as to purpose than    (019)

    [[dogs, bones]]    (020)
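
The CDATA concern above can be illustrated with a short sketch using
Python's standard xml.etree.ElementTree parser. The <msg> element, the
helper names, and the sample text are assumptions made up for this
illustration; the sample is deliberately contrived so that a ">" lands
directly after the double brackets:

```python
import xml.etree.ElementTree as ET

CDATA_END = "]]>"  # the sequence that closes a CDATA section


def naive_cdata(text: str) -> str:
    """Wrap text in a CDATA section without checking its content."""
    return "<msg><![CDATA[" + text + "]]></msg>"


def safe_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<msg><![CDATA[" + text.replace(CDATA_END, "]]]]><![CDATA[>") + "]]></msg>"


def parses(xml: str) -> bool:
    """Report whether the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


# Hypothetical message text where ">" directly follows "]]".
sample = "[[dogs, bones]]> and 1 < 2"
```

Here naive_cdata(sample) closes the CDATA section early, right after
"bones", and the leftover "<" makes the document ill-formed, whereas
the single-bracket "[keys: dogs, bones]" form never produces the end
marker. safe_cdata shows one common workaround, at the cost of every
producer having to remember it.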

But I've also written plenty on this now, more than is useful.    (021)

Murray    (022)

......................................................................
Murray Altheim                  <http://kmi.open.ac.uk/people/murray/>
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK    (023)

      In the evening
      The rice leaves in the garden
      Rustle in the autumn wind
      That blows through my reed hut.  -- Minamoto no Tsunenobu    (024)