[ba-ohs-talk] Searching vs. Keyword Indexing
At 11:11 AM 4/26/02 +0100, Murray Altheim wrote:
>But a more meta question is: what exactly are the requirements, and
>what are the benefits? Couldn't all this be done server-side on the
>mail list archives, such that if one wanted to browse the archives
>an intelligent search could remove the need for most of the effort?
>I'm still not convinced that people would use it, insofar as it's
>probably almost as much work to figure out an *appropriate* set of
>keywords as it is to type an entire email message. Librarians are
>*experts* at this. I'm not. There's a CMU system I used at NTTC that
>could analyze a text and come up with a set of keywords for it. I'd
>prefer we leave this kind of thing to computers (which are in general
>pretty good at it, especially on longer texts).    (01)
Ok, here is why keyword indexing is better (in some cases) than 
searching.  Take a simple keyword like "software announce" or "new 
software", which I want to use as an indicator that a new piece of software 
is being pointed out.    (02)
Suppose that you know that about a month ago, someone mentioned a piece of 
software that reminds you of something that you just looked at.  You can't 
remember exactly what it was, who mentioned it, or what was the subject of 
the post.    (03)
How can you search for this product?  Should you search for the term 
"software"?  There are two problems with this: (A) there is certain to be 
a lot of noise, since "software" is a frequent term even when no new 
product is being discussed, and (B) chances are that "software" was not 
even used in the announcement.    (04)
Basically, it is often the case that you are looking for something the name 
of which escapes you.  Tip of the tongue sort of phenomenon.  In this case 
searching is useless.  But not keywords.  If you know that what you saw was 
marked as new, or that it was used for collaboration, then you can check 
out those keyword categories and see a list of all the relevant posts.  I 
am certain that this will narrow down your choices much more than searching 
would.    (05)
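To make the comparison concrete, here is a minimal sketch of the two approaches in Python.  Everything in it is an assumption made up for illustration: the four posts, the product names "Foo" and "Bar", and the function names are all hypothetical, and this is not a description of how the list archive actually works.  It only shows that a full-text search for "software" can return nothing but noise, while a lookup by author-assigned keyword returns the announcements.

```python
from collections import defaultdict

# Hypothetical posts: (post id, author-assigned keywords, body text).
# All of this data is invented for illustration.
POSTS = [
    (1, {"new software", "collaboration"},
     "Just came across Foo, a shared outliner for teams."),
    (2, {"ontology"},
     "Our software categories need a clearer definition."),
    (3, {"new software"},
     "Bar is a tool that lets you query a search engine from a script."),
    (4, set(),
     "The software on the list server was upgraded last night."),
]


def build_keyword_index(posts):
    """Map each keyword to the ids of the posts tagged with it."""
    index = defaultdict(list)
    for post_id, keywords, _body in posts:
        for keyword in keywords:
            index[keyword].append(post_id)
    return index


def full_text_search(posts, term):
    """Return ids of posts whose body contains the term (case-insensitive)."""
    term = term.lower()
    return [post_id for post_id, _kw, body in posts if term in body.lower()]


def keyword_lookup(index, *keywords):
    """Return ids of posts tagged with every one of the given keywords."""
    id_sets = [set(index.get(keyword, [])) for keyword in keywords]
    return sorted(set.intersection(*id_sets)) if id_sets else []


index = build_keyword_index(POSTS)

# Searching for "software" finds only the two posts that are NOT announcements.
print(full_text_search(POSTS, "software"))                     # [2, 4]

# The keyword categories find the announcements, even though neither
# announcement uses the word "software" in its text.
print(keyword_lookup(index, "new software"))                   # [1, 3]
print(keyword_lookup(index, "new software", "collaboration"))  # [1]
```

Combining two keywords ("marked as new" and "used for collaboration") narrows the list further, which is the tip-of-the-tongue case described above.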
==========    (06)
And here is another important point in regard to    (07)
"it's probably almost as much work to figure out an *appropriate* set of 
keywords as it is to type an entire email message. Librarians are *experts* 
at this. I'm not."    (08)
THAT IS THE WHOLE POINT: IT'S A LOT OF WORK    (09)
It gets you to think!  You are not just generating noise by sending unrelated 
bits of information to the newsgroup; you have to think and decide how what 
you are posting is relevant.  You have to say, oh, I saw something like 
this in the newsgroup before.  That something is similar to this new thing 
because both deal with XXX.  I think that I'll create a keyword called XXX 
so that the next person that comes along can put the item in the same bucket.    (010)
The work that's being done is in defining an ontology.  If we can't put 
keywords on our posts, then we don't know what we are talking about.  You 
can't expect meaning to arise from a semantic analysis, or from a librarian.    (011)
This is the Bootstrap Institute here, right?  The Open-Hyperdocument 
System.  We are talking about something that does not yet exist.  We are 
trying to create this system through our discussion of it, and by pointing 
out technologies that seem to be leading up to it.  A librarian would not 
be helpful because a librarian can only put things in existing 
categories.  The categories that we are talking about might not have been 
created yet.    (012)
Think of instant outlining, or the Google API.  Until recently there were no 
such concepts as collaborative transcluding outlines or web-based search 
engine interfaces.  Maybe these concepts fit into existing categories, and 
maybe they don't, but we can be certain that new stuff is going to come 
along that will challenge any existing ontology.    (013)
Having found an interesting bit of information, the work needed to categorize 
it is not that great, but the short-term and especially the long-term 
benefits of doing so are large.  It's kind of like commenting code.  It 
seems unnecessary when the code is fresh, but without comments you cannot go 
back much later and resume your work, and the code is not very usable by 
others either.    (014)
--Alex    (015)