[ba-ohs-talk] Keyword Indexing
Regarding:
http://www.bootstrap.org/lists/ba-ohs-talk/0204/msg00126.html
http://www.bootstrap.org/lists/ba-ohs-talk/0204/msg00132.html
http://www.bootstrap.org/lists/ba-ohs-talk/0204/msg00176.html
http://www.bootstrap.org/lists/ba-ohs-talk/0204/msg00194.html (01)
I strongly agree with the above messages, especially Chris Dent's last
message which I was going to quote. However, I found myself saying
Exactly, Exactly, Exactly to all the lines :) so I guess that I'll just
spare the clutter. (02)
*1* KWD LOCATION: I think that we are in agreement that the keywords should
be the first line of the email. (03)
*2* KWD ENVELOPE: I was tempted to agree with Murray's suggestion that the
envelope for the keywords should be more complex than [], because of the
argument that some emails might be html based. However, I think that
everyone on this list mostly uses text, and I see that one of my messages
which had some bold text (which I assume needs html), came out looking like
plain text in the ba-ohs archives. So, as long as parsing is not a
problem, which it looks like it won't be, I say that using square brackets
around the keywords is fine. (04)
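To show that parsing really is not a problem, here is a minimal Python sketch of pulling the keywords off the first line of a message body. The function name parse_keywords and the comma separator between keywords are my assumptions; the thread has not settled on a separator.

```python
import re

# Assumed layout: the first line of the body is "[Kwd_One, Kwd_Two]".
# The comma separator is an assumption, not something the list has agreed on.
KEYWORD_LINE = re.compile(r"^\s*\[([^\[\]]*)\]\s*$")


def parse_keywords(body):
    """Return the keywords on the first line of a message body, or []."""
    lines = body.splitlines()
    first_line = lines[0] if lines else ""
    match = KEYWORD_LINE.match(first_line)
    if match is None:
        # No bracketed keyword line: treat the message as untagged.
        return []
    return [k.strip() for k in match.group(1).split(",") if k.strip()]
```

A message without a bracketed first line simply comes back untagged, so old-style posts would keep working.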
*3* KWD FORMAT: We need to agree on some sort of word separation standard
for keywords. The above thread has contained the following formats:
FooBar, Foo_Bar, Foo-Bar. I don't have much of a preference, but I think
that either the first or the second is better. The underscored version
Foo_Bar seems to be the most readable. (05)
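If we did settle on Foo_Bar, the other two spellings could be folded into it automatically. A rough Python sketch, assuming a hypothetical helper called to_underscored and assuming that a lowercase letter followed by an uppercase letter marks a word boundary:

```python
import re


def to_underscored(keyword):
    """Convert FooBar or Foo-Bar to Foo_Bar; leave Foo_Bar unchanged."""
    # Hyphens become underscores: Foo-Bar -> Foo_Bar
    keyword = keyword.replace("-", "_")
    # Insert an underscore at each lowercase-to-uppercase boundary:
    # FooBar -> Foo_Bar. All-caps keywords such as IBIS are untouched.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", "_", keyword)
```

This would let people keep typing whichever form they are used to while the archive stores a single one.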
*4* KWD SELECTION: In message #126 Eric suggests that we take the time to
come up with a list of keywords. We could do this, but first I think that
we might experiment with coming up with a minimal set of basic keywords,
and then having every new keyword automatically added to the DB. (06)
*4.1* BASIC KEYWORDS: I think that as a group we should come up with a
minimal set of three or four keywords that would give a general type to the
message. For instance, I am thinking that messages which announce a new
type of software should be given the Software_Announce, or SA, (or some
variation) keyword. The thing is that about 1/3 of the root messages (not
the followups) posted to this group announce software, and it would be nice
to be able to filter those out from the general discussion. Other basic
keywords might include Document_Announce, Conference_Announce,
Fun_Announce, and Seeking_Software. (07)
In theory there should be one basic keyword per message. The purpose of
these keywords is to provide an intermediate level of specification between
the current thread structure, and the fine-grained keywording to be
discussed next. I envision using archives of messages aggregated by these
keywords to answer queries like "I just saw some cool software mentioned
recently, but I don't remember what it was". (08)
*4.2* FINE GRAINED KEYWORDS: Besides basic keywords there can also be fine
grained keywords, such as IBIS, Google, Graphs, etc. My suggestion is that
instead of wasting time arguing about these, we allow any user to use any
keyword. New keywords will automatically be added to a database. (09)
The idea is that we should eventually settle on some common keywords by
convention. I am sure that there will be some tension here, but I think
that spreading the tension out over the first few weeks of use is better
than arguing about this stuff before we are even completely sure what we
are working with. It is simpler to see how to categorize new messages
than to have long arguments about how we should have categorized older
messages. (010)
An idea that I had for keywords is that they could be placed in
hierarchies, for instance Google.API. A message tagged with Google.API
could be seen by viewing both "Google" and "Google.API", but not the other
way around. This way, if you chose a subcategory which other users did not
agree on, your message would still be captured by the parent category. (011)
Multiple keywords should work in the same way. A message tagged with
multiple keywords would be viewable by looking at any one of the listings. (012)
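The hierarchy and multiple-keyword rules above can be sketched in a few lines of Python. The function name build_index and the message ids are hypothetical; the only assumption about the data is that each message arrives as an id plus its list of keywords. New keywords get a listing the first time they are seen, which is the "automatically added" behaviour from 4.2.

```python
from collections import defaultdict


def build_index(messages):
    """Map each keyword, and each of its dotted parents, to message ids.

    `messages` is an iterable of (message_id, keywords) pairs.
    """
    index = defaultdict(list)
    for message_id, keywords in messages:
        listings = set()
        for keyword in keywords:
            parts = keyword.split(".")
            # Google.API is filed under both "Google" and "Google.API".
            for depth in range(1, len(parts) + 1):
                listings.add(".".join(parts[:depth]))
        # The set keeps a message from appearing twice in one listing.
        for listing in listings:
            index[listing].append(message_id)
    return dict(index)
```

For example, with messages m1 tagged [Google.API, IBIS] and m2 tagged [Google], the "Google" listing holds both m1 and m2, while "Google.API" holds only m1.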
*5* INTERFACE SUGGESTIONS: I like Mark's suggestion of enriching the
messages, and then delivering them to our inboxes. If we do this, then I
suggest converting the keywords into hyperlinks which would again appear at
the beginning of the email. Clicking on the hyperlinks would take you to
the chronological list of messages tagged with that keyword (the listings
which I mentioned above). (013)
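As a rough illustration of the link step, here is a Python sketch that turns each keyword into a line carrying a link to its listing. The function name keyword_links, the base URL, and the one-page-per-keyword .html layout are all assumptions of mine; nothing like this exists on the archive today.

```python
from urllib.parse import quote


def keyword_links(keywords, base_url):
    """Return one 'keyword: URL' line per keyword for the top of a message."""
    # base_url is a placeholder; the real listing location is undecided.
    return [f"{k}: {base_url}{quote(k)}.html" for k in keywords]
```

These lines would replace the bracketed keyword line before the message is delivered.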
We could also use the keywords to enrich the current
http://www.bootstrap.org/lists/ba-ohs-talk/0204/threads.html
archive. Currently all you see is an indented list with the post hyperlink
and perhaps one more bit of information, the author name for the thread
view. I would suggest putting the keyword links adjacent to the post
hyperlink. (014)
Also, perhaps we could add a Keyword view to the author/thread/date
views. There are many ways that this could look, but one is something like
the author view, but with keywords instead of authors. (015)
A problem with this approach is that data would not be accumulated month to
month, as it currently isN'T :) I don't know what it would take to get
away from this, maybe setting periods for each keyword and breaking the
data down that way. I am not even sure if the breakdown is necessary. My
guess is that it's only there now for practical purposes, and not out of
necessity. (016)
========== (017)
Ok? So the actionable items are to come up with a list of basic keywords,
decide on the multi-word format, and figure out how to implement this type
of system. I think that from a technical standpoint this problem is not
too bad at all. And it would be a big help in managing the 200 or so messages
that we receive each month. Who besides Rod could remember all that? :) (018)
--Alex (019)