[ba-ohs-talk] Greenstone information management
http://www.nzdl.org/cgi-bin/library (01)
I posted this URL earlier. Now I've downloaded the Greenstone software
(for my Wintel box). I had to download and install ActivePerl before
Greenstone would work properly. (02)
What is Greenstone?
It's GPL. (That may not be all that bad, because it is supposed to have a
CORBA interface, so we can call it from non-GPL software.) (03)
It's an engine that indexes collections of information. It can handle:
HTML
Word
PostScript
PDF
email
other (04)
I first downloaded the base system, MG (Managing Gigabytes), and found that
I couldn't compile it with Cygwin.
So I just installed the entire Greenstone package, which included
everything except a Perl engine. (05)
Greenstone is a web-based system. It will run with Apache or some other web
server, but the download included a server of its own. It found IE 5 and
set that as its default browser. Everything is done from a browser.
Installation was quite easy with the exe file that downloaded, though I
must admit that Norton AntiVirus was very unhappy with it; I had to
authorize the install, much to Norton's chagrin. (06)
What does Greenstone do?
Greenstone:
imports/converts files from local/web/ftp to an internal HTML format
indexes the HTML-formatted documents
saves the internal files compressed
You are able to:
create new collections
edit/add to/delete existing collections
manage users, configurations, etc.
browse collections
You are also able to:
add new file types by creating Perl scripts (07)
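To make the import / index / compress steps above a bit more concrete, here is a toy sketch in Python. It is not Greenstone's code and does not reproduce its real interface (Greenstone's own converters are Perl scripts); the names CONVERTERS, converter, import_file, build_index and store are all made up for illustration, and only plain text and HTML are handled.

```python
import re
import zlib
from collections import defaultdict
from html import escape

# Hypothetical registry: file extension -> function that turns raw text into HTML.
CONVERTERS = {}

def converter(ext):
    """Register a converter for one file extension (illustrative only)."""
    def register(func):
        CONVERTERS[ext] = func
        return func
    return register

@converter(".txt")
def text_to_html(raw):
    # Wrap plain text in a minimal HTML page, escaping special characters.
    return "<html><body><pre>" + escape(raw) + "</pre></body></html>"

@converter(".html")
def html_passthrough(raw):
    # HTML is already in the internal form.
    return raw

def import_file(name, raw):
    """Convert one document to the internal HTML form, or fail loudly."""
    ext = name[name.rfind("."):].lower() if "." in name else ""
    if ext not in CONVERTERS:
        raise ValueError(f"no converter registered for {ext!r}")
    return CONVERTERS[ext](raw)

def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, html_text in docs.items():
        plain = re.sub(r"<[^>]+>", " ", html_text)  # crude tag stripping
        for word in re.findall(r"[a-z0-9]+", plain.lower()):
            index[word].add(name)
    return index

def store(html_text):
    """Compress the internal form for storage."""
    return zlib.compress(html_text.encode("utf-8"))

docs = {
    "notes.txt": import_file("notes.txt", "Greenstone indexes collections"),
    "page.html": import_file("page.html", "<html><body>OHS collections</body></html>"),
}
index = build_index(docs)
stored = {name: store(text) for name, text in docs.items()}
```

Adding a new file type in this sketch means registering one more function under its extension, which is roughly the role the Perl scripts play in the list above.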
Why is Greenstone interesting to an OHS/DKR crowd?
In one sense, it's already a kind of HyperScope. It reads almost all kinds
of file types, and if it doesn't read one, you can fix it so it does. And
it is Web-based. I run mine locally, but if I were on a network, the
installer would detect that and make it Web-enabled. You have the option of
declaring collections private or public. (08)
In another sense (actually, the same sense as above), it's a kind of Grove
engine, because it has adaptors (plugs) that give it the ability (though
not perfectly -- see comment below) to handle almost all important file
types. (09)
Imperfection exists because of the many different versions of Word file
formats and so forth. Imperfection may also exist as evidenced in the
following:
my installation has sucked up one PDF file and my entire Eudora in.mbx. I
know this because it says it was successful. But I'm suspicious, because I
am unable to browse any information contained in those items. Could be me
(classic newbie); I've just subscribed to the email list. Time will tell. (010)
In any case, given that there is supposed to be a CORBA interface (I
haven't seen it yet), in theory we can just allow Greenstone to suck up
entire directories (it doesn't do this automatically, yet) and make them
available to data-mining tools in an OHS environment. (011)
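As a rough idea of what the "suck up entire directories" step might look like before anything is handed to an importer, here is a minimal Python sketch that walks a directory tree and groups files by type. It does not call Greenstone at all; the function name scan_directory and the extension table KNOWN_TYPES are assumptions, loosely based on the file types listed earlier.

```python
import os
from collections import defaultdict

# Assumed mapping from file extension to the file types named in this post.
KNOWN_TYPES = {
    ".html": "html", ".htm": "html", ".doc": "word",
    ".ps": "postscript", ".pdf": "pdf", ".mbx": "email",
}

def scan_directory(root):
    """Walk root recursively and group file paths by recognised type."""
    found = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            ext = os.path.splitext(filename)[1].lower()
            kind = KNOWN_TYPES.get(ext, "other")
            found[kind].append(os.path.join(dirpath, filename))
    return dict(found)
```

Calling scan_directory on a mail or documents folder would return a dictionary such as {"pdf": [...], "email": [...], "other": [...]}, which a separate import step could then work through type by type.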
Imagine having it suck up D3E documents and the like. (012)
I would like to see as many ohs-talkers as possible download and begin to
experiment with Greenstone. There is certainly some more software to be
developed for it in order to make it useable as a foundation in an OHS
environment. (013)
Cheers
Jack (014)