
Re: [ba-ohs-talk] Greenstone as a HyperScope


Sounds like a winner, Jack. Now that Java 1.4 has regular
expressions, though, I've done my last Perl hacking. I now
have the best of both worlds!    (01)
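
For anyone weighing the same switch: the kind of Perl match a file-conversion script leans on, such as pulling the title out of an HTML page with `m{<title>(.*?)</title>}is`, carries over to `java.util.regex` almost one for one. A minimal sketch follows. `TitleExtractor` and `extractTitle` are names made up for illustration, and a regex is only a rough way to read HTML; the sketch assumes a single, unnested `<title>` element. It uses only classes and methods that have existed since Java 1.4.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the Perl match
//   if ($html =~ m{<title>(.*?)</title>}is) { print $1; }
// rewritten with java.util.regex.
class TitleExtractor {
    // CASE_INSENSITIVE plays the role of Perl's /i; DOTALL plays the role of /s,
    // letting "." match newlines inside the title.
    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or null if no <title> element is found.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (m.find()) {
            return m.group(1).trim();
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<html><HEAD><Title>\n  Greenstone as a HyperScope\n</Title></HEAD></html>";
        System.out.println(extractTitle(html)); // prints "Greenstone as a HyperScope"
    }
}
```

Compiling the pattern once into a static field, instead of on every call, avoids recompiling it for each file processed.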

Jack Park wrote:    (02)

> http://www.greenstone.org/english/home.html
>
> I have mentioned Greenstone before.  The more I play with it, the more I
> tend to think that it is a HyperScope.
> Here is what I know about it from playing with it and reading its
> documentation.
>
> It can suck up entire directories (including subdirectories) from your
> hard disk.
> It can suck up entire web sites (including subdirectories, I think).
>
> What it does:
> It reads files (types include pdf, ps, doc, txt, html, and some gif/jpg
> image files) and converts them to an intermediate format (GML).
> It indexes the GML files.
> It also appears to do n-gram and other statistical stuff.
> It also appears to have some phrase detection tools.
> Its documentation says (I haven't seen it yet) that it has a CORBA interface.
>
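
Greenstone's own n-gram and phrase code is not shown here, and nothing below should be read as its algorithm. As a generic illustration of what counting word n-grams involves, this sketch lower-cases the text, splits it into words, and counts every run of `n` consecutive words. `NGramCounter` and `countWordNGrams` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Generic word n-gram counter (illustrative only; not Greenstone's implementation).
class NGramCounter {
    // Returns a map from each n-gram (words joined by single spaces) to its count.
    static Map<String, Integer> countWordNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // Tokenize: lower-case the text, split on runs of characters that are not
        // letters or digits, and skip the empty token a leading separator yields.
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        // Slide a window of n words across the token list and count each window.
        // A TreeMap keeps the output sorted so it is easy to read and compare.
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + n <= words.size(); i++) {
            String gram = String.join(" ", words.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "The digital library indexes the digital library collection.";
        System.out.println(countWordNGrams(text, 2));
    }
}
```

Running `main` prints `{digital library=2, indexes the=1, library collection=1, library indexes=1, the digital=2}`. Frequently repeated n-grams like "digital library" are the raw material that phrase detection typically starts from.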
> If you want to add file types for it to handle, you just write a small Perl
> script to do the job and include that script in your "collection"
> configuration file.
>
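
The plugin arrangement described above can be pictured, in Java terms, as a chain of small converters. This is an analogy, not Greenstone's API: its real plugins are Perl, and `FileTypePlugin`, `ExtensionPlugin` and `PluginPipeline` are invented names. The sketch also assumes that plugins are consulted in the order the configuration lists them and that the first plugin claiming a file handles it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical contract for a per-file-type plugin.
interface FileTypePlugin {
    boolean accepts(String fileName);   // does this plugin claim the file?
    String convert(String content);     // turn raw content into intermediate text
}

// Claims files by extension (case-insensitively) and applies a conversion function.
class ExtensionPlugin implements FileTypePlugin {
    private final String extension;
    private final UnaryOperator<String> converter;

    ExtensionPlugin(String extension, UnaryOperator<String> converter) {
        this.extension = extension.toLowerCase(Locale.ROOT);
        this.converter = converter;
    }

    public boolean accepts(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith("." + extension);
    }

    public String convert(String content) {
        return converter.apply(content);
    }
}

// Plugins are tried in registration order (standing in for the order of a
// configuration file); the first one that accepts a file converts it.
class PluginPipeline {
    private final List<FileTypePlugin> plugins = new ArrayList<>();

    void register(FileTypePlugin plugin) {
        plugins.add(plugin);
    }

    // Returns the converted text, or null if no registered plugin claims the file.
    String process(String fileName, String content) {
        for (FileTypePlugin plugin : plugins) {
            if (plugin.accepts(fileName)) {
                return plugin.convert(content);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PluginPipeline pipeline = new PluginPipeline();
        pipeline.register(new ExtensionPlugin("txt", s -> s.trim()));
        // Crude tag stripping, for illustration only.
        pipeline.register(new ExtensionPlugin("html", s -> s.replaceAll("<[^>]*>", "")));
        System.out.println(pipeline.process("notes.TXT", "  plain text  ")); // prints "plain text"
        System.out.println(pipeline.process("page.html", "<p>Hello</p>"));   // prints "Hello"
        System.out.println(pipeline.process("image.gif", "..."));            // prints "null"
    }
}
```

Supporting a new file type then amounts to registering one more small plugin, which loosely mirrors adding a script to the collection configuration without touching the rest of the pipeline.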
> Greenstone and all its internal programs are under the GPL.  With a CORBA interface,
> we can create a HyperScope interface and just let it do all the internal work.
>
> There is another initiative behind Greenstone: data mining in the
> Greenstone collections.  That's precisely where I hope it will go soon,
> though Greenstone appears to be tightly linked to some PhD projects,
> meaning it might be several years before the data mining tools are out
> for us to play with.
>
> I suspect that Greenstone is a great candidate (I've said this before) for
> a prototype HyperScope infrastructure.  We just need to learn how to use it
> and to extend it.
>
> Cheers
> Jack    (03)