[ba-ohs-talk] BA archive index
Peter, (01)
Glad to see progress on important work reported in your letter on
020429. (02)
Rod (03)
********** (04)
Peter Jones wrote:
>
> FWIW,
>
> I've just passed my first working prototype indexer over the
> November 2001 ba-ohs-talk archive.
> It built a 1.3 MB XML file from those 33 mails alone.
> And that's with my shrinking the amount of text in there a
> fair bit. Although I could squeeze it a little more perhaps.
>
> The results are in a (240KB) zip file at
> http://www.concept67.fsnet.co.uk/xml/index.htm
>
> Apart from the preponderant presence of Eugene's sig,
> which seems to have survived all my attempts to purge it
> thus far, and the odd piece of message header data, the results are
> pretty clean (I hope you'll agree).
>
> Mysteries to ponder:
> Somehow my little perl script managed to index Eugene's sig under
> 'coma'. :-) Maybe my code is smarter than I thought.
>
> Cheers,
> --
> Peter
>
> ----- Original Message -----
> From: "Peter Jones" <ppj@concept67.fsnet.co.uk>
> To: <ba-ohs-talk@bootstrap.org>
> Sent: Sunday, April 28, 2002 1:12 PM
> Subject: BA Archive Index WAS: Re: Great? idea for improving this list
> (was Re: [ba-ohs-talk] Freezope learning environments)
>
> > Hi,
> >
> > Determining sequences of non-noise words might be a bit beyond me as yet.
> > But it also might not be necessary in this application (see below).
> > It depends where things go and how far.
> >
> > So far my prototype produces bulky data sets like this from the test
> > data (a very small sample):
> > <?xml version="1.0"?>
> > <index>
> > <word_data>
> > <word>Bringing</word>
> > <urloc>http://this.that.other/#nid03</urloc>
> > <para>Bringing Paths back in...
> > [snipped a load of data]
> > </para>
> > </word_data>
> > <word_data>
> > <word>Cheers</word>
> > <urloc>http://this.that.other/#nid05</urloc>
> > <para>Cheers,
> > Peter </para>
> > </word_data>
> > <word_data>
> > <word>Cursor</word>
> > <urloc>http://this.that.other/#nid02</urloc>
> > <para>I had to read it about 10 times, but I think I'm getting
> > there.
> > [snipped a load of data]
> > create a new Cursor if the permissions match. </para>
> > <urloc>http://this.that.other/#nid02</urloc>
> > <para>I had to read it about 10 times, but I think I'm getting
> > there.
> > [snipped a load of data]
> > Then the Cursor object walks the Node graph and only 'picks up' a node to
> > create a new Cursor if the permissions match. </para>
> > </word_data>
> > ...[snipped lots more data]
> > </index>
> >
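An index with the layout shown above can be built in a few lines. The sketch below is in Python, not the perl script mentioned in this thread (which is not shown here). The function name `build_index` and the `(url, text)` input shape are assumptions made for illustration. Unlike the sample, it records each paragraph once per distinct word instead of once per occurrence.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(paragraphs):
    """Build an <index> element from (url, text) pairs.

    The input shape is hypothetical; the element names follow the sample.
    """
    entries = defaultdict(list)  # word -> list of (url, paragraph text)
    for url, text in paragraphs:
        # Words are runs of letters, with an optional inner apostrophe (I'm).
        words = set(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text))
        for word in sorted(words):
            entries[word].append((url, text))

    index = ET.Element("index")
    for word in sorted(entries):
        word_data = ET.SubElement(index, "word_data")
        ET.SubElement(word_data, "word").text = word
        for url, text in entries[word]:
            ET.SubElement(word_data, "urloc").text = url
            ET.SubElement(word_data, "para").text = text
    return index


if __name__ == "__main__":
    sample = [("http://this.that.other/#nid05", "Cheers,\nPeter")]
    print(ET.tostring(build_index(sample), encoding="unicode"))
```

Storing the whole paragraph under every word is what makes the file bulky, which is the problem the ring buffer discussed next is meant to address.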
> > I will try to implement a ring buffer to cut the amount of paragraph data
> > per word down to a couple of lines.
> > (Transclusion into the web page would improve matters, but I'll think
> > about that later.)
> >
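The ring-buffer idea above might look roughly like this. It is a minimal sketch in Python, not the perl prototype. The function name `snippets` and the default of keeping two lines are assumptions chosen to match "a couple of lines".

```python
import re
from collections import deque


def snippets(lines, word, keep=2):
    """Yield a short context string for each line that contains `word`.

    The context is the last `keep` lines seen, the matching line included.
    """
    ring = deque(maxlen=keep)  # the oldest line drops out automatically
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    for line in lines:
        ring.append(line)
        if pattern.search(line):
            yield " ".join(ring)
```

Each index entry would then carry one of these short strings in place of the full paragraph. A buffer that only looks backwards gives no text after the match; showing following lines as well would need a small look-ahead.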
> > The idea is that you look up a word on the webpage/graph/whatever,
> > and listed below the word is a set of links and a snippet of text as a
> > hint, so that the user can determine the context (and hence the correct
> > meaning, I hope) of the word, for each link.
> > Clicking on the link takes you to the appropriate mail in the BA archive,
> > and from there you can track threads.
> > It might be good to have an outline of the thread next to the URL for
> > the index entry though(?).
> >
> > There's also a lot of cruft relating to the message data inserted by the
> > reply-to mechanism that I need to exclude.
> > It also might be an idea to exclude folks' names.
> >
> > --
> > Peter
> >
> >
> > ----- Original Message -----
> > From: "Garold (Gary) L. Johnson" <dynalt@dynalt.com>
> > To: <ba-ohs-talk@bootstrap.org>
> > Sent: Saturday, April 27, 2002 11:42 PM
> > Subject: RE: Great? idea for improving this list (was Re: [ba-ohs-talk]
> > Freezope learning environments)
> >
> >
> > > FWIW,
> > >
> > > By taking sequences of non-noise words to some number, you begin to build
> > > a phrase dictionary as well as a word dictionary, and that can prove more
> > > useful.
> > >
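One way to read the phrase-dictionary suggestion above is to count every run of up to N consecutive non-noise words. The sketch below takes that reading. It is written in Python for illustration; the stop list `NOISE`, the function name `phrases`, and the rule that a noise word breaks a phrase are all assumptions, not something specified in this thread.

```python
import re
from collections import Counter

# Illustrative stop list only; a real one would be much longer.
NOISE = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}


def phrases(text, max_len=3):
    """Count phrases of 1..max_len consecutive non-noise words."""
    counts = Counter()
    run = []  # current unbroken run of non-noise words
    # The trailing None flushes the final run at the end of the text.
    for token in re.findall(r"[a-z']+", text.lower()) + [None]:
        if token is not None and token not in NOISE:
            run.append(token)
            continue
        # A noise word (or end of text) closes the run: count its n-grams.
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[" ".join(run[i:i + n])] += 1
        run = []
    return counts
```

With `max_len=1` this reduces to a plain word dictionary, so single words and phrases can share the same index structure.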
> > > Given this phrase list and the tools to establish relationships within
> > > the list, it is possible to develop a faceted thesaurus, which is not too
> > > far distant from a topic map.
> > > One of Neil Larson's DOS programs does this, and he used it to organize
> > > huge hypertext systems.
> > >
> > > Single words are a start, but this extension should be easy to add (more
> > > so than the initial work).
> > >
> > > Thanks,
> > >
> > > Garold (Gary) L. Johnson
> > >
> >
> >
> > (05)