
Re: Great? idea for improving this list (was Re: [ba-ohs-talk] Freezope learning environments)


> In what ways are you imagining this being different from a free
> text index of the mail archive that gets reindexed every time a
> new message comes in?    (01)

If the lexicon and the associated links to each mail for each word could
be extracted from such an indexing system then the answer is none
whatsoever.    (02)
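
A minimal sketch of the scheme in steps a) to e) quoted below, assuming
an SQLite store. The table names, the stop-word list, the tokenizer
regex and the use of a token offset as the "location" are all
illustrative assumptions, not part of the original proposal. Polysemy
is left aside, as in the proposal.

```python
import re
import sqlite3

# Hypothetical stop-word list; the proposal does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for"}


def create_index(conn):
    """Create the two related tables: PK_ID, word_string <-> FK_ID, location."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "id INTEGER PRIMARY KEY, word_string TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locations ("
        "word_id INTEGER REFERENCES words(id), mail_id TEXT, position INTEGER)"
    )


def index_mail(conn, mail_id, body):
    """Steps a) to e): read each word, skip stop-words, record its location."""
    tokens = re.findall(r"[a-z0-9']+", body.lower())
    for position, token in enumerate(tokens):
        if token in STOP_WORDS:
            continue  # step c): stop-words never reach the database
        # Step b): a new word gets a record; a known word is left alone.
        conn.execute(
            "INSERT OR IGNORE INTO words (word_string) VALUES (?)", (token,)
        )
        (word_id,) = conn.execute(
            "SELECT id FROM words WHERE word_string = ?", (token,)
        ).fetchone()
        # Step e): every occurrence adds only a new location tuple.
        conn.execute(
            "INSERT INTO locations VALUES (?, ?, ?)", (word_id, mail_id, position)
        )


def lookup(conn, word):
    """Return (mail_id, position) pairs for a word, ordered by mail and offset."""
    rows = conn.execute(
        "SELECT l.mail_id, l.position FROM locations l "
        "JOIN words w ON w.id = l.word_id "
        "WHERE w.word_string = ? ORDER BY l.mail_id, l.position",
        (word.lower(),),
    )
    return rows.fetchall()
```

Calling index_mail() once per incoming message (step d) keeps the
lexicon current without rescanning the archive, which is the same
effect as an incrementally updated free text index.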

--
Peter    (03)

----- Original Message -----
From: <cdent@burningchrome.com>
To: <ba-ohs-talk@bootstrap.org>
Sent: Friday, April 26, 2002 11:29 PM
Subject: Re: Great? idea for improving this list (was Re: [ba-ohs-talk]
Freezope learning environments)    (04)


>
> [archive_access.practical]
>
> On Fri, 26 Apr 2002, Peter  Jones wrote:
>
> > What I was suggesting was a system that:
> >
> > a) Reads an email and sucks out each word in turn.
> > b) Each new word has a database record created, and the
> > locations of occurrence of the term in another related table.
> > Leaving aside the issue of polysemy for a moment, the
> > record structure would be something like
> > PK_ID, word_string <--relation--> FK_ID, location(s).
> > c) To improve the scanning process, have a subroutine that
> > discards the stop-words chosen, and clean the database of
> > these.
> > d) Repeat for each mail.
> > e) If a word is re-encountered then only the new location for
> > the word is inserted in the database in the appropriate new tuple.
>
> In what ways are you imagining this being different from a free
> text index of the mail archive that gets reindexed every time a
> new message comes in?
>
> > What you then get is an index for every mail in the archive that
> > contains all the interesting words in all the mails in the archive
> > and the locations in the mails of all those words.
>
> Is it that the list of words indexed is more limited?
>
> > Sophistication could be added in the read-in phase.
> > For example, polysemy might be attacked by some algorithm that
> > makes guesses about the word type based on a grammar.
> > Locations might be narrowed to paragraphs by chunking them
> > beforehand.
> > And so on.
>
> You make this sound easy. After watching the list for a while it
> is clear that we don't have the collective time for this measure
> of complexity.  Are we talking about implementing something to
> use now and experiment and develop, or are we talking about an
> ideal eventual system that would work in a variety of capacities?
>
> We can talk the theory (I'd love to) but that stuff has been
> beaten to death here and elsewhere. How do we distinguish between
> the speculative talk and the plans for action?
>
> --
> Chris Dent  <cdent@burningchrome.com>  http://www.burningchrome.com/~cdent/
> "Mediocrities everywhere--now and to come--I absolve you all! Amen!"
>  -Salieri, in Peter Shaffer's Amadeus
>
>    (05)