My plan for the Nexist prototype is this: everything that comes into the
system is first scanned by an engine that breaks out all the words and word
phrases. Many of those terms may already exist, but the system keeps track
of their "occurrences," and those occurrences are later grist for a
clustering mill that, at least in theory, contributes to the knowledge base
automatically. Users can "subscribe" to favored concepts and find
notifications of changes or additions waiting for them at their PIM
(personal information manager) page. With that, again theoretically
speaking, there is no absolute need to read everything at once.
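The occurrence-tracking step above could be sketched roughly like this; the class and method names are my own illustration, not anything from the Nexist codebase:

```python
from collections import Counter

# Hypothetical sketch of the occurrence ledger described above: each time a
# term or phrase appears in an incoming document, bump its count so a later
# clustering pass has frequency data to work from.
class OccurrenceIndex:
    def __init__(self):
        self.counts = Counter()

    def record(self, terms):
        # Record one occurrence per appearance of each term.
        self.counts.update(terms)

    def top(self, n):
        # The n most frequent terms -- raw material for the clustering mill.
        return self.counts.most_common(n)

idx = OccurrenceIndex()
idx.record(["topic map", "knowledge base", "topic map"])
idx.record(["knowledge base", "topic map"])
print(idx.top(1))  # → [('topic map', 3)]
```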
Neil Larson has a program called Thsar (for "faceted thesaurus") that does
this rather nicely.
Given a text and a supplied list of noise (stop) words, it extracts every
sequence of consecutive non-noise words as a phrase. Admittedly that misses
some perfectly valid phrases, but the algorithm is simple.
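The extraction algorithm just described can be sketched in a few lines; the stop-word list here is a tiny illustrative sample, not Larson's actual list:

```python
import re

# Small illustrative stop-word list (a real one would be much longer).
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "is", "that", "to",
              "in", "over"}

def extract_phrases(text):
    """Return maximal runs of consecutive non-noise words as phrases."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    phrases, run = [], []
    for w in words:
        if w in STOP_WORDS:
            # A noise word ends the current run; emit it as a phrase.
            if run:
                phrases.append(" ".join(run))
                run = []
        else:
            run.append(w)
    if run:
        phrases.append(" ".join(run))
    return phrases

print(extract_phrases("The quick brown fox jumps over the lazy dog"))
# → ['quick brown fox jumps', 'lazy dog']
```

Note how "quick brown fox jumps" is one "phrase" even though a human would split it; that is exactly the kind of imperfection the simplicity buys.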
The program then allows the list to be merged with an existing list and
relationships amongst the terms to be specified.
The full set of utilities that comes with the program is more than I can
cover in an email, but it essentially assists in building taxonomies and
maintaining standard vocabulary usage throughout a hypertext system. There
are some very good ideas there, IMO.
Garold (Gary) L. Johnson
DYNAMIC Alternatives <http://www.dynalt.com/>
This archive was generated by hypermail 2b29 : Sun Apr 15 2001 - 11:37:59 PDT