At 11:25 AM 4/15/2001 -0700, you wrote:
><Jack Park>
>My plan for the Nexist prototype is this: everything that comes into the
>system is first scanned by an engine that breaks out all the words and
>word phrases. Most of those may already exist, but the system keeps track
>of their "occurrences" and those are later grist for a clustering mill
>that makes contributions to the knowledge base itself -- theoretically
>speaking, automatically. Users have an opportunity to "subscribe" to
>favored concepts and thus find a notification of changes or additions
>waiting for them at their PIM (personal info manager) page. With that,
>theoretically speaking once again, there is no absolute need to read
>everything at once.
>
>[snip].
></Jack Park>
>
>Neil Larson has a program called Thsar, a faceted-thesaurus tool, that does
>this rather nicely.
>
>Given a text, supply a list of noise (stop) words, and then extract every
>maximal sequence of non-noise words as a phrase. Admittedly that misses
>some perfectly valid phrases, but the algorithm is simple.
>
>The program then allows the list to be merged with an existing list and
>relationships amongst the terms to be specified.
>
>The full set of utilities that comes with the program is beyond addressing
>in an email, but it essentially assists in building taxonomies and
>maintaining standard vocabulary usage throughout a hypertext system. There
>are some very good ideas there, IMO.
>
>Thanks,
>
>Garold (Gary) L. Johnson
Useful information, but, judging from my searches on Google, the program no
longer exists -- that is to say, its web site no longer opens.
Perhaps it's time to develop a thread on knowledge extraction from text...
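To seed such a thread, here is a minimal sketch, in Python, of the stop-word
phrase extraction Gary describes: every maximal run of non-noise words
becomes a phrase. The stop-word list and the tokenizer are illustrative
placeholders, not anything from Thsar itself.

```python
import re

# Illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "that"}

def extract_phrases(text, stop_words=STOP_WORDS):
    """Return each maximal sequence of non-stop words as a phrase."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    phrases, current = [], []
    for word in words:
        if word in stop_words:
            # A noise word ends the current phrase, if any.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases

print(extract_phrases("The knowledge base of the hypertext system"))
# -> ['knowledge base', 'hypertext system']
```

As Gary notes, this misses valid phrases that happen to contain a stop word
("point of view" splits at "of"), but it is simple and the resulting phrase
occurrences are easy to count and cluster.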
Cheers
Jack
This archive was generated by hypermail 2b29 : Sun Apr 15 2001 - 15:14:00 PDT