Re: [unrev-II] Extracting words and phrases

From: Jack Park (jackpark@thinkalong.com)
Date: Sun Apr 15 2001 - 15:01:16 PDT

    At 11:25 AM 4/15/2001 -0700, you wrote:

    ><Jack Park>
    >My plan for the Nexist prototype is this: everything that comes into the
    >system is first scanned by an engine that breaks out all the words and
    >word phrases. Most of those may already exist, but the system keeps track
    >of their "occurrences" and those are later grist for a clustering mill
    >that makes contributions to the knowledge base itself -- theoretically
    >speaking, automatically. Users have an opportunity to "subscribe" to
    >favored concepts and thus find a notification of changes or additions
    >waiting for them at their PIM (personal info manager) page. With that,
    >theoretically speaking once again, there is no absolute need to read
    >everything at once.
    >
    >[snip].
    ></Jack Park>
    >
    >Neil Larson has a program called Thsar, for building a faceted
    >thesaurus, that does this rather nicely.
    >
    >Given a text and a list of noise (stop) words, it extracts every run of
    >consecutive non-noise words as a phrase. Admittedly that misses some
    >perfectly valid phrases (any phrase that contains a stop word gets
    >split), but the algorithm is simple.
    >
    >The program then allows the list to be merged with an existing list and
    >relationships amongst the terms to be specified.
    >
    >The full set of utilities that come with the program is more than an
    >email can cover, but it essentially assists in building taxonomies and
    >standard vocabulary usage throughout a hypertext system. There are some
    >very good ideas there, IMO.
    >
    >Thanks,
    >
    >Garold (Gary) L. Johnson

    Useful information but, judging from my Google searches, the program no
    longer seems to be available: its web site does not open.
    Perhaps it's time to develop a thread on knowledge extraction from text...
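
    To get such a thread started, here is a minimal Python sketch of the
    stop-word approach Gary describes. It is only my reading of his
    description, not the actual program: the stop list, the function names
    (extract_phrases, count_occurrences) and the choice to break phrases at
    punctuation are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be far longer.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
    "of", "on", "that", "the", "then", "to", "with",
})

# A "word" here is a run of letters, digits, or apostrophes (an assumption).
WORD_RE = re.compile(r"[A-Za-z0-9']+")

# Assumption: a phrase never spans clause-level punctuation.
CLAUSE_RE = re.compile(r'[.,;:!?()"]+')


def extract_phrases(text, stop_words=STOP_WORDS):
    """Return each run of consecutive non-stop words as one phrase."""
    phrases = []
    for clause in CLAUSE_RE.split(text):
        run = []
        for word in WORD_RE.findall(clause.lower()):
            if word in stop_words:
                # A stop word ends the current run, if there is one.
                if run:
                    phrases.append(" ".join(run))
                    run = []
            else:
                run.append(word)
        if run:
            phrases.append(" ".join(run))
    return phrases


def count_occurrences(texts, stop_words=STOP_WORDS):
    """Tally how often each extracted phrase occurs across several texts."""
    counts = Counter()
    for text in texts:
        counts.update(extract_phrases(text, stop_words))
    return counts
```

    Calling extract_phrases("bill of materials") yields "bill" and
    "materials" as separate phrases, which is exactly the kind of valid
    phrase the simple algorithm misses. The tallies from count_occurrences
    are the sort of "occurrences" that could later feed a clustering step.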

    Cheers
    Jack



    This archive was generated by hypermail 2b29 : Sun Apr 15 2001 - 15:14:00 PDT