[unrev-II] Extracting words and phrases

From: Garold L. Johnson (dynalt@dynalt.com)
Date: Sun Apr 15 2001 - 11:25:10 PDT

    <Jack Park>
    My plan for the Nexist prototype is this: everything that comes into the
    system is first scanned by an engine that breaks out all the words and word
    phrases. Most of those may already exist, but the system keeps track of
    their "occurrences" and those are later grist for a clustering mill that
    makes contributions to the knowledge base itself -- theoretically speaking,
    automatically. Users have an opportunity to "subscribe" to favored concepts
    and thus find a notification of changes or additions waiting for them at
    their PIM (personal info manager) page. With that, theoretically speaking
    once again, there is no absolute need to read everything at once.

    [snip].
    </Jack Park>

    Neil Larson has a program called Thsar for ‘faceted thesaurus’ that does
    this rather nicely.

    Given a text, you supply a list of noise (stop) words, and the program
    extracts every run of consecutive non-noise words as a phrase. Admittedly
    that misses some perfectly valid phrases, since any phrase that contains a
    noise word ("state of the art", say) gets split, but the algorithm is simple.
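
    To make the idea concrete, here is a rough Python sketch of that kind of
    stop-word splitting. It is not Thsar's code, which I have not seen; the
    names STOP_WORDS, TOKEN and extract_phrases are made up for illustration,
    the stop list is far too short for real use, and treating punctuation as a
    phrase boundary is my own assumption.

```python
import re

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "by", "for", "in", "is",
    "of", "on", "that", "the", "this", "to", "with",
}

# A token is either a word (allowing internal apostrophes and hyphens)
# or a single punctuation character.
TOKEN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")


def extract_phrases(text, stop_words=STOP_WORDS):
    """Return each run of consecutive non-stop words, in order of appearance."""
    phrases, run = [], []
    for token in TOKEN.findall(text.lower()):
        is_word = token[0].isalnum() or token[0] == "_"
        if is_word and token not in stop_words:
            run.append(token)
        elif run:
            # A stop word or punctuation mark (assumed boundary) ends the run.
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


sample = ("The faceted thesaurus is a tool for building taxonomies, "
          "and standard vocabulary usage.")
print(extract_phrases(sample))
# ['faceted thesaurus', 'tool', 'building taxonomies', 'standard vocabulary usage']
```

    A phrase with an embedded noise word is split at that word, which is the
    limitation mentioned above.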

    The program then allows the list to be merged with an existing list and
    relationships amongst the terms to be specified.
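
    As a rough illustration only, merging a new batch of phrases into an
    existing list and recording relationships could look like the sketch
    below. The data layout and the names merge_terms and relate are
    hypothetical, and the BT (broader term) label is borrowed from general
    thesaurus practice; I am not claiming this is how Thsar stores anything.

```python
from collections import Counter


def merge_terms(existing, new_phrases):
    """Return a new Counter combining existing term counts with new phrases."""
    merged = Counter(existing)
    merged.update(new_phrases)  # each phrase in the list adds one occurrence
    return merged


def relate(relations, term, relation, other):
    """Record a (relation, other) link under term, e.g. BT = broader term."""
    relations.setdefault(term, set()).add((relation, other))


# Existing vocabulary with occurrence counts (made-up numbers).
vocab = merge_terms({"faceted thesaurus": 2},
                    ["faceted thesaurus", "building taxonomies"])

relations = {}
relate(relations, "faceted thesaurus", "BT", "thesaurus")

print(vocab["faceted thesaurus"])  # 3
print(relations)  # {'faceted thesaurus': {('BT', 'thesaurus')}}
```

    Keeping the counts alongside the terms also gives you the "occurrences"
    that Jack wants to feed to his clustering step.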

    The full set of utilities that comes with the program is more than I can
    cover in an email, but it essentially assists in building taxonomies and in
    keeping vocabulary usage standard throughout a hypertext system. There are
    some very good ideas there, IMO.

    Thanks,

    Garold (Gary) L. Johnson
    DYNAMIC Alternatives <http://www.dynalt.com/>


