[unrev-II] Fwd: Papers on word associations & web growth

From: Jack Park (jackpark@thinkalong.com)
Date: Fri Oct 19 2001 - 08:08:48 PDT

    >From: Francis Heylighen <fheyligh@vub.ac.be>
    >A number of interesting papers on the structure of the web and of
    >associative networks in general, how such networks develop, and how they
    >could be used to mimic "intuitive" human understanding. (Thanks to Liane
    >Gabora and Peter Turney for the suggestions!)
    >
    >A Stochastic Model for the Evolution of the Web
    >Mark Levene, Trevor Fenner, George Loizou, Richard Wheeldon
    >Recently several authors have proposed stochastic models of the growth of
    >the Web graph, which give rise to power-law distributions. These models
    >are based on the notion of preferential attachment leading to the "rich
    >get richer" phenomenon. However, these models fail to explain several
    >distributions arising from empirical results, due to the fact that the ...
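The "rich get richer" growth rule that the first abstract attributes to earlier Web-graph models can be illustrated with a minimal preferential-attachment sketch in the style of Barabasi and Albert. It is not Levene et al.'s own stochastic model, which the abstract says extends these. The function name and parameters (n_nodes, m, seed) are hypothetical, chosen for illustration only.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m, seed=None):
    """Grow an undirected graph in which each new node links to m distinct
    existing nodes, chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw from
    # this list is a degree-proportional draw (repeat picks are discarded).
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


if __name__ == "__main__":
    edges = preferential_attachment(10_000, m=2, seed=1)
    degree = Counter(v for edge in edges for v in edge)
    # A few hubs collect far more links than the typical node.
    print("median degree:", sorted(degree.values())[len(degree) // 2])
    print("largest degrees:", sorted(degree.values(), reverse=True)[:5])
```

Because well-linked nodes appear more often in the endpoint list, they keep attracting new links. That feedback is what produces the heavy-tailed degree distribution the abstract refers to.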
    >
    >The large-scale structure of semantic networks: statistical analyses and a
    >model for semantic growth
    >Mark Steyvers, Joshua B. Tenenbaum
    >We present statistical analyses of the large-scale structure of three
    >types of semantic networks: word associations, WordNet, and Roget's
    >thesaurus. We show that they have a small-world structure, characterized
    >by sparse connectivity, short average path-lengths between words, and
    >strong local clustering. In addition, the distributions of the number of ...
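The two small-world properties named in the Steyvers and Tenenbaum abstract, short average path lengths and strong local clustering, are usually measured as below. This is a minimal sketch of the standard measures, not the authors' analysis code. The helper names and the toy word pairs are hypothetical and are not data from the paper.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Undirected adjacency sets from a list of (word, word) association pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def average_path_length(adj):
    """Mean shortest-path length over all reachable pairs, via BFS from every node."""
    total = reachable = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        reachable += len(dist) - 1  # exclude the source itself
    return total / reachable if reachable else 0.0


# Hypothetical toy associations, not data from the paper.
graph = build_graph([("cat", "dog"), ("cat", "mouse"),
                     ("dog", "mouse"), ("mouse", "cheese")])
print(clustering_coefficient(graph, "cat"))  # 1.0
print(average_path_length(graph))            # 1.3333333333333333
```

A network counts as "small-world" when its average path length stays close to that of a random graph with the same size and density while its average clustering coefficient is much higher.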
    >
    >Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL.
    >Turney, P.D. (2001).
    >This paper presents a simple unsupervised learning algorithm for
    >recognizing synonyms, based on statistical data acquired by querying a Web
    >search engine. The algorithm, called PMI-IR, uses Pointwise Mutual
    >Information (PMI) and Information Retrieval (IR) to measure the similarity
    >of pairs of words. PMI-IR is empirically evaluated using 80 synonym test
    >questions from the Test of English as a Foreign Language (TOEFL) and 50
    >synonym test questions from a collection of tests for students of English
    >as a Second Language (ESL). On both tests, the algorithm obtains a score
    >of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which
    >achieves a score of 64% on the same 80 TOEFL questions. The paper
    >discusses potential applications of the new unsupervised learning
    >algorithm and some implications of the results for LSA and LSI (Latent
    >Semantic Indexing).
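PMI-IR, as summarized above, picks the answer that co-occurs most strongly with the problem word according to search-engine hit counts. Below is a minimal sketch of that idea, assuming the hit counts are already available as plain dictionaries. The counts, example words and function names are hypothetical, no search engine is actually queried, and the refinements evaluated in the paper are not reproduced here.

```python
import math


def pmi(hits_ab, hits_a, hits_b, total_docs):
    """Pointwise mutual information log2(p(a,b) / (p(a) p(b))),
    with each probability estimated as hits / total_docs."""
    return math.log2(hits_ab * total_docs / (hits_a * hits_b))


def choose_synonym(problem, choices, hits, co_hits):
    """Return the choice with the largest hits(problem AND choice) / hits(choice).

    For a fixed problem word, p(problem) and total_docs are constants, so this
    ratio ranks the choices in the same order as their PMI with the problem."""
    return max(choices, key=lambda c: co_hits[(problem, c)] / hits[c])


# Hypothetical hit counts standing in for search-engine query results.
hits = {"levied": 2_000, "imposed": 4_000, "believed": 9_000,
        "requested": 6_000, "correlated": 1_500}
co_hits = {("levied", "imposed"): 800, ("levied", "believed"): 30,
           ("levied", "requested"): 60, ("levied", "correlated"): 10}
choices = ["imposed", "believed", "requested", "correlated"]
print(choose_synonym("levied", choices, hits, co_hits))  # imposed
```

Dropping the constant terms means only two counts per candidate are needed: one for the joint query and one for the candidate alone.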
    >
    >Answering Subcognitive Turing Test Questions: A Reply to French
    >Peter D. Turney
    >Robert French has argued that a disembodied computer is incapable of
    >passing a Turing Test that includes subcognitive questions. Subcognitive
    >questions are designed to probe the network of cultural and perceptual
    >associations that humans naturally develop as we live, embodied and
    >embedded in the world. In this paper, I show how it is possible for a
    >disembodied computer to answer subcognitive questions appropriately,
    >contrary to French's claim. My approach to answering subcognitive
    >questions is to use statistical information extracted from a very large
    >collection of text. In particular, I show how it is possible to answer a
    >sample of subcognitive questions taken from French, by issuing queries to
    >a search engine that indexes about 350 million Web pages. This simple
    >algorithm may shed light on the nature of human (sub-) cognition, but the
    >scope of this paper is limited to demonstrating that French is mistaken: a
    >disembodied computer can answer subcognitive questions.
    >
    >Automatic Retrieval and Clustering of Similar Words
    >Dekang Lin
    >Bootstrapping semantics from text is one of the greatest challenges in
    >natural language learning. We first define a word similarity measure based
    >on the distributional pattern of words. The similarity measure allows us
    >to construct a thesaurus using a parsed corpus. We then present a new
    >evaluation methodology for the automatically constructed thesaurus. The
    >evaluation results show that the thesaurus is significantly closer to
    >WordNet than Roget Thesaurus is.
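The abstract does not spell out the distributional similarity measure. The sketch below shows the kind of measure Lin's paper is generally understood to use: the information weight of the context features two words share, divided by the total weight of each word's features. It assumes the per-word feature weights (for example, mutual information of dependency-relation features) have already been computed from a parsed corpus. The function name, feature labels and weights are hypothetical.

```python
def lin_similarity(feats1, feats2):
    """Lin-style similarity of two words from weighted context features.

    feats1 and feats2 map a feature, e.g. a (dependency relation, other word)
    pair, to its information weight for that word. Only positively weighted
    features are counted."""
    t1 = {f: w for f, w in feats1.items() if w > 0}
    t2 = {f: w for f, w in feats2.items() if w > 0}
    shared = t1.keys() & t2.keys()
    numerator = sum(t1[f] + t2[f] for f in shared)
    denominator = sum(t1.values()) + sum(t2.values())
    return numerator / denominator if denominator else 0.0


# Hypothetical feature weights; a real run would estimate them from a parsed corpus.
beer = {("obj-of", "drink"): 3.0, ("adj-mod", "cold"): 1.0, ("obj-of", "brew"): 2.0}
wine = {("obj-of", "drink"): 2.5, ("adj-mod", "red"): 1.5, ("obj-of", "brew"): 1.0}
print(lin_similarity(beer, wine))  # 8.5 / 11, about 0.77
```

Ranking every word by this score against a target word gives that word's entry in an automatically built thesaurus. The paper then evaluates that thesaurus against WordNet and Roget.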
    >
    >Dr. Francis Heylighen <fheyligh@vub.ac.be> -- Center "Leo Apostel"
    >Free University of Brussels, Krijgskundestr. 33, 1160 Brussels, Belgium
    >tel +32-2-6442677; fax +32-2-6440744; http://pespmc1.vub.ac.be/HEYL.html

    Community email addresses:
      Post message: unrev-II@onelist.com
      Subscribe: unrev-II-subscribe@onelist.com
      Unsubscribe: unrev-II-unsubscribe@onelist.com
      List owner: unrev-II-owner@onelist.com

    This archive was generated by hypermail 2.0.0 : Fri Oct 19 2001 - 07:59:21 PDT