[unrev-II] Fwd: Papers on word associations & web growth

From: Jack Park (jackpark@thinkalong.com)
Date: Fri Oct 19 2001 - 08:08:48 PDT


    >From: Francis Heylighen <fheyligh@vub.ac.be>
    >
    >A number of interesting papers on the structure of the web and of
    >associative networks in general, how such networks develop, and how they
    >could be used to mimic "intuitive" human understanding. (Thanks to Liane
    >Gabora and Peter Turney for the suggestions!)
    >
    >
    >A Stochastic Model for the Evolution of the Web
    >Mark Levene, Trevor Fenner, George Loizou, Richard Wheeldon
    ><http://www.unifr.ch/cgi-bin/physics/compta/compteur.pl?article=cond-mat/0110016&version=abs>
    >
    >Recently several authors have proposed stochastic models of the growth of
    >the Web graph, which give rise to power-law distributions. These models
    >are based on the notion of preferential attachment leading to the "rich
    >get richer" phenomenon. However, these models fail to explain several
    >distributions arising from empirical results, due to the fact that the ...
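The "rich get richer" mechanism mentioned in this abstract can be illustrated with a minimal sketch of plain preferential attachment. This is the generic growth rule that the abstract says earlier models rely on, not the model proposed by Levene et al.; the function names, the one-link-per-new-node rule, and the starting graph are assumptions made for illustration.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, seed=0):
    """Grow a graph node by node; each new node links to one existing node
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]       # assumed starting graph: one edge between nodes 0 and 1
    # Every node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges


def degree_counts(edges):
    """Return a Counter mapping node -> degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = preferential_attachment(2000)
deg = degree_counts(edges)
# histogram[k] is the number of nodes with degree k; under preferential
# attachment most nodes stay at low degree while a few become hubs.
histogram = Counter(deg.values())
for k in sorted(histogram)[:5]:
    print(k, histogram[k])
```

Plotting `histogram` on log-log axes is the usual way to inspect whether the degree distribution looks like a power law.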
    >
    >
    >The large-scale structure of semantic networks: statistical analyses and a
    >model for semantic growth
    >Mark Steyvers, Joshua B. Tenenbaum
    ><http://www.unifr.ch/cgi-bin/physics/compta/compteur.pl?article=cond-mat/01
    >10012&version=abs>
    >
    >We present statistical analyses of the large-scale structure of three
    >types of semantic networks: word associations, WordNet, and Roget's
    >thesaurus. We show that they have a small-world structure, characterized
    >by sparse connectivity, short average path-lengths between words, and
    >strong local clustering. In addition, the distributions of the number of ...
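Two of the small-world properties named in this abstract, short average path length and strong local clustering, can be computed directly from an adjacency structure. The sketch below does so for a tiny hypothetical word-association graph; the words, edges, and function names are invented for illustration and are not taken from the paper's data.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected edges into a dict: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average over nodes of (links among neighbours) / (possible such links).
    Nodes with fewer than two neighbours contribute 0."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        ordered = list(nbrs)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if ordered[j] in adj[ordered[i]]
        )
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


# Hypothetical associations, for illustration only.
edges = [("cat", "dog"), ("cat", "pet"), ("dog", "pet"),
         ("pet", "home"), ("home", "house")]
adj = build_adjacency(edges)
print(average_path_length(adj))      # 1.7 for this toy graph
print(clustering_coefficient(adj))   # 7/15, about 0.467
```

A network is usually called small-world when its clustering is much higher than that of a random graph with the same number of nodes and edges while its average path length stays comparably short.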
    >
    >
    >Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL.
    >Turney, P.D. (2001).
    >http://extractor.iit.nrc.ca/reports/ECML2001.html
    >
    >This paper presents a simple unsupervised learning algorithm for
    >recognizing synonyms, based on statistical data acquired by querying a Web
    >search engine. The algorithm, called PMI-IR, uses Pointwise Mutual
    >Information (PMI) and Information Retrieval (IR) to measure the similarity
    >of pairs of words. PMI-IR is empirically evaluated using 80 synonym test
    >questions from the Test of English as a Foreign Language (TOEFL) and 50
    >synonym test questions from a collection of tests for students of English
    >as a Second Language (ESL). On both tests, the algorithm obtains a score
    >of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which
    >achieves a score of 64% on the same 80 TOEFL questions. The paper
    >discusses potential applications of the new unsupervised learning
    >algorithm and some implications of the results for LSA and LSI (Latent
    >Semantic Indexing).
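The idea of scoring synonym candidates with Pointwise Mutual Information estimated from search-engine counts can be sketched as follows. The abstract does not give the exact scoring formula, so this sketch uses the textbook definition PMI(a, b) = log2(p(a, b) / (p(a) p(b))) and should not be read as Turney's exact method. The hit counts, the `TOTAL_PAGES` figure, and the `hits` helper are hypothetical stand-ins for real search-engine queries.

```python
import math

# Hypothetical index size and hit counts standing in for search-engine results.
TOTAL_PAGES = 1_000_000
HITS = {
    ("big",): 50_000,
    ("large",): 40_000,
    ("tiny",): 10_000,
    ("green",): 30_000,
    ("big", "large"): 8_000,   # pages containing both words
    ("big", "tiny"): 300,
    ("big", "green"): 1_200,
}


def hits(*words):
    """Look up the (made-up) number of pages containing all given words."""
    return HITS[tuple(sorted(words))]


def pmi(a, b):
    """Pointwise mutual information of two words, estimated from hit counts."""
    p_a = hits(a) / TOTAL_PAGES
    p_b = hits(b) / TOTAL_PAGES
    p_ab = hits(a, b) / TOTAL_PAGES
    return math.log2(p_ab / (p_a * p_b))


def best_synonym(problem, choices):
    """Pick the choice that co-occurs with the problem word more often than
    chance would predict, i.e. the one with the highest PMI."""
    return max(choices, key=lambda choice: pmi(problem, choice))


answer = best_synonym("big", ["large", "tiny", "green"])
print(answer)   # "large" with the counts above
```

With these made-up counts, "big" and "large" co-occur four times more often than independence would predict (PMI of 2 bits), while the other two candidates co-occur less often than chance.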
    >
    >
    >Answering Subcognitive Turing Test Questions: A Reply to French
    >Peter D. Turney
    >http://extractor.iit.nrc.ca/reports/subcognitive.html
    >
    >Robert French has argued that a disembodied computer is incapable of
    >passing a Turing Test that includes subcognitive questions. Subcognitive
    >questions are designed to probe the network of cultural and perceptual
    >associations that humans naturally develop as we live, embodied and
    >embedded in the world. In this paper, I show how it is possible for a
    >disembodied computer to answer subcognitive questions appropriately,
    >contrary to French's claim. My approach to answering subcognitive
    >questions is to use statistical information extracted from a very large
    >collection of text. In particular, I show how it is possible to answer a
    >sample of subcognitive questions taken from French, by issuing queries to
    >a search engine that indexes about 350 million Web pages. This simple
    >algorithm may shed light on the nature of human (sub-) cognition, but the
    >scope of this paper is limited to demonstrating that French is mistaken: a
    >disembodied computer can answer subcognitive questions.
    >
    >
    >Automatic Retrieval and Clustering of Similar Words
    >Dekang Lin
    >http://citeseer.nj.nec.com/lin98automatic.html
    >
    >Abstract
    >Bootstrapping semantics from text is one of the greatest challenges in
    >natural language learning. We first define a word similarity measure based
    >on the distributional pattern of words. The similarity measure allows us
    >to construct a thesaurus using a parsed corpus. We then present a new
    >evaluation methodology for the automatically constructed thesaurus. The
    >evaluation results show that the thesaurus is significantly closer to
    >WordNet than Roget Thesaurus is.
    >--
    >
    >_________________________________________________________________________
    >Dr. Francis Heylighen <fheyligh@vub.ac.be> -- Center "Leo Apostel"
    >Free University of Brussels, Krijgskundestr. 33, 1160 Brussels, Belgium
    >tel +32-2-6442677; fax +32-2-6440744; http://pespmc1.vub.ac.be/HEYL.html





    This archive was generated by hypermail 2.0.0 : Fri Oct 19 2001 - 07:59:21 PDT