[unrev-II] Fwd: Papers on word associations & web growth

From: Jack Park (jackpark@thinkalong.com)
Date: Fri Oct 19 2001 - 08:08:48 PDT

Next message: Eric Armstrong: "[unrev-II] Simon's Paper"

Previous message: Henry K van Eyken: "Re: [unrev-II] Microsoft, the Gatekeeper"

>From: Francis Heylighen <fheyligh@vub.ac.be>
>
>A number of interesting papers on the structure of the web and of
>associative networks in general, how such networks develop, and how they
>could be used to mimic "intuitive" human understanding. (Thanks to Liane
>Gabora and Peter Turney for the suggestions!)
>
>
>A Stochastic Model for the Evolution of the Web
>Mark Levene, Trevor Fenner, George Loizou, Richard Wheeldon
><http://www.unifr.ch/cgi-bin/physics/compta/compteur.pl?article=cond-mat/0110016&version=abs>
>
>Recently several authors have proposed stochastic models of the growth of
>the Web graph, which give rise to power-law distributions. These models
>are based on the notion of preferential attachment leading to the "rich
>get richer" phenomenon. However, these models fail to explain several
>distributions arising from empirical results, due to the fact that the ...
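For readers new to the term, here is a minimal sketch of the classic preferential-attachment growth rule that the abstract says earlier models rely on. It follows the familiar Barabasi-Albert style of model and is NOT the Levene et al. model, whose details the truncated abstract does not give. The function names and the parameter `m` (links added per new page) are illustrative assumptions. Each new node links to `m` distinct existing nodes, chosen with probability proportional to their current degree.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow an undirected graph where each new node attaches to m distinct
    existing nodes chosen with probability proportional to degree.
    Returns the list of edges (illustrative sketch, not the paper's model)."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `targets` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw ("rich get richer").
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for old in chosen:
            edges.append((new, old))
            targets.extend((new, old))
    return edges

def degree_counts(edges):
    """Map each node to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

edges = preferential_attachment(10_000, m=2)
deg = degree_counts(edges)
print(max(deg.values()), sum(deg.values()) / len(deg))
```

The average degree stays close to 2m, while a few early nodes typically collect far more links than the rest; that skew is the heavy-tailed degree distribution these models are meant to produce.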
>
>
>The large-scale structure of semantic networks: statistical analyses and a
>model for semantic growth
>Mark Steyvers, Joshua B. Tenenbaum
><http://www.unifr.ch/cgi-bin/physics/compta/compteur.pl?article=cond-mat/0110012&version=abs>
>
>We present statistical analyses of the large-scale structure of three
>types of semantic networks: word associations, WordNet, and Roget's
>thesaurus. We show that they have a small-world structure, characterized
>by sparse connectivity, short average path-lengths between words, and
>strong local clustering. In addition, the distributions of the number of ...
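The two small-world quantities named above, average path length and local clustering, can be computed on a toy graph as sketched below. This is a plain-Python illustration, not the authors' analysis code. The word graph is hypothetical. Clustering is averaged only over nodes with at least two neighbours, which is one common convention among several.

```python
from collections import deque
from itertools import combinations

def avg_clustering(adj):
    """Mean local clustering coefficient over nodes with >= 2 neighbours:
    the fraction of a node's neighbour pairs that are themselves linked."""
    coeffs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(links / (k * (k - 1) / 2))
    return sum(coeffs) / len(coeffs) if coeffs else 0.0

def avg_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs,
    using breadth-first search from every node (unweighted graph)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Hypothetical word-association graph (undirected, stored as adjacency sets).
adj = {
    "cat": {"dog", "pet"},
    "dog": {"cat", "pet", "bone"},
    "pet": {"cat", "dog"},
    "bone": {"dog"},
}
print(avg_clustering(adj))   # 7/9, about 0.778
print(avg_path_length(adj))  # 4/3, about 1.333
```

A network counts as "small-world" when its clustering is much higher than that of a random graph with the same size and density while its average path length stays comparably short.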
>
>
>Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
>Peter D. Turney (2001)
>http://extractor.iit.nrc.ca/reports/ECML2001.html
>
>This paper presents a simple unsupervised learning algorithm for
>recognizing synonyms, based on statistical data acquired by querying a Web
>search engine. The algorithm, called PMI-IR, uses Pointwise Mutual
>Information (PMI) and Information Retrieval (IR) to measure the similarity
>of pairs of words. PMI-IR is empirically evaluated using 80 synonym test
>questions from the Test of English as a Foreign Language (TOEFL) and 50
>synonym test questions from a collection of tests for students of English
>as a Second Language (ESL). On both tests, the algorithm obtains a score
>of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which
>achieves a score of 64% on the same 80 TOEFL questions. The paper
>discusses potential applications of the new unsupervised learning
>algorithm and some implications of the results for LSA and LSI (Latent
>Semantic Indexing).
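The core of the PMI part can be shown with the textbook definition PMI(x, y) = log2[p(x, y) / (p(x) p(y))], where each probability is estimated as a hit count divided by the number of indexed documents. The sketch below is a generic PMI computation and does not reproduce Turney's specific scoring variants. The `hits` and `co_hits` dictionaries stand in for search-engine queries, and every word and count in them is invented for illustration. Because the problem word and the document total stay fixed across the choices, picking the highest PMI ranks the choices the same way as the simpler ratio hits(problem AND choice) / hits(choice).

```python
import math

def pmi(hits_xy, hits_x, hits_y, n_docs):
    """Pointwise mutual information log2(p(x,y) / (p(x) p(y))), with each
    probability estimated as hit count / n_docs. Assumes all counts > 0."""
    p_xy = hits_xy / n_docs
    p_x = hits_x / n_docs
    p_y = hits_y / n_docs
    return math.log2(p_xy / (p_x * p_y))

def best_synonym(problem, choices, hits, co_hits, n_docs):
    """Return the choice whose PMI with the problem word is highest."""
    return max(
        choices,
        key=lambda c: pmi(co_hits[(problem, c)], hits[problem], hits[c], n_docs),
    )

# Hypothetical hit counts standing in for search-engine results.
N_DOCS = 100_000_000
hits = {"levied": 1_000, "imposed": 50_000, "believed": 200_000,
        "requested": 80_000, "correlated": 30_000}
co_hits = {("levied", "imposed"): 400, ("levied", "believed"): 60,
           ("levied", "requested"): 50, ("levied", "correlated"): 5}

choices = ["imposed", "believed", "requested", "correlated"]
print(best_synonym("levied", choices, hits, co_hits, N_DOCS))  # imposed
```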
>
>
>Answering Subcognitive Turing Test Questions: A Reply to French
>Peter D. Turney
>http://extractor.iit.nrc.ca/reports/subcognitive.html
>
>Robert French has argued that a disembodied computer is incapable of
>passing a Turing Test that includes subcognitive questions. Subcognitive
>questions are designed to probe the network of cultural and perceptual
>associations that humans naturally develop as we live, embodied and
>embedded in the world. In this paper, I show how it is possible for a
>disembodied computer to answer subcognitive questions appropriately,
>contrary to French's claim. My approach to answering subcognitive
>questions is to use statistical information extracted from a very large
>collection of text. In particular, I show how it is possible to answer a
>sample of subcognitive questions taken from French, by issuing queries to
>a search engine that indexes about 350 million Web pages. This simple
>algorithm may shed light on the nature of human (sub-) cognition, but the
>scope of this paper is limited to demonstrating that French is mistaken: a
>disembodied computer can answer subcognitive questions.
>
>
>Automatic Retrieval and Clustering of Similar Words
>Dekang Lin
>http://citeseer.nj.nec.com/lin98automatic.html
>
>Abstract
>Bootstrapping semantics from text is one of the greatest challenges in
>natural language learning. We first define a word similarity measure based
>on the distributional pattern of words. The similarity measure allows us
>to construct a thesaurus using a parsed corpus. We then present a new
>evaluation methodology for the automatically constructed thesaurus. The
>evaluation results show that the thesaurus is significantly closer to
>WordNet than Roget's Thesaurus is.
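A distributional similarity measure of the kind the abstract describes can be sketched as follows, in the form usually attributed to Lin. Each word is represented by a set of context features, such as (dependency relation, word) pairs from a parsed corpus, and each feature carries a positive information weight. The similarity is the weight of the shared features divided by the total weight of both words' features. This sketch assumes the weights have already been computed, for example as mutual information from corpus counts. The feature sets and weights below are hypothetical.

```python
def lin_similarity(feats1, feats2):
    """Lin-style similarity between two words. Each argument maps a feature
    (e.g. a (relation, word) pair) to a positive information weight.
    Returns a value in [0, 1]; 1 means identical feature sets."""
    shared = feats1.keys() & feats2.keys()
    num = sum(feats1[f] + feats2[f] for f in shared)
    den = sum(feats1.values()) + sum(feats2.values())
    return num / den if den else 0.0

# Hypothetical features and weights for three nouns.
beer = {("obj-of", "drink"): 3.0, ("mod", "cold"): 2.0, ("obj-of", "brew"): 4.0}
wine = {("obj-of", "drink"): 2.5, ("mod", "red"): 1.5, ("obj-of", "pour"): 1.0}
car = {("obj-of", "drive"): 4.0, ("mod", "red"): 1.0}

print(lin_similarity(beer, wine))  # 5.5 / 14, about 0.393
print(lin_similarity(beer, car))   # 0.0, no shared features
```

A thesaurus entry for a word could then be built by ranking all other words by this score, which matches the construction step the abstract mentions.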
>--
>
>_________________________________________________________________________
>Dr. Francis Heylighen <fheyligh@vub.ac.be> -- Center "Leo Apostel"
>Free University of Brussels, Krijgskundestr. 33, 1160 Brussels, Belgium
>tel +32-2-6442677; fax +32-2-6440744; http://pespmc1.vub.ac.be/HEYL.html


Community email addresses:
  Post message: unrev-II@onelist.com
  Subscribe: unrev-II-subscribe@onelist.com
  Unsubscribe: unrev-II-unsubscribe@onelist.com
  List owner: unrev-II-owner@onelist.com

Shortcut URL to this page:
http://www.onelist.com/community/unrev-II


This archive was generated by hypermail 2.0.0 : Fri Oct 19 2001 - 07:59:21 PDT