Re: [ba-ohs-talk] SUN's Conceptual Indexing Project for Precision Content Retrieval

This is worth a read:
http://research.sun.com/research/knowledge/technology.html    (01)

"Key Ideas
behind the
"Making a difference
  We have found that techniques from knowledge
representation and natural language processing can make a useful
contribution to solving the paraphrase problem. By searching a
structured conceptual taxonomy of the words and phrases extracted from a
collection of documents, our algorithms can effectively connect terms in
a query with appropriate related terms in document passages.    (02)

"The problem with synonyms
  A common approach to the paraphrase problem is to use tables of
synonyms to automatically expand queries by adding terms that are
recorded as "synonymous." However, there are few real synonyms in
English, so the common practice is to include related words as if they
were synonyms. However, treating terms this way when they are not really
synonyms introduces a level of granularity that trades off precision for
recall. There is no a priori correct level for this tradeoff - different
information needs require different levels of generality - so this
technique often degrades retrieval rather than improving it.

As an alternative to synonym classes, we use taxonomic subsumption
algorithms that exploit generality (subsumption) rather than synonymy to
connect terms in queries with passages that contain more specific terms
as well as the requested terms. These algorithms do not automatically
explore more general terms, so the level of generality is controlled by
your choice of query terms. For example, if you ask for "motor vehicles"
you would get trucks, buses, cars, etc., but if you ask for
"automobiles" you would get cars and taxicabs, but not trucks and buses.    (03)

  Using knowledge bases of general semantic facts, structured conceptual
taxonomies (a type of semantic network) can be constructed from words
and phrases. These words and phrases can be extracted automatically from
text and parsed into conceptual structures. The taxonomy can be
organized by the most-specific-subsumer (MSS) relationship, where each
concept is linked to the most specific concepts that subsume it - i.e.,
that are more general than it is. Terms in a query are individually
matched with corresponding concepts in the taxonomy together with their
For example, given the general semantic facts that "washing" is a kind
of "cleaning" and "car" is a kind of "automobile", an algorithmic
classification system can automatically classify "car washing" as a kind
of "automobile cleaning". A query for "automobile cleaning" or
"automobile washing" will immediately retrieve hits for "car washing"."    (04)
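
To make the subsumption idea concrete, here is a minimal Python sketch of how I read the passage above. It is not Sun's implementation. The toy taxonomy, the function names (subsumes, expand, phrase_subsumes) and the word-by-word phrase matching are all my own assumptions, and it uses singular terms only, with no parsing or morphology.

```python
# Hypothetical toy taxonomy: each term maps to its direct, more general
# parents ("X is a kind of Y"). The contents are illustrative assumptions.
IS_A = {
    "automobile": {"motor vehicle"},
    "truck": {"motor vehicle"},
    "bus": {"motor vehicle"},
    "car": {"automobile"},
    "taxicab": {"automobile"},
    "washing": {"cleaning"},
}


def subsumes(general, specific):
    """True if `general` is `specific` or lies above it on an is-a chain."""
    if general == specific:
        return True
    return any(subsumes(general, parent) for parent in IS_A.get(specific, ()))


def expand(term):
    """Return the term plus every more specific term, never a more general one."""
    vocabulary = set(IS_A) | {p for parents in IS_A.values() for p in parents}
    return {t for t in vocabulary if subsumes(term, t)}


def phrase_subsumes(general_phrase, specific_phrase):
    """Word-by-word subsumption for phrases of equal length (a simplification)."""
    g, s = general_phrase.split(), specific_phrase.split()
    return len(g) == len(s) and all(subsumes(a, b) for a, b in zip(g, s))


if __name__ == "__main__":
    print(sorted(expand("motor vehicle")))
    print(sorted(expand("automobile")))
    print(phrase_subsumes("automobile cleaning", "car washing"))
```

Because expansion only walks downward, the generality of the results is set by the query term: expanding "motor vehicle" reaches trucks and buses, while expanding "automobile" stops at cars and taxicabs. A real system would classify parsed phrase structures into the taxonomy rather than compare words positionally as this sketch does.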

Peter    (05)

