
Deep contexts WAS: Re: [ba-ohs-talk] Re: A modest proposal


Hi Murray,    (01)

OK, let me try and explain the idea of deep contexts some more first,
then I hope to be able to present the argument as to why
the division of the example topic you've provided into 4 topics is not
really making any great change for the user, although it
does make a big change behind the scenes for processing and system
significance.    (02)

{Warning: Very long post riding roughshod over sensitive terrain in
places.}    (03)

Deep contexts:
In XTM 1.0 and ISO13250 there was a 'catch' with scoping topics.
Imagine you have TM1 that contains a topic T1, 'Cecil the cat', in scope
S1, 'house 51', and a TM2 that contains a topic, T2, 'Cecil the cat' in
scope, S2, 'house 51'.
TM1 and TM2 are separate documents ostensibly about different subjects.
I load TM1 into my TM processor. I surf its contents and via one of its
occurrences I am accidentally led to click on a link that imports TM2
into the processor.
T1 and T2 appear to be about the same thing, and they appear to be in
the same scope. So let's try merging them.
Topic maps say that the same baseName in the same scope indicates a
topic that should be merged.
So it looks like T1 and T2 should be merged. But how do we determine
whether S1 and S2 are in fact the same scope?
Well, they appear to have the same baseName in the same scope, so maybe
they should be merged too.
What are the scopes of S1 and S2's baseNames though? Well, they appear
to have the same baseName in the same scope so maybe they
should be merged too? And so on.
Is there an end to the regress?
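
A minimal sketch of the regress just described, assuming a toy model in
which every topic has exactly one baseName and at most one scoping topic.
The names Topic, scoped_chain and same_by_name, the max_depth cut-off and
the scope names beyond 'house 51' are hypothetical and come from no topic
map specification or engine:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Topic:
    base_name: str
    scope: Optional["Topic"] = None  # None stands for the unconstrained scope


def scoped_chain(names):
    """Build names[0] scoped by names[1] scoped by names[2], and so on.
    The last name in the list sits in the unconstrained scope."""
    topic = None
    for name in reversed(names):
        topic = Topic(name, topic)
    return topic


def same_by_name(a, b, depth=0, max_depth=5):
    """Name-based test: same baseName in the same scope.
    Returns (verdict, depth reached); a verdict of None means undecided."""
    if a.base_name != b.base_name:
        return False, depth
    if a.scope is None and b.scope is None:
        return True, depth  # unconstrained scope: the regress stops here
    if a.scope is None or b.scope is None:
        return False, depth
    if depth >= max_depth:
        return None, depth  # gave up: each answer only raised a new question
    # To compare the scopes we must ask the same question about them.
    return same_by_name(a.scope, b.scope, depth + 1, max_depth)


t1 = scoped_chain(["Cecil the cat", "house 51"])
t2 = scoped_chain(["Cecil the cat", "house 51"])
shallow = same_by_name(t1, t2)

deep_names = ["Cecil the cat", "house 51", "street", "town", "county", "country"]
deep = same_by_name(scoped_chain(deep_names), scoped_chain(deep_names), max_depth=3)
```

In this sketch the comparison only terminates with a firm answer when a
scoping topic is reached whose name is in the unconstrained scope; otherwise
the processor has to stop at an arbitrary depth without a verdict.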
There are three methods which can be combined.
1-  Restrict the application TM set so that you know that there is an
upper bound in advance.
2-  Don't look too closely at the scoped properties of scoping topics
(or put otherwise, things aren't so bad if every topic has a baseName in
the unconstrained scope or has a Subject that is a PSI - which is the
same thing). We are "strongly encouraged to use common Published Subject
Indicators" in TMPM4.
On the subject of PSIs TMPM4 says, "Such organizations [that serve
communities of interest] should commit themselves to preserving the
longterm validity of the published addresses of such identity points, in
order to protect the value and mergeability of the topic maps that use
them."
3-  Ignore the Name merger rule and only go for merger when a topic
Subject is the same resource address in both cases.
Although cf. section 7 of TMPM4 at www.topicmaps.net
"The Subject-based Merging Rule requires conforming topic map processing
systems to merge t-nodes that are known to such systems to have the same
subject, *on the basis of whatever information is available to them*. In
addition, the Subject-based Merging Rule requires conforming topic map
processing systems to conclude, on the basis of *certain conditions*,
that two t-nodes have the same subject, and that they therefore must be
merged into a single t-node." [my added emphasis]
What are the conditions TMPM4 proposes? To quote at length:    (04)

"Whenever two t-nodes both have identity points that are subject
constituting resources, they must be merged if and only if the two
subject constituting resources are known to the processing system to be
one and the same resource, regardless of how that resource may have been
differently addressed. In other words, merging is required if and only
if the two addresses are known to the processing system to be
equivalent.    (05)

"All t-nodes have at least one subject indicator resource. (If nothing
else, a t-node must at least have the syntactic construct that demanded
its existence as one of its subject indicators.) Two t-nodes that do not
have subject constituting resources shall be merged if and only if:    (06)

either:    (07)

"one of the two t-nodes has at least one subject indicator resource that
is known to the processing system to be the same resource that serves as
one of the subject indicators of the other t-node,
[PPJ: and it seems to me that if the resource is within a topic map the
TM1 vs. TM2 regress continues here.]    (08)

or:    (09)

"the two subject indicator resources indicating the subject are known
(on account of machine intelligence or human intervention) to the
processing system to describe the same subject.
[PPJ: which some might say might be passing the buck a tad too much or
encouraging an approach that has great risks for the integrity of the
information. But perhaps that would be asking too much of XTM 1.0 here.]    (010)

"For purposes of the Subject-based Merging Rule, it is irrelevant
whether two subject indicator resources, or two subject constituting
resources, contain the same data or are the same string. A simple string
comparison of the two subject indicator resources is not, in the general
case, a reliable indication of whether or not the same subject is being
described. For example, different products in different sales catalogs
may coincidentally have the same catalog number, and a comparison of the
two catalog numbers does not indicate that they are the same product.
Therefore, the Subject-based Merging Rule is not based on comparing the
data content of the resources that serve as identity points. Merging
must occur if and only if:    (011)

"either both subject identity points are subject indicators, or both
subject identity points are subject constituters (i.e., they can't be
mixed), and    (012)

"they are one and the same resource, meaning that they exist in exact
same addressable context, even though there may be multiple different
equivalent addressing expressions that can arrive at that same resource
in that same addressable context."
[PPJ: But note that this approach does not cure issues with polysemy
where two topics with the same resource indicator and baseName
nonetheless have different types. And up the scoping regress we go
again.]    (013)
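
To make the quoted conditions concrete, here is a rough sketch of the
merge test as I read it from the passages above. It assumes that "known to
the processing system to be equivalent" can be modelled as a table of
declared address equivalences. IdentityPoint, Processor and the
example.org addresses are hypothetical names for illustration, not part of
TMPM4 or any real processor:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityPoint:
    kind: str     # "indicator" or "constituter"
    address: str  # one addressing expression for the resource


class Processor:
    def __init__(self):
        # Maps an address to another address already known to reach
        # the same resource (assumed to be supplied by a human or a tool).
        self._alias = {}

    def resolve(self, address):
        """Follow declared equivalences to a representative address."""
        while address in self._alias:
            address = self._alias[address]
        return address

    def declare_equivalent(self, a, b):
        """Record that two addressing expressions reach the same resource."""
        ra, rb = self.resolve(a), self.resolve(b)
        if ra != rb:
            self._alias[ra] = rb

    def must_merge(self, p, q):
        """Merge only if both points are of the same kind (indicators and
        constituters are never mixed) and are known to be one resource.
        The data content of the resources is never compared."""
        return p.kind == q.kind and self.resolve(p.address) == self.resolve(q.address)


proc = Processor()
a = IdentityPoint("indicator", "http://example.org/zoo/elephant.html")
b = IdentityPoint("indicator", "http://example.org/zoo/./elephant.html")
c = IdentityPoint("constituter", "http://example.org/zoo/elephant.html")

before = proc.must_merge(a, b)  # addresses differ and nothing is declared
proc.declare_equivalent(a.address, b.address)
after = proc.must_merge(a, b)
mixed = proc.must_merge(a, c)
```

Note that nothing in this test looks at baseNames, types or scopes, which is
why the polysemy issue noted above is untouched by it.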

All three of these approaches, imho, arise from a not altogether
satisfactory limiting of the contextual information, in terms of the
depth of investigation you can make into the properties of a scoping
topic.
If in XTM 1.0 you could investigate the properties of such topics,
instead of getting answers as you investigate more deeply into the
scoping hierarchy you would get more questions - a negative regress.
(Hold that thought for a moment.)    (014)

Now, my objections (but see comment below these):
Objection to 1: it prevents the TM system from being opened out into
the web as an open collaborative enterprise.
Objection to 2: it's buck passing on to another less than satisfactory
system, because, let's face it, agreeing on the words to use for something
is a perennial human problem.
Objection to 3: it's buck passing with less accuracy than in 2. You
can't really nail _that much_ from an address on the web.    (015)

If all I were speaking of were ISO13250 then I would admit that I am
being far too harsh. The aims of ISO13250 were to index structured
collections, and merge indexes. Indexing is not knowledge representation
in the strict sense of both terms.
So ISO 13250 arguably does what it is supposed to.
However, it appeared to me in the XTM 1.0 meetings that for some the
agenda went further, and that attempts were made to extend the game to
KR.
And then things became very strained and confusing and messy.    (016)

But I still believe, after much thought, that XTM could be
converted to do full-blown KR, and that it might prove to be a solid
lingua franca for such applications, and do so globally and openly on
the web.
One would need to remove negative regresses and make topic property
comparison stricter.
Then as you investigate up the scoping hierarchy you would get answers,
not questions, and the answers would run as far up the scheme of things
as you needed to look to be sure that you were putting something in the
right place.
You would have what I am calling 'deep contexts'.    (017)

Now, to come back to the 'splitting into four topics' issue. Given the
above, I hope you can now see why there is at least an argument (and
only an argument as yet!) for removing internal scoping from topics and
restricting baseName cardinality, etc.
Then, do you agree that if I split your topic into four, each with its
own scope, the only information that is lost is that they all
relate to the same binding point?
My answer to that is: in indexing maybe the binding point is a useful
shorthand, in KR perhaps it creates too much implicit information that
would be better held in explicit scoped associations.    (018)
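
As a rough illustration of the split, here is a sketch using the elephant
topic from the quoted message below, modelled as plain dictionaries. The
functions split_by_scope and merge_by_subject are hypothetical helpers, and
the regrouping simply assumes that topics sharing a subject indicator
address are merged:

```python
from collections import defaultdict

# The topic from the quoted message: one binding point, four scoped names.
topic = {
    "id": "ele34",
    "subject_indicator": "http://www.altheim.com/zoo/elephant.html",
    "base_names": {"zu": "ndofu", "ko": "코끼리", "ja": "象", "en": "elephant"},
}


def split_by_scope(topic):
    """One topic per scoped baseName; the shared id (binding point) is dropped."""
    return [
        {"subject_indicator": topic["subject_indicator"], "base_names": {scope: name}}
        for scope, name in topic["base_names"].items()
    ]


def merge_by_subject(topics):
    """Regroup topics that point at the same subject indicator address."""
    merged = defaultdict(dict)
    for t in topics:
        merged[t["subject_indicator"]].update(t["base_names"])
    return dict(merged)


parts = split_by_scope(topic)
regrouped = merge_by_subject(parts)
```

In this toy model all four names come back together under the one subject
indicator, and the only thing that does not survive the round trip is the
"ele34" binding point itself.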

I hope that makes my approach somewhat clearer.    (019)

The discussion and debate, as ever, much appreciated.    (020)

--
Peter    (021)


----- Original Message -----
From: "Murray Altheim" <m.altheim@open.ac.uk>
To: <ba-ohs-talk@bootstrap.org>
Sent: Friday, March 22, 2002 5:23 PM
Subject: Re: [ba-ohs-talk] Re: A modest proposal    (022)


> Peter Jones wrote:
>
> > Yes, you would.
> > But then presumably if you were Swahili and wanted to look up
> > giraffes you would be searching in Swahili and not in English.
> > Swahili would be a scope.
>
>
> Yes, exactly.
>
>
> > Then it makes no difference to what the user would see, it only
> > enables deep contexts within the system if that is desirable.
>
>
> But this is how topic maps work. Your use of "deep contexts" I
> don't quite understand.
>
> Now, for reasons I'll not elaborate, let's try an example of this
> with "elephant" in Zulu instead of a "giraffe" in Swahili (okay,
> I could locate a speaker of Zulu but not Swahili on short notice,
> and "giraffe" isn't translated into Japanese or Korean AFAIK). The
> topic element looks like this:
>
>    <topic id="ele34">
>      <subjectIdentity>
>        <subjectIndicatorRef
>          xlink:href="http://www.altheim.com/zoo/elephant.html"/>
>      </subjectIdentity>
>      <baseName><!-- Zulu -->
>        <scope><topicRef xlink:href="../language.xtm#zu"/></scope>
>        <baseNameString>ndofu</baseNameString>
>      </baseName>
>      <baseName><!-- Korean -->
>        <scope><topicRef xlink:href="../language.xtm#ko"/></scope>
>        <baseNameString>코끼리</baseNameString>
>      </baseName>
>      <baseName><!-- Japanese -->
>        <scope><topicRef xlink:href="../language.xtm#ja"/></scope>
>        <baseNameString>象</baseNameString>
>      </baseName>
>      <baseName><!-- English -->
>        <scope><topicRef xlink:href="../language.xtm#en"/></scope>
>        <baseNameString>elephant</baseNameString>
>      </baseName>
>      ...
>    </topic>
>
> I've included "elephant" basenames in Zulu, Korean, Japanese
> and English. This is completely typical XTM, as shown in the
> examples in the XTM spec. I'm not sure why you'd want to do
> anything different, such as divide it into four topics. If
> we assume that
>
>    "http://www.altheim.com/zoo/elephant.html"
>
> is an acceptable indicator of the subject "elephant", then all
> four of the divided topics would appropriately contain a
> <subjectIdentity> pointing to that URL, and would be merged
> back into what you see above the first time it was processed
> through a compliant topic map engine. People in four languages
> could locate the topic (whose ID is "ele34") by searching in
> the scope of any of the four languages provided.
>
> Murray
>
> ......................................................................
> Murray Altheim                         <mailto:m.altheim @ open.ac.uk>
> Knowledge Media Institute
> The Open University, Milton Keynes, Bucks, MK7 6AA, UK
>
>       In the evening
>       The rice leaves in the garden
>       Rustle in the autumn wind
>       That blows through my reed hut.  -- Minamoto no Tsunenobu
>
>    (023)