Fwd: [topicmapmail] TMs and Versioning

From: Murray Altheim (murray.altheim@sun.com)
Date: Wed Jul 11 2001 - 11:32:05 PDT

To those of you who know Eliot Kimber, this message is no surprise.
Eliot is one of the most brilliant people I've met in the computing
field, and a co-author of the HyTime specification. He's also one of
the nicest and most generous in being willing to take the time to
explain complex ideas to relative novices (like me) over the years.

I'm forwarding this into the list because it has strong overlap with
some of the OHS requirements regarding versioning. At this point the
project he describes is proprietary to DataChannel (who owns Isogen,
the company he works for), but as he states in his message, the specs
are public and he's hoping to open source it at some point.

-------- Original Message --------
Subject: Re: [topicmapmail] TMs and Versioning
Date: Wed, 11 Jul 2001 11:16:00 -0500
From: "W. Eliot Kimber" <eliot@isogen.com>
Reply-To: eliot@isogen.com
Organization: DataChannel, Inc
To: Topic Map Mail <topicmapmail@infoloom.com>
References: <MBBBJFPGEFFJABFNFAKMMEDNCAAA.bcngroup@erols.com>

Paul Stephen Prueitt wrote:
> Patrick's dialog with Jaworski is informative in several regards.
> I wonder if the notion of version might have with it the notion of
> aggregation (into a category structure) from the past to the present. Some
> type of reinforcement learning might make modifications to the "current"
> topic map, in order to reflect lessons learned. So, the current topic map
> might reflect a situatedness that has the context of an evolution as well
> as a relevance to the present.

In the versioning link management system we're building in Austin, we
have the concept of "resource" (not particularly original to version
control), which is the set of all versions in time of a "thing" (where
"thingness" is asserted by the person creating the resource and its
versions--we impose no particular rules for mapping resources to
real-world objects). Each version of a resource knows about all its
previous and all its next versions (if any). This means that versions
can be related in a directed graph. Versioning at this level is of
"storage objects" (and, more completely in our implementation, entities
in the XML sense).

A version with multiple previous versions represents some sort of merge
of the previous versions (or simply a selection among multiple possible
precursors to use as the latest version). A version with multiple next
versions represents a branching from one "development track" to two or
more.
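
[Editor's sketch, not DataChannel code: one way the resource/version
graph described above could be modelled. The names Resource, Version,
add_version, is_merge and is_branch_point are hypothetical and chosen
only for illustration.]

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Version:
    """One version of a resource, linked to its neighbours in time."""
    label: str
    previous_versions: List["Version"] = field(default_factory=list)
    next_versions: List["Version"] = field(default_factory=list)

    def is_merge(self) -> bool:
        # More than one predecessor: a merge (or a selection among precursors).
        return len(self.previous_versions) > 1

    def is_branch_point(self) -> bool:
        # More than one successor: development has forked here.
        return len(self.next_versions) > 1


@dataclass(eq=False)
class Resource:
    """The set of all versions in time of one 'thing'."""
    name: str
    versions: List[Version] = field(default_factory=list)

    def add_version(self, label, previous=()):
        version = Version(label, previous_versions=list(previous))
        for earlier in version.previous_versions:
            earlier.next_versions.append(version)  # keep both directions in sync
        self.versions.append(version)
        return version


doc = Resource("chapter-1")
v1 = doc.add_version("v1")
v2a = doc.add_version("v2a", [v1])       # first track
v2b = doc.add_version("v2b", [v1])       # second track, branching from v1
v3 = doc.add_version("v3", [v2a, v2b])   # merge of both tracks
```

Here v1 is a branch point and v3 is a merge; every version can reach
its full history by walking previous_versions.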

Thus, the system maintains all the information needed to remember the
evolutionary history of the components of a topic map. In addition, we
provide "pointers" that are arbitrary relations between versions and
other resources--these can be used to capture any sort of dependency
reflecting any business logic. For example, in a topic map authoring
application you might have specialized dependencies that reflect things
like "source used as base for forming new argument" or "merge source" or
whatever. In our base system we use pointers to remember the
document-to-document dependencies inherent in XML and in HyTime (e.g.,
entities declared by, bounded object set membership, etc.).
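
[Editor's sketch: the "pointers" above might be held as typed relations
from a version to another resource. Pointer, PointerStore and the
identifier strings are assumptions made for this example; the message
does not describe the actual storage format.]

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pointer:
    source_version: str   # id of the version that holds the dependency
    target_resource: str  # id of the resource it depends on
    kind: str             # free-form label carrying the business logic


class PointerStore:
    """Keeps arbitrary typed relations between versions and resources."""

    def __init__(self) -> None:
        self._pointers: List[Pointer] = []

    def add(self, source_version: str, target_resource: str, kind: str) -> Pointer:
        pointer = Pointer(source_version, target_resource, kind)
        self._pointers.append(pointer)
        return pointer

    def from_version(self, source_version: str,
                     kind: Optional[str] = None) -> List[Pointer]:
        # All dependencies of one version, optionally filtered by kind.
        return [p for p in self._pointers
                if p.source_version == source_version
                and (kind is None or p.kind == kind)]


store = PointerStore()
store.add("map.xtm@v3", "map-draft.xtm", "merge source")
store.add("map.xtm@v3", "topics.ent", "entity declared by")
merge_sources = store.from_version("map.xtm@v3", kind="merge source")
```

Because the kind is just a label, an application can add its own
dependency types without changing the store.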

By using a simple form of indirection ("referent tracking documents"
(RTDs) as defined by myself and Peter and Steve Newcomb in our paper to
XML 1999), this storage object versioning mechanism can be extended to
the versioning of hyperlink references by creating for a *target* thing
a proxy document that represents the thing pointed at. These proxy
documents are themselves versioned such that for a given semantic
"referent" (a thing pointed at for a particular rhetorical or practical
purpose), you can know what specific thing that referent was represented
by at any point in time. Because the RTDs can point to anything
addressable, they can be used to effectively track versions of
individual elements within larger documents, for example (or in fact,
versions of any nodes in any groves of any type the underlying system is
capable of managing). This allows for unbounded precision in your
versioning memory at essentially constant cost regardless of the
granularity chosen (that is, it costs the same to point to an entire
document as it does to one character).

If a referrer (e.g., an occurrence pointed to by a topic or association
role member pointed to by an association) points to the RTD *resource*,
the resolution of the pointer to a version or versions of the RTD is
dependent on an arbitrary selection algorithm applied at the time the
pointer is resolved, e.g., "latest available", "current as of time x",
"with property y value w", etc. If a referrer points to a specific
version (or versions), the reference is hardened. Once a version or
versions have been retrieved, the pointers they contain are resolved to
the ultimate target of the RTD, the referent itself.
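
[Editor's sketch of the two-step resolution described above: first
select RTD versions with a pluggable policy, then follow the pointer
each selected version contains. RTDVersion, the selector functions and
the address strings are hypothetical; integer timestamps are assumed
for simplicity.]

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RTDVersion:
    created: int                  # timestamp in any monotonic unit
    target: str                   # address of the referent at that time
    properties: Dict[str, str] = field(default_factory=dict)


Selector = Callable[[List[RTDVersion]], List[RTDVersion]]


def latest_available(versions: List[RTDVersion]) -> List[RTDVersion]:
    return [max(versions, key=lambda v: v.created)] if versions else []


def current_as_of(time: int) -> Selector:
    def select(versions: List[RTDVersion]) -> List[RTDVersion]:
        return latest_available([v for v in versions if v.created <= time])
    return select


def with_property(name: str, value: str) -> Selector:
    def select(versions: List[RTDVersion]) -> List[RTDVersion]:
        return [v for v in versions if v.properties.get(name) == value]
    return select


def resolve(rtd_versions: List[RTDVersion], selector: Selector) -> List[str]:
    # Step 1: pick RTD versions; step 2: follow each one to its referent.
    return [v.target for v in selector(rtd_versions)]


rtd = [
    RTDVersion(10, "doc.xml@v1#para3"),
    RTDVersion(20, "doc.xml@v2#para4", {"status": "approved"}),
    RTDVersion(30, "doc.xml@v3#para4"),
]
soft = resolve(rtd, current_as_of(25))   # reference to the RTD resource
hardened = rtd[0].target                 # reference to one specific version
```

A reference to the RTD resource gives a different answer as the
selector changes, while a hardened reference always yields the same
referent.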

This mechanism makes it possible to maintain complete knowledge of the
version-to-version relationships of things over time while enabling
version-independent reference to those things. It enables the creation
of hyperlinks that are insensitive to the versioning details of the
things related. It allows for dynamic construction of result hyperdocs
(and thus result topic maps) by varying the algorithms used to resolve
references to RTDs. For example, you can easily ask the question "what
was the effective topic map at time x?"

[By further organizing versions into different branches, you can gain
further control by limiting the resolution of pointers to versions on a
particular branch. For example, a resource may have two versions current
at the same time, one in each of two branches. By making your selection
criteria "versions current at time X in branch Y", you can further
control the history. In an authoring system, different branches might
represent different stages of completeness (e.g., a development branch,
a review branch, and a published branch). (From a management standpoint,
branches also allow for control of access to versions by limiting a
user's access to a particular branch.)]
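
[Editor's sketch of the "versions current at time X in branch Y"
criterion. BranchVersion, current_in_branch and the branch names are
illustrative assumptions, not part of the system described.]

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BranchVersion:
    label: str
    branch: str
    created: int  # timestamp in any monotonic unit


def current_in_branch(versions: List[BranchVersion], branch: str,
                      at_time: int) -> Optional[BranchVersion]:
    # Keep only versions on the requested branch that existed at at_time,
    # then take the most recent one (None if the branch had nothing yet).
    candidates = [v for v in versions
                  if v.branch == branch and v.created <= at_time]
    return max(candidates, key=lambda v: v.created, default=None)


history = [
    BranchVersion("d1", "development", 10),
    BranchVersion("p1", "published", 15),
    BranchVersion("d2", "development", 20),
    BranchVersion("p2", "published", 40),
]
dev_at_30 = current_in_branch(history, "development", 30)
pub_at_30 = current_in_branch(history, "published", 30)
```

At time 30 the same resource has two current versions, d2 on the
development branch and p1 on the published branch, and the branch
argument decides which one a reference sees.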

I think this mechanism satisfies all the requirements inherent in Paul's
statement above. Enabling and doing what Paul describes is one of our
chief motivators for building this system. In particular, we see our
system as a base for enabling enterprise-scalable topic map systems of a
very general nature.

We have not yet had time to integrate any topic map implementations with
our system, but it is certainly in our plan to do so. In one sense, a
topic map system could be a relatively simple extension module built on
top of our basic linking and version management facilities. We have
certainly architected the system to expressly enable that sort of

[Unfortunately, this system is currently proprietary DataChannel
technology that we are not at liberty to sell or give away except in the
context of professional services engagements. However, we are trying to
find a way to open source some or all of it sooner rather than later.
But, the basic architecture is publicly defined (in large part in our
RTD paper) so there is nothing preventing others from implementing this



. . . . . . . . . . . . . . . . . . . . . . . .

W. Eliot Kimber | Lead Brain

1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com

www.datachannel.com
_______________________________________________
topicmapmail mailing list
topicmapmail@infoloom.com
http://www.infoloom.com/mailman/listinfo/topicmapmail

...........................................................................
Murray Altheim <mailto:murray.altheim@sun.com>
XML Technology Center
Sun Microsystems, Inc., MS MPK17-102, 1601 Willow Rd., Menlo Park, CA 94025

In the evening
The rice leaves in the garden
Rustle in the autumn wind
That blows through my reed hut.
-- Minamoto no Tsunenobu

This archive was generated by hypermail 2.0.0 : Tue Aug 21 2001 - 17:58:07 PDT