Graph Structures In XML (GSIX) Specification
Draft Version 0.1
This Document: 2001-06-25
Latest Version: 2001-06-25

Author: Peter P. Jones. [email]
© Peter P. Jones, 2001. All rights reserved.

This document has no official standing nor endorsement from any organization.

Aims

To create a global interchange format for graph structures: metadata, knowledge bases, and the like. The initial prototype for development was SIX, but after some consideration it was decided that a firm processing model was required. GSIX is therefore a specialisation of SIX intended to enhance the capacity for graph interchange. The GSIX specification also attempts to make explicit a great many issues that were not explained in SIX, and to remedy many problems and errors.

Introductory Explanation

The easiest way to understand how GSIX works is to imagine a system of nodes, arcs, and bubbles. In RDF a basic triple is a node connected to another node by a labelled directed arc. RDF nodes can be in contexts defined as other nodes (i.e. connected by a 'context' arc to a context node). In Topic Maps, n-ary associations are effectively composed of many triples, the nodes of which are within scopes defined as references to topics (in similar fashion to RDF). One can therefore see Topic Maps as consisting of triples that are in bubbles (the scopes or contexts).
GSIX works on nodes and arcs in bubbles too, but whilst the nodes in Topic Maps are all defined by system addresses, GSIX is designed for symbolic processing of strings and for making use of the unification capabilities of many symbolic processing systems. Nodes and arcs in GSIX are therefore defined simply as their string labels. The node-arc-node (subject-predicate-object) triple construction is called a statement in GSIX.
A singleton (subject) node in GSIX must always have at least one arc (predicate) attached. In GSIX, nodes are always in bubbles (contexts), as are the arcs, even if that context is the raw universal default 'proposition'. However, the bubbles are defined by direct reference to other complete statements or to nodes. The bubbles work as follows:

If the bubble consists purely of a reference to another statement, then the context of the referring statement is the referenced statement in its entirety, and the context of the referring statement is automatically a sub-context of that of the referenced statement.
If the context of a referring statement contains a reference to another statement, and the content of the referring statement contains a node string that is identical to one in the referenced statement, the nodes are unified under the scope of the referenced statement. This is the correct method for unifying nodes within a GSIX graph.
If the bubble consists purely of a reference to another node (not occurring within that statement), then the context of the referring statement is that node in its entirety (however many properties that node might have across various contexts). In effect, the referring statement is subsumed under all the contexts in which the referenced node occurs. Clearly, if your graph is to have global application then you must be very careful about node labels if you intend to use this mechanism.
Statement bubbles cannot reference nodes that are already contained within that statement.
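
The four bubble rules above can be sketched in code. What follows is only a minimal illustration, assuming a hypothetical in-memory form in which each statement holds a list of node labels and at most one context string, and in which a context string that matches a statement ID is treated as a statement reference. The names Statement, classify_context and unified_nodes are invented for this sketch and are not part of GSIX.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Statement:
    # Hypothetical in-memory form: node labels plus one optional context string.
    nodes: List[str]
    context: Optional[str] = None  # a statement ID or a node label


def classify_context(stmt_id: str, graph: Dict[str, Statement]) -> str:
    """Return 'default', 'statement' or 'node' for the bubble of stmt_id."""
    stmt = graph[stmt_id]
    if stmt.context is None:
        return "default"  # the universal default context
    if stmt.context in stmt.nodes:
        # Fourth rule: a bubble cannot reference a node of its own statement.
        raise ValueError(f"{stmt_id}: context refers to one of its own nodes")
    if stmt.context in graph:
        return "statement"  # first and second rules
    return "node"  # third rule


def unified_nodes(stmt_id: str, graph: Dict[str, Statement]) -> List[str]:
    """Second rule: node strings shared with the referenced statement are unified."""
    if classify_context(stmt_id, graph) != "statement":
        return []
    stmt = graph[stmt_id]
    referenced = graph[stmt.context]
    return [n for n in stmt.nodes if n in referenced.nodes]
```

A real processor would also have to record the sub-context relation of the first rule and the multiple contexts of a referenced node in the third rule; this sketch only classifies the reference.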

Further explanation of the processing of a GSIX document will be interspersed with the definitions below.

Structures:

(brackets are for clarity)

statement
statement ::= (subject (object*| wcvar*))

A statement has a subject and zero or more objects or wildcard placeholders.

object

An object can be any string. An object can be a URI for example. It is up to the processing system whether the URI is understood to be an address of a resource or not. An object string might be the same as a statement identifier. Again, it is up to the processing system whether the string is understood to be a reference to the ID of a statement or not.

subject
subject ::= (predicate(object+| wcvar+))

A subject is a predicate, indicated by a string, applying to at least one object or wildcard placeholder.

predicate

A predicate is a string.

group qualifier

If there are multiple objects grouped within the statement and the group qualifier is not applied to the statement, an expansion is implied. For example, given this statement:

((predicate(v1, v2, v3, v4,...vN)) (x1, x2, x3,...xM) )

This could be represented as:

(v1, v2, v3, v4,...vN) -- predicate [-- (x1, x2, x3,...xM)]
(subject---------------------------)(objects*-------------)

Square brackets indicate that the objects are optional.

The effect is to imply the expansion:

1) v1 -- predicate [-- x1, x2, x3,...xM]
2) v2 -- predicate [-- x1, x2, x3,...xM]
3) v3 -- predicate [-- x1, x2, x3,...xM]
4) ...
N) vN -- predicate [-- x1, x2, x3,...xM]

Which in turn implies the expansion:

1) v1 -- predicate [-- x1]
2) v1 -- predicate [-- x2]
3) v1 -- predicate [-- x3]
4) ...
N1) vN -- predicate [-- x1]
N2) vN -- predicate [-- x2]
N3) ...

The group qualifier indicates that the expansion should not be made.

group((predicate(v1, v2, v3, v4,...vN)) (x1, x2, x3,...xM) )
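
The two-step expansion above amounts to a Cartesian product of the subject terms and the statement objects. The following is a minimal sketch of that reading; the function name expand, the tuple layout, and the use of None for a missing object are assumptions made for illustration only.

```python
from itertools import product
from typing import List


def expand(predicate: str, subject_terms: List[str], objects: List[str],
           grouped: bool = False) -> list:
    """Expand one statement into (subject term, predicate, object) triples."""
    if grouped:
        # The group qualifier suppresses the expansion: keep the terms together.
        return [(tuple(subject_terms), predicate, tuple(objects))]
    if not objects:
        # Unary case: the objects are optional, so there is nothing to pair with.
        return [(v, predicate, None) for v in subject_terms]
    # Pair every subject term with every object, in document order.
    return [(v, predicate, x) for v, x in product(subject_terms, objects)]
```

For example, expand("p", ["v1", "v2"], ["x1", "x2"]) yields the four triples v1-p-x1, v1-p-x2, v2-p-x1 and v2-p-x2, in that order.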


reciprocal qualifier
statement with reciprocal qualifier ::= reciprocal(subject (object* |wcvar*))

A statement can be qualified by the optional 'reciprocal' qualifier to indicate that the predicate is bi-directional. The effect of this is to indicate that both of the following statements should result from processing:

((predicate(v1,...vN)) (x1,...xM) )
((predicate(x1,...xM)) (v1,...vN) )

The reciprocal qualifier should not produce any effect if inadvertently applied to a unary predicate (in GSIX a statement with a subject but no object term).

The reciprocal qualifier is always 'outside of' any group qualifier in the GSIX semantics. So:

statement with reciprocal qualifier ::=
        reciprocal(group(subject (object* |wcvar*)))
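
As a rough illustration of the reciprocal qualifier, the sketch below produces the pair of statements described above and leaves a unary statement unchanged. The function name apply_reciprocal and the (predicate, subject terms, objects) tuple layout are assumptions of this sketch, not part of GSIX.

```python
from typing import List, Tuple

# Hypothetical layout: (predicate, subject terms, objects)
StatementTuple = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def apply_reciprocal(predicate: str, subject_terms: List[str],
                     objects: List[str]) -> List[StatementTuple]:
    """Return the statements that result from a reciprocal-qualified statement."""
    forward = (predicate, tuple(subject_terms), tuple(objects))
    if not objects:
        # Unary predicate: the qualifier has no effect.
        return [forward]
    backward = (predicate, tuple(objects), tuple(subject_terms))
    # Avoid emitting the same statement twice when both sides are identical.
    return [forward] if backward == forward else [forward, backward]
```

Because the reciprocal qualifier sits outside any group qualifier, a processor following this sketch would swap the two term lists first and only then decide, per resulting statement, whether to expand or keep the groups.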


in-namespace qualifier
namespaced statement ::=
    in-namespace(namespaceName, (subject (object*|wcvar*)) )
     || in-namespace(namespaceName, reciprocal(subject (object*|wcvar*)) )

A statement can be qualified by an optional generic 'in-namespace' qualifier to indicate that the statement is only valid within a particular namespace.

namespaceName
namespaceName ::= group(object* |wcvar*)

A namespaceName consists of objects or wildcard placeholder components and is automatically grouped. A namespace is then effectively the union of the references and strings in the namespaceName group. For the rules of reference, see the Introductory Explanation.

Relations between namespaces

Although 'namespace' is treated as a specific type of qualifier here, there is nothing to prevent the assertion of relations between namespaces in statements.

For example, containment of one namespace by another can be asserted with a standard statement -- 'contains' is not a keyword here, just a predicate chosen by the statement author:

group(contains(namespaceName) (namespaceName) )

This statement can also be qualified with a namespace.

in-namespace(namespaceName, (contains(namespaceName) (namespaceName) ) )

The same applies for predicates other than 'contains'. It is possible to assert any kind of relation between namespaces that you like.

Statement Identifiers

Every statement has an ID that is unique within the document. That is, the ID on the left-hand side (LHS) of the notation below MUST be unique within the document. A system that processes many GSIX documents into one in-memory graph must take steps to ensure that the IDs of statements remain unique within the system.

ID <- statement

Any ID string within a statement's objects or wildcard placeholders that matches a statement's ID could be considered to be a reference to an ID on the LHS. It is recommended that a statement should not refer to its own ID.
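
One possible way of keeping IDs unique when several documents are loaded into one graph is to prefix each ID with a name for its source document. The sketch below assumes this strategy, assumes that document names are unique and contain no '#' character, and assumes that the processing system chooses to treat any string matching a statement ID from the same document as a reference to it. The names merge_documents and self_references are hypothetical.

```python
from typing import Dict, List


def merge_documents(docs: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Merge {document name: {statement ID: term strings}} into one ID space."""
    merged: Dict[str, List[str]] = {}
    for doc_name, statements in docs.items():
        if "#" in doc_name:
            # Assumption of this sketch: '#' is reserved as the separator.
            raise ValueError(f"document name {doc_name!r} must not contain '#'")
        for stmt_id, terms in statements.items():
            # Rewrite strings that match an ID of the same document,
            # so that references still point at the renamed statement.
            new_terms = [f"{doc_name}#{t}" if t in statements else t for t in terms]
            merged[f"{doc_name}#{stmt_id}"] = new_terms
    return merged


def self_references(statements: Dict[str, List[str]]) -> List[str]:
    """List the IDs of statements that refer to their own ID (not recommended)."""
    return sorted(sid for sid, terms in statements.items() if sid in terms)
```

Other strategies, such as generating fresh IDs on load, would serve equally well; the only requirement stated here is that IDs remain unique within the system.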

GSIX Interchange XML syntax (Example):

<?xml version="1.0" ?>
<gsix type="TopicSet" about="whateverThemeYouWant">
 <statement ID="asdf234ff25" namespace="SREF#545 ´Planet Earth´ kumquats" >
  <subject>
   <predicate predicatename="is-in-love-with" reciprocal="yes">
    <object objstring="YOU7686868" ></object>
    <object ...></object>
    ...
    <object ...></object>
   </predicate>
  </subject>
  <object ...></object>
  <object ...></object>
  ...
  <object ...></object>
 </statement>
 <statement>
 ...
 </statement>
 <ruledoc type="RulesML" rulesref="http://..." >
  Some text describing the rule-set or something like that</ruledoc>
 ...
 <ruledoc type="..." rulesref="http://..." ></ruledoc>
</gsix>

The 'about' Attribute

The <gsix> root element has an attribute 'about'. This can be used by the document author to suggest that the statement set is concerned with some theme rather than others. The recipient of a GSIX document is under no obligation to pay any attention to this suggestion, and can treat the data in ways that do not respect this suggestion if they wish to do so.

The 'type' Attribute

The <gsix> root element also has an attribute 'type'. This can be used by the document author to suggest that the document is a particular type of GSIX document as opposed to others. The recipient of a GSIX document is under no obligation to pay any attention to this suggestion, and can treat the data in ways that do not respect this suggestion if they wish to do so.

The <ruledoc> Element

<ruledoc> elements are an entirely optional means of indicating an association between some set of rules specified elsewhere and the statement set in the document. GSIX document recipients can choose to ignore this indication if they wish to do so. How recipients deal with the indication if they don't ignore it is entirely up to them.

Wildcard Placeholders

GSIX can have templates for statements. These differ from statement instances only in that they can contain wildcard placeholders if desired. These can be partial statement templates, containing a mixture of objects and wildcard placeholders, or complete templates containing only wildcard placeholders (<wcvar> elements in the XML syntax).
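
This specification does not define how templates are matched against statement instances. Purely as an illustration, the sketch below assumes a simple positional matching in which an object must equal the instance string exactly and a wildcard placeholder binds to whatever string it meets, with repeated placeholders of the same name required to bind to the same string. The Wildcard class and the match function are hypothetical; the name field stands in for the varstring attribute of <wcvar>.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Wildcard:
    name: str  # stands in for the varstring attribute of <wcvar>


Term = Union[str, Wildcard]


def match(template: List[Term], instance: List[str]) -> Optional[Dict[str, str]]:
    """Return wildcard bindings if the template matches the instance, else None."""
    if len(template) != len(instance):
        return None
    bindings: Dict[str, str] = {}
    for term, value in zip(template, instance):
        if isinstance(term, Wildcard):
            # A placeholder seen before must bind to the same string again.
            if bindings.setdefault(term.name, value) != value:
                return None
        elif term != value:
            return None  # plain objects must match exactly
    return bindings
```

A system with full unification capabilities could of course go further than this one-way matching.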

The 'grouped' Attribute

Statements have a 'grouped' attribute. This indicates the manner in which a statement should be viewed by a processing system that acknowledges the attribute.
grouped="yes" implies that the group qualifier applies to the statement. The default value is "no".

GSIX Interchange XML Format DTD (Normative):

<!ELEMENT gsix (statement*, ruledoc*) >
<!ELEMENT statement (subject, (object|wcvar)*) >
<!ELEMENT subject (predicate) >
<!ELEMENT predicate (object|wcvar)+ >
<!ELEMENT object (#PCDATA) >
<!ELEMENT wcvar (#PCDATA) >
<!ELEMENT ruledoc (#PCDATA) >
<!ATTLIST gsix
   about CDATA #IMPLIED
   type CDATA #IMPLIED
>
<!ATTLIST statement
   ID CDATA #REQUIRED
   grouped (yes|no) "no"
   namespace CDATA #IMPLIED
>
<!ATTLIST predicate
   predicatename CDATA #REQUIRED
   reciprocal (yes|no) "no"
>
<!ATTLIST object
   objstring CDATA #REQUIRED
>
<!ATTLIST wcvar
   varstring CDATA #REQUIRED
>
<!ATTLIST ruledoc
   type CDATA #REQUIRED
   rulesref CDATA #REQUIRED
>
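
To show how a document following the DTD above might be read, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document, the function names and the dictionary layout are all invented for illustration. No DTD is attached to the sample, so the default value "no" for 'grouped' and 'reciprocal' is applied by hand, and no validation is performed.

```python
import xml.etree.ElementTree as ET

# A small hypothetical document written to follow the DTD above.
SAMPLE = """<gsix type="TopicSet" about="example">
 <statement ID="s1" grouped="yes">
  <subject>
   <predicate predicatename="is-in-love-with" reciprocal="yes">
    <object objstring="alice"></object>
   </predicate>
  </subject>
  <object objstring="bob"></object>
  <wcvar varstring="X"></wcvar>
 </statement>
</gsix>
"""


def term(el):
    """Return (tag, string) for an <object> or <wcvar> element."""
    attr = "objstring" if el.tag == "object" else "varstring"
    return (el.tag, el.get(attr))


def parse_gsix(text):
    """Parse GSIX XML text into a list of plain dictionaries, one per statement."""
    root = ET.fromstring(text)
    statements = []
    for st in root.findall("statement"):
        pred = st.find("subject/predicate")
        statements.append({
            "id": st.get("ID"),
            "grouped": st.get("grouped", "no") == "yes",       # DTD default: no
            "namespace": st.get("namespace"),                  # optional
            "predicate": pred.get("predicatename"),
            "reciprocal": pred.get("reciprocal", "no") == "yes",  # DTD default: no
            "subject_terms": [term(c) for c in pred],
            "objects": [term(c) for c in st if c.tag in ("object", "wcvar")],
        })
    return statements
```

Calling parse_gsix(SAMPLE) returns a single dictionary for statement s1, with its subject terms and objects kept in document order.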



Licensing:

Anyone can use GSIX in any fashion they like as long as they observe copyright.

DISCLAIMER:

The Author (Peter P. Jones) of this specification accepts no liability whatsoever for anything bad resulting from the use of GSIX.