Graph Structures In XML (GSIX) Specification
Draft Version 0.1
This Document: 2001-06-25
Latest Version: 2001-06-25
Author: Peter P. Jones.
This document has no official standing nor endorsement from any organization.

Aims
To create a global interchange format for Graph Structures: metadata information, knowledge bases, and the like. The initial prototype for development was SIX, but after some consideration it was decided that a firm processing model was required. GSIX is therefore a specialisation of SIX intended to enhance the capacity for graph interchange. The GSIX specification also attempts to make explicit a great many issues that were not explained in SIX, and to remedy many problems and errors.

Introductory Explanation
The best way to think of how GSIX works is to imagine a system of nodes, arcs, and bubbles. In RDF a basic triple is a node connected to another node by a labelled directed arc. RDF nodes can be in contexts defined as other nodes (i.e. connected by a 'context' arc to a context node). In Topic Maps, n-ary associations are effectively composed of many triples, the nodes of which are within scopes defined as references to topics (in similar fashion to RDF). One can therefore see Topic Maps as consisting of triples that are in bubbles (the scopes or contexts).

(brackets are for clarity)

statement
statement ::= (subject (object*| wcvar*))
A statement has a subject and zero or more objects or wildcard placeholders.

object
An object can be any string. An object can be a URI, for example. It is up to the processing system whether the URI is understood to be an address of a resource or not. An object string might be the same as a statement identifier. Again, it is up to the processing system whether the string is understood to be a reference to the ID of a statement or not.

subject
subject ::= (predicate(object+| wcvar+))
A subject is a predicate, indicated by a string, applying to at least one object or wildcard placeholder.

predicate
A predicate is a string.

group qualifier
If there are multiple objects grouped within the statement and the group qualifier is not applied to the statement, an expansion is implied. For example, given this statement:
((predicate(v1, v2, v3, v4,...vN)) (x1, x2, x3,...xM) )
This could be represented as:
(v1, v2, v3, v4,...vN) -- predicate [-- (x1, x2, x3,...xM)]
(subject---------------------------)(objects*-------------)
Square brackets indicate that the objects are optional. The effect is to imply the expansion:
1) v1 -- predicate [-- x1, x2, x3,...xM]
2) v2 -- predicate [-- x1, x2, x3,...xM]
3) v3 -- predicate [-- x1, x2, x3,...xM]
4) ...
N) vN -- predicate [-- x1, x2, x3,...xM]
Which in turn implies the expansion:
1) v1 -- predicate [-- x1]
2) v1 -- predicate [-- x2]
3) v1 -- predicate [-- x3]
4) ...
N1) vN -- predicate [-- x1]
N2) vN -- predicate [-- x2]
N3) ...
The group qualifier indicates that the expansion should not be made.
group((predicate(v1, v2, v3, v4,...vN)) (x1, x2, x3,...xM) )
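
The two-stage expansion described above amounts to pairing every subject-side term with every object, unless the group qualifier is present. The following Python sketch illustrates one possible reading of that rule; the function name `expand` and the tuple layout `(subject_terms, predicate, object_terms)` are assumptions made for illustration and are not part of GSIX.

```python
from itertools import product


def expand(predicate, subject_terms, object_terms=(), grouped=False):
    """Expand one statement into simple statements.

    Each result is a (subject_terms, predicate, object_terms) tuple.
    This layout is a hypothetical in-memory form, not defined by GSIX.
    """
    if grouped:
        # The group qualifier suppresses the expansion: keep the statement whole.
        return [(tuple(subject_terms), predicate, tuple(object_terms))]
    if not object_terms:
        # Objects are optional: one unary statement per subject-side term.
        return [((v,), predicate, ()) for v in subject_terms]
    # Pair every subject-side term with every object.
    return [((v,), predicate, (x,))
            for v, x in product(subject_terms, object_terms)]


triples = expand("predicate", ["v1", "v2"], ["x1", "x2"])
whole = expand("predicate", ["v1", "v2"], ["x1", "x2"], grouped=True)
```

Here `triples` holds four simple statements (v1 with x1, v1 with x2, v2 with x1, v2 with x2), while `whole` holds the single unexpanded statement.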
reciprocal qualifier
statement with reciprocal qualifier ::= reciprocal(subject (object* |wcvar*))
A statement can be qualified by the optional 'reciprocal' qualifier to indicate that the predicate is bi-directional. The effect is that both of the following statements should result from processing:
((predicate(v1,...vN)) (x1,...xM) )
((predicate(x1,...xM)) (v1,...vN) )
The reciprocal qualifier should not produce any effect if inadvertently applied to a unary predicate (in GSIX, a statement with a subject but no object term). The reciprocal qualifier is always 'outside of' any group qualifier in the GSIX semantics. So:
statement with reciprocal qualifier ::=
reciprocal(group(subject (object* |wcvar*)))

in-namespace qualifier
namespaced statement ::=
in-namespace(namespaceName, (subject (object*|wcvar*)) ) ||
in-namespace(namespaceName, reciprocal(subject (object*|wcvar*)) )
A statement can be qualified by an optional generic 'in-namespace' qualifier to indicate that the statement is only valid within a particular namespace.

namespaceName
namespaceName ::= group(object* |wcvar*)
A namespaceName consists of object or wildcard placeholder components and is automatically grouped. A namespace is then effectively the union of the references and strings in the namespaceName group. For the rules of reference, see the Introductory Explanation.

Relations between namespaces
Although 'namespace' is treated as a specific type of qualifier here, there is nothing to prevent the assertion of relations between namespaces in statements. For example, containment of one namespace by another can be asserted with a standard statement -- 'contains' is not a keyword here, just a predicate chosen by the statement author:
group(contains(namespaceName) (namespaceName) )
This statement can also be qualified with a namespace.
in-namespace(namespaceName, (contains(namespaceName) (namespaceName) ) )
The same applies for predicates other than 'contains'. It is possible to assert any kind of relation between namespaces that you like.

Statement Identifiers
Every statement has an ID that is unique within the document. That is, the ID on the left-hand side of the form shown below MUST be unique within the document. A system that processes many GSIX documents into one in-memory graph must take steps to ensure that the IDs of statements remain unique within the system.
ID <- statement
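
GSIX does not say how a system should keep IDs unique when it loads several documents into one graph. As one possible approach, the Python sketch below qualifies each statement ID with the name of its source document and rejects duplicates within a document. The function `merge_documents`, the `(document name, ID)` key scheme, and the input layout are all assumptions made for illustration.

```python
def merge_documents(documents):
    """Merge statements from several documents into one mapping.

    `documents` maps a document name to a list of (statement_id, statement)
    pairs. Keys in the result are (document name, statement ID) pairs, a
    hypothetical scheme that keeps IDs unique across documents.
    """
    merged = {}
    for doc_name, statements in documents.items():
        for statement_id, statement in statements:
            key = (doc_name, statement_id)
            if key in merged:
                # The ID MUST be unique within a single document.
                raise ValueError(
                    f"duplicate statement ID {statement_id!r} in {doc_name!r}"
                )
            merged[key] = statement
    return merged


graph = merge_documents({
    "a.xml": [("s1", "first statement")],
    "b.xml": [("s1", "another statement")],  # same ID, different document
})
```

A system using a scheme like this would also have to decide how object strings that refer to statement IDs are resolved after merging; that choice is left open here, as it is in the specification.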
Any ID string within a statement's objects or wildcard placeholders that matches a statement's ID could be considered to be a reference to an ID on the LHS. It is recommended that a statement should not refer to its own ID.

GSIX Interchange XML syntax (Example):
<?xml version="1.0" ?>
<gsix type="TopicSet" about="whateverThemeYouWant">
  <statement ID="asdf234ff25" namespace="SREF#545 ´Planet Earth´ kumquats" >
    <subject>
      <predicate predicatename="is-in-love-with" reciprocal="yes">
        <object objstring="YOU7686868" ></object>
        <object ...></object>
        ...
        <object ...></object>
      </predicate>
    </subject>
    <object ...></object>
    <object ...></object>
    ...
    <object ...></object>
  </statement>
  <statement> ... </statement>
  <ruledoc type="RulesML" rulesref="http://..." > Some text describing the rule-set or something like that</ruledoc>
  ...
  <ruledoc type="..." rulesref="http://..." ></ruledoc>
</gsix>

The 'about' Attribute
The <gsix> root element has an attribute 'about'. This can be used by the document author to suggest that the statement set is concerned with some theme rather than others. The recipient of a GSIX document is under no obligation to pay any attention to this suggestion, and can treat the data in ways that do not respect this suggestion if they wish to do so.

The 'type' Attribute
The <gsix> root element also has an attribute 'type'. This can be used by the document author to suggest that the document is a particular type of GSIX document as opposed to others. The recipient of a GSIX document is under no obligation to pay any attention to this suggestion, and can treat the data in ways that do not respect this suggestion if they wish to do so.

The <ruledoc> Element
<ruledoc> elements are an entirely optional means of indicating an association between some set of rules specified elsewhere and the statement set in the document. GSIX document recipients can choose to ignore this indication if they wish to do so. How recipients deal with the indication if they don't ignore it is entirely up to them.

Wildcard Placeholders
GSIX can have templates for statements. These differ from statement instances only in that they can contain wildcard placeholders if desired. These can be partial statement templates, containing a mixture of objects and wildcard placeholders, or complete templates containing only wildcard placeholders (<wcvar> elements in the XML syntax).

The 'grouped' Attribute
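Statements have a 'grouped' attribute. This indicates the manner in which a statement should be viewed by a processing system that acknowledges the attribute. A value of 'yes' corresponds to the group qualifier described above, so the implied expansion should not be made; the default is 'no'.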
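
To show how the XML syntax maps onto the statement model, the following Python sketch reads a small GSIX document with the standard library's `xml.etree.ElementTree` and honours the 'grouped' and 'reciprocal' attributes. The sample document, the helper names `terms` and `read_statements`, and the dictionary layout are assumptions made for illustration. Because ElementTree does not apply DTD attribute defaults, the default of "no" is supplied by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, written for this sketch only.
SAMPLE = """<gsix type="TopicSet" about="example">
  <statement ID="s1" grouped="no">
    <subject>
      <predicate predicatename="is-in-love-with" reciprocal="yes">
        <object objstring="A"></object>
      </predicate>
    </subject>
    <object objstring="B"></object>
  </statement>
</gsix>"""


def terms(parent):
    """Return (kind, string) pairs for the direct <object>/<wcvar> children."""
    result = []
    for child in parent:
        if child.tag == "object":
            result.append(("object", child.get("objstring")))
        elif child.tag == "wcvar":
            result.append(("wcvar", child.get("varstring")))
    return result


def read_statements(xml_text):
    """Parse a GSIX document into a list of statement dictionaries."""
    root = ET.fromstring(xml_text)
    statements = []
    for st in root.findall("statement"):
        pred = st.find("subject/predicate")
        record = {
            "id": st.get("ID"),
            "grouped": st.get("grouped", "no") == "yes",
            "predicate": pred.get("predicatename"),
            "subject_terms": terms(pred),
            "object_terms": terms(st),
        }
        statements.append(record)
        # Reciprocal has no effect on a unary statement (no object terms).
        if pred.get("reciprocal", "no") == "yes" and record["object_terms"]:
            # The specification does not say which ID the mirrored
            # statement receives, so it is left unset in this sketch.
            mirrored = dict(
                record,
                id=None,
                subject_terms=record["object_terms"],
                object_terms=record["subject_terms"],
            )
            statements.append(mirrored)
    return statements


parsed = read_statements(SAMPLE)
```

With the sample above, `parsed` contains the statement as written plus its mirror image with the subject-side and object-side terms swapped.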
<!ELEMENT gsix (statement*, ruledoc*) >
<!ELEMENT statement (subject, (object|wcvar)*) >
<!ELEMENT subject (predicate) >
<!ELEMENT predicate (object|wcvar)+ >
<!ELEMENT object (#PCDATA) >
<!ELEMENT wcvar (#PCDATA) >
<!ELEMENT ruledoc (#PCDATA) >
<!ATTLIST gsix
    about CDATA #IMPLIED
    type CDATA #IMPLIED >
<!ATTLIST statement
    ID CDATA #REQUIRED
    grouped (yes|no) "no"
    namespace CDATA #IMPLIED >
<!ATTLIST predicate
    predicatename CDATA #REQUIRED
    reciprocal (yes|no) "no" >
<!ATTLIST object
    objstring CDATA #REQUIRED >
<!ATTLIST wcvar
    varstring CDATA #REQUIRED >
<!ATTLIST ruledoc
    type CDATA #REQUIRED
    rulesref CDATA #REQUIRED >

Licensing:
Anyone can use GSIX in any fashion they like as long as they observe copyright.

DISCLAIMER:
The Author (Peter P. Jones) of this specification accepts no liability whatsoever for anything bad resulting from the use of GSIX.