[ba-ohs-talk] Fenfire, RDF (re "Towards a Standard Graph-Based...")
Dear Eugene, (01)
Toni Alatalo pointed me to your paper, `Towards a Standard Graph-Based
Data Model for the Open Hyperdocument System`__. You make the point that
a standard graph-based model could allow different applications to be
integrated into a single whole. You say that RDF is one possible choice
for a modeling language, but reject it because of its complicated syntax. (02)
__ http://www.eekim.com/ohs/papers/graphmodel/ (03)
The Fenfire project (formerly `Gzz`_, formerly GZigZag), which I'm a
developer on, has recently `adopted`_ RDF as its model (i.e., all data
is stored in RDF). (04)
.. _Gzz: http://savannah.nongnu.org/projects/gzz/
.. _adopted:
http://mail.nongnu.org/archive/html/gzz-dev/2003-02/msg00058.html (05)
Our aim is very close to what you describe; we want information from any
application on a computer system (or network) to be available for
linking with information from any other application, in any linking
structure (for example IBIS discussion). For a person I'm in contact
with, there should be a single node on my computer, connected to their
address, their birthday, my appointments with them, emails I received
from them, photos of them, and so on. (06)
(We originally intended to implement this vision based on Ted Nelson's
zzstructure, but we had to stop using that due to `patent problems`_.
Toni has `noticed`_ that zzstructure was avoided by (some in) the OHS
community `because of the patent`_.) (07)
.. _patent problems:
http://mail.nongnu.org/archive/html/gzz-dev/2003-02/msg00042.html
.. _noticed:
http://mail.nongnu.org/archive/html/gzz-dev/2003-02/msg00130.html
.. _because of the patent:
http://www.bootstrap.org/lists/ba-ohs-talk/0204/msg00187.html (08)
So, I'm hoping for some discussion about whether RDF is the right choice
for these goals. Your concern about RDF is the RDF/XML serialization
syntax. However, the RDF model is specified independently (`Concepts and
Abstract Syntax`_) from its serialization (`RDF/XML Syntax
Specification`_). (09)
.. _Concepts and Abstract Syntax: http://www.w3.org/TR/rdf-concepts/
.. _RDF/XML Syntax Specification:
http://www.w3.org/TR/rdf-syntax-grammar/ (010)
It is entirely possible to specify an alternate syntax for RDF, though
RDF/XML is of course preferred as the standard. `N3`_ (Notation 3) is
another syntax for RDF in actual use, designed to be practical for
humans to read and write. There is ongoing work on XML `Schema
annotations`_ allowing any schema-conforming XML to be interpreted as
RDF. For Fenfire, we need a canonical format for RDF graphs, so that
equal RDF graphs are always serialized to the same byte sequence; we
might invent our own serialization language for that. (011)
.. _N3: http://www.w3.org/2000/10/swap/Primer
.. _Schema annotations: http://www.w3.org/2003/02/schema-annotation.html (012)
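To illustrate the canonical-format requirement mentioned above, here is a minimal sketch, assuming a graph that contains no blank nodes and only URIs and plain literals. Each triple is written as one N-Triples-style line and the lines are sorted. The function names and example URIs are hypothetical and are not Fenfire's format; blank nodes would need a canonical labelling, which this sketch does not attempt.

```python
# Hypothetical sketch: canonical bytes for an RDF graph without blank nodes.
# A triple is (subject_uri, predicate_uri, object), where object is either
# ("uri", value) or ("literal", value).

def _escape(text):
    # Minimal escaping for plain literals; real N-Triples covers more cases.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))

def _term(obj):
    kind, value = obj
    if kind == "uri":
        return "<%s>" % value
    if kind == "literal":
        return '"%s"' % _escape(value)
    raise ValueError("unknown object kind: %r" % (kind,))

def canonical_bytes(triples):
    # A graph is a set of triples, so duplicates and order must not matter.
    lines = {"<%s> <%s> %s .\n" % (s, p, _term(o)) for (s, p, o) in triples}
    return "".join(sorted(lines)).encode("utf-8")

# Made-up example data.
g1 = [
    ("http://example.org/a", "http://example.org/vocab#name",
     ("literal", "Node A")),
    ("http://example.org/a", "http://example.org/vocab#linksTo",
     ("uri", "http://example.org/b")),
]
g2 = list(reversed(g1)) + [g1[0]]  # same graph: reordered, one duplicate
```

With this sketch, ``canonical_bytes(g1)`` and ``canonical_bytes(g2)`` yield the same byte sequence, because both lists describe the same set of triples.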
You `propose`_ `GSIX`_ and `GXL`_ as possible alternatives to RDF. I
must admit that I have not surveyed the two as thoroughly as RDF, but
from a cursory look, my impression is that RDF is better suited to our
goals (Fenfire's, and maybe also yours) than either of them. (013)
.. _propose: http://www.eekim.com/ohs/papers/graphmodel/#nid074
.. _GSIX: http://www.concept67.fsnet.co.uk/gsix/
.. _GXL: http://www.gupro.de/GXL/ (014)
Firstly, RDF's stated goal is that "Anybody can say anything about
anything." To achieve this goal, it uses URIs to identify both nodes
and edge types. This provides-- (015)
Orthogonality
Two applications can say things about the same node without ever
knowing about each other. Since each has its own set of properties
(graph edge types), neither 'sees' nor interferes with the other's. (016)
This allows different structures to overlap, as you
`also describe`__. (017)
__ http://www.eekim.com/ohs/papers/graphmodel/#nid066 (018)
Location-independent links
A node identified by a URI can appear in different graphs-- for
example, my personal information manager, your design proposal,
and an email to this list. When I view the mail, I see my
personal connections to the nodes appearing in it; when you
view it, you see yours. Anybody can publish links to it without
modifying the original context (e.g. the mail). (019)
This seems pretty essential for a hypertext/hypermedia system. (020)
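The two properties above can be sketched with graphs modelled as plain sets of triples. This is only an illustration under assumptions: the URIs, the two "applications" and the helper names are made up, and literals are shown as bare strings.

```python
# Hypothetical sketch: graphs as sets of (subject, predicate, object) triples.
# All URIs below are invented for illustration.

PERSON = "http://example.org/people/alice"

# Written by an address-book application.
address_book = {
    (PERSON, "http://example.org/addressbook#email", "alice@example.org"),
}

# Written by a calendar application that knows nothing about the address book.
calendar = {
    (PERSON, "http://example.org/calendar#appointment",
     "http://example.org/appointments/42"),
}

def merge(*graphs):
    # Nodes are identified globally by URI, so merging is plain set union.
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def properties_of(graph, node, namespace):
    # An application looks only at properties in its own namespace and
    # never sees the properties added by others.
    return {(p, o) for (s, p, o) in graph
            if s == node and p.startswith(namespace)}

combined = merge(address_book, calendar)
```

In ``combined``, both applications' statements hang off the same node, yet ``properties_of(combined, PERSON, "http://example.org/addressbook#")`` returns only the address-book statement.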
From my cursory look, it seems that GXL uses URIs for edges (thus
providing orthogonality) but not for nodes (thus not providing
location-independent links). GSIX seems to use local identifiers for
both edges and nodes. (021)
Secondly, RDF has a really simple model. Here's my summary (I've
structured it into 'steps' to make it easier to follow-- step 3 is the
real model): (022)
Step 1. A directed labelled graph where each node is a URI
(actually, URIref-- it can contain a fragment identifier), and each
arc is labelled by a URI. In other words, a set of triples of URIs.
Triples are interpreted as [subject, predicate, object]. (023)
Step 2. In addition to step 1, the objects of triples can be
*literals* instead of URIs. A literal is a Unicode string, with an
optional datatype (URIref) and an optional language attribute ('de',
'fi' etc). (024)
Step 3. In addition to step 2, the subjects and objects of triples
can be *blank nodes*. A blank node is like a URIref, but local to a
graph (if you join two graphs, the blank nodes in each one are
different). (025)
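The three steps above can be sketched as a handful of types. This is a rough illustration, not an RDF library API: the class and function names are hypothetical, and blank-node locality is modelled simply by tagging each blank node with an identifier for the graph it belongs to.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass(frozen=True)
class URIRef:
    # Step 1: nodes and arc labels are URI references.
    value: str

@dataclass(frozen=True)
class Literal:
    # Step 2: a Unicode string with optional datatype and language.
    text: str
    datatype: Optional[URIRef] = None
    language: Optional[str] = None

@dataclass(frozen=True)
class BlankNode:
    # Step 3: like a URIref, but local to one graph (assumed graph_id tag).
    graph_id: str
    label: str

Subject = Union[URIRef, BlankNode]
Object = Union[URIRef, BlankNode, Literal]
Triple = Tuple[Subject, URIRef, Object]

def join(a, b):
    # Joining graphs is set union; blank nodes from different graphs stay
    # distinct because their graph_id differs.
    return frozenset(a) | frozenset(b)

NAME = URIRef("http://example.org/vocab#name")  # made-up property
g1 = {(BlankNode("g1", "x"), NAME, Literal("Alice"))}
g2 = {(BlankNode("g2", "x"), NAME, Literal("Bob", language="en"))}
joined = join(g1, g2)
```

Both graphs use the blank-node label ``x``, but ``joined`` contains two triples with two different subjects.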
GSIX seems to be reasonably simple, too-- it has 'wildcard placeholders'
and 'cspaces' which I haven't taken the time to understand, but I
suppose it would still be ok. GXL, on the other hand, tries to express
*any* graph-- directed, labelled, attributed, hierarchical, whatnot. (026)
For Fenfire, we want a very simple model because we want moderately
skilled users to learn it and use it to their own benefit. We don't
want users to be locked into expressing only those kinds of associations
some programmer or designer came up with; rather, when they find they
need to express their own kind of structure/relationships, they
should be able to do so. And all structures should be viewable in a
single 'structure editor' (even though we'll have all sorts of different
views for application-specific data which you can switch back and forth
between-- we share the OHS's vision here). (027)
I also believe that having e.g. attributes on nodes in addition to
connections on nodes (as possible in GXL) is exactly wrong, because it
doesn't allow different applications to add other attributes,
orthogonally. In RDF, all attributes would be represented as properties
(edges in the graph). This way of expressing the same information is
orthogonal as defined above. (028)
The simplicity is also a reason for me to favor RDF over Nodal: in
Nodal, you have predefined types like sequence or map; in RDF, the same
information is expressed, but in terms of a very simple underlying
structure (which also provides for orthogonality and
location-independent links). (029)
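As an illustration of expressing a sequence in terms of triples only, here is a sketch that uses the RDF collection vocabulary (``rdf:first``, ``rdf:rest``, ``rdf:nil``). The helper names and the ``_:cell`` blank-node labels are assumptions made for this example, and items are shown as bare strings.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

def sequence_to_triples(items, prefix="_:cell"):
    # Returns (head, triples). Each list cell is a blank node, written here
    # as a label such as "_:cell0" (hypothetical labelling scheme).
    triples = set()
    head = NIL
    # Build from the end so each cell can point at the rest of the list.
    for index in range(len(items) - 1, -1, -1):
        cell = "%s%d" % (prefix, index)
        triples.add((cell, FIRST, items[index]))
        triples.add((cell, REST, head))
        head = cell
    return head, triples

def triples_to_sequence(head, triples):
    # Follow rdf:rest from the head until rdf:nil, collecting rdf:first.
    first = {s: o for (s, p, o) in triples if p == FIRST}
    rest = {s: o for (s, p, o) in triples if p == REST}
    items = []
    while head != NIL:
        items.append(first[head])
        head = rest[head]
    return items

head, triples = sequence_to_triples(["a", "b", "c"])
```

The sequence becomes six ordinary triples, and ``triples_to_sequence(head, triples)`` recovers the original order, so no predefined sequence type is needed in the model itself.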
Finally, RDF has a user community-- not as big as XML's, but there are
e.g. extensive libraries for Java, Python and C, and there's a lot of
research going on based on it. There are parsers and writers, schema
languages and tools for checking conformance to a schema, there are
defined vocabularies (i.e., sets of relationships and nodes with an
assigned meaning), etc. I even hope that we'll profit from the work
being done w.r.t. semantics: If tools are developed for translating one
web service's RDF vocabulary to another's, these tools should be usable
to translate one hypertext application's vocabulary into another's; for
example, we could use a view developed for one vocabulary to view data
stored in another. (030)
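In the simplest case, such a translation could be a renaming of properties. The sketch below assumes exactly that; the two vocabularies and the mapping are invented for illustration, and real vocabulary translation will often need more than a one-to-one renaming.

```python
# Hypothetical mapping from one made-up vocabulary to another.
MAPPING = {
    "http://example.org/appA#title": "http://example.org/appB#label",
}

def translate(graph, mapping):
    # Rewrite predicates that have a mapping; leave other triples untouched.
    return {(s, mapping.get(p, p), o) for (s, p, o) in graph}

source = {
    ("http://example.org/doc/1", "http://example.org/appA#title", "Notes"),
    ("http://example.org/doc/1", "http://example.org/appA#author",
     "http://example.org/people/alice"),
}
translated = translate(source, MAPPING)
```

A view written for the second vocabulary could then read the ``label`` property from ``translated`` without knowing about the first application.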
If the work on XML Schema annotations takes off, we can access XML data
from our system without building additional conversion tools. (031)
So these are our reasons for selecting RDF-- *orthogonality*,
*location-independent links* and *simple structure* being probably the
most important. I'm hoping for, and looking forward to, discussion. (032)
- Benja (033)