RDFLib with RDF 1.1: Datasets

Ivan Herman

Nov 5, 2012, 11:34:58 AM
to Gunnar Aastrand Grimnes, rdfli...@googlegroups.com, Niklas Lindström, rdf...@googlecode.com, Dan Brickley
After loooong discussions the RDF WG has agreed on a very minimal format of Datasets. Unfortunately, the latest RDF draft does not include it yet; hopefully a new version of the draft will be available soon.

A Dataset consists of a default graph and a number of graphs with names (a.k.a. 'named graphs'). Datasets may contain empty graphs; blank nodes cannot be used as graph names, only URIs. The scope of blank nodes is the whole dataset.
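
As a rough illustration of the structure described above, here is a minimal pure-Python sketch. It is a toy model, not the RDFLib API: the class ToyDataset, its method names, and the convention of writing blank nodes as strings starting with "_:" are all assumptions made for the example. It does not model blank node scoping.

```python
# Toy model of a dataset: NOT the RDFLib API, all names are hypothetical.

class ToyDataset:
    """One default graph plus any number of URI-named graphs."""

    def __init__(self):
        self.default_graph = set()   # set of (s, p, o) triples
        self.named_graphs = {}       # graph name (URI string) -> set of triples

    def graph(self, name=None):
        """Return the graph with this name, creating it (empty) if needed."""
        if name is None:
            return self.default_graph
        if name.startswith("_:"):
            # In this model a blank node cannot be used as a graph name.
            raise ValueError("graph names must be URIs, not blank nodes")
        return self.named_graphs.setdefault(name, set())

    def graph_names(self, include_empty=True):
        """List graph names; empty graphs belong to the dataset too."""
        return sorted(
            name for name, triples in self.named_graphs.items()
            if include_empty or triples
        )


ds = ToyDataset()
ds.graph().add(("http://example.org/a", "http://www.example.org/b", "foo"))
ds.graph("http://www.example.com/gr").add(
    ("http://example.org/x", "http://example.org/y", "bar")
)
ds.graph("http://www.example.com/empty")  # created, but left empty

print(ds.graph_names())
print(ds.graph_names(include_empty=False))
```

The point of the sketch is only that an empty named graph stays visible in the dataset until it is explicitly removed.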

All this is very close to the ConjunctiveGraph class, but there are some differences. In Conjunctive Graphs the default graph is named with a blank node, and blank nodes are accepted as 'contexts'. Also, when listing all the contexts, only the non-empty ones are returned, etc.

I have implemented a Dataset class as a subclass of the ConjunctiveGraph. It takes care of the issues above and, also, I believe it is closer in its style to the Dataset concept, with an emphasis on constituent graphs rather than quads. Here is the documentation I have added to the class definition on its usage, which shows the choices I have made.

# Create a new Dataset
>>> ds = Dataset()
# simple triples go to the default graph
>>> ds.add( (URIRef('http://example.org/a'),URIRef('http://www.example.org/b'),Literal('foo')) )

# Create a graph in the dataset
# if the graph name has already been used before, the corresponding graph will be returned
# (ie, the Dataset keeps track of the constituent graphs), otherwise a new graph is created in the dataset.
# The special argument Dataset.DEFAULT can be used to return the default graph
>>> g = ds.graph(URIRef('http://www.example.com/gr'))

# add triples to the new graph as usual
>>> g.add( (URIRef('http://example.org/x'),URIRef('http://example.org/y'),Literal('bar')) )
# alternatively: add a quad to the dataset -> goes to the graph
# in the example below, 'http://www.example.com/gr' could be used in the quad as well.
>>> ds.add_quad( (URIRef('http://example.org/x'),URIRef('http://example.org/z'),Literal('foo-bar'),g) )
# There is also a ds.remove_quad method

# querying triples returns them all regardless of their graph
>>> for t in ds.triples((None,None,None)) : print t
(rdflib.term.URIRef(u'http://example.org/a'), rdflib.term.URIRef(u'http://www.example.org/b'), rdflib.term.Literal(u'foo'))
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/z'), rdflib.term.Literal(u'foo-bar'))
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/y'), rdflib.term.Literal(u'bar'))

# querying quads returns, well, quads; the fourth argument can be unrestricted or restricted to a graph
>>> for q in ds.quads((None,None,None,None)) : print q
(rdflib.term.URIRef(u'http://example.org/a'), rdflib.term.URIRef(u'http://www.example.org/b'), rdflib.term.Literal(u'foo'), None)
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/y'), rdflib.term.Literal(u'bar'), rdflib.term.URIRef(u'http://www.example.com/gr'))
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/z'), rdflib.term.Literal(u'foo-bar'), rdflib.term.URIRef(u'http://www.example.com/gr'))

>>> for q in ds.quads((None,None,None,g)) : print q
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/y'), rdflib.term.Literal(u'bar'), rdflib.term.URIRef(u'http://www.example.com/gr'))
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/z'), rdflib.term.Literal(u'foo-bar'), rdflib.term.URIRef(u'http://www.example.com/gr'))
# Note that in the call above ds.quads((None,None,None,'http://www.example.com/gr')) would have been accepted, too

# graph names in the dataset can be queried:
>>> for c in ds.graphs() : print c
DEFAULT
http://www.example.com/gr
# A graph can be created without specifying a name; a skolemized genid is created on the fly
>>> h = ds.graph()
>>> for c in ds.graphs() : print c
DEFAULT
http://rdlib.net/.well-known/genid/rdflib/N62d77cceefde41458cccaf34bae5a5f0
http://www.example.com/gr

# Note that the Dataset.graphs() call returns names of empty graphs, too. This can be restricted:
>>> for c in ds.graphs(empty=False) : print c
DEFAULT
http://www.example.com/gr

# A graph can also be removed from a dataset, via
>>> ds.remove_graph(g)

Changes to the TriG serializer
------------------------------

The TriG serializer had to be modified, too: the '=' sign is not used in the official version, and the only case where the graph name is a BNode is when serializing the default graph, in which case that name is filtered out of the output.

Unfortunately, I am not really familiar with the parser structures in RDFLib, so I am not sure how to create a TriG parser. But one would be necessary, and it should obviously generate a Dataset instance.

Ivan


----
Ivan Herman
4, rue Beauvallon, clos St Joseph
13090 Aix-en-Provence
France
http://www.ivan-herman.net

William Waites

Nov 5, 2012, 4:39:06 PM
to rdfli...@googlegroups.com, Ivan Herman, Gunnar Aastrand Grimnes, Niklas Lindström, rdf...@googlecode.com, Dan Brickley
On 05/11/12 16:34, Ivan Herman wrote:

> All this is very close to the ConjunctiveGraph class, but there are
> some differences. In Conjunctive Graphs the default graph is named
> with a blank node, and blank nodes are accepted as 'contexts'. Also,
> when listing all the contexts, only the non-empty one are returned, etc.
>
> I have implemented a Dataset class as a subclass of the ConjunctiveGraph.
> It takes care of the issues above and, also, I believe it is closer in its
> style to the Dataset concept, with an emphasis on constituent graphs rather
> than quads.

I think that rdflib should permit graphs to be "named" with a
blank node as it currently does. Perhaps it could emit a warning
or have a "strict mode" flag to raise an exception (ought not to
be the default behaviour).

rdflib goes some way towards implementing RDF 2 (heh) with things
like N3 that require "handles" for graphs with no explicit names.
I don't think we should introduce unnecessary special cases and
restrictions like this into rdflib.

I do like the "get a graph from a bag of graphs" API on the Dataset
class but perhaps this could just be introduced as a method on
ConjunctiveGraph, or ConjunctiveGraph could be renamed as Dataset
with this one improvement?

The TriG serialiser could be modified to complain if there exists
more than one graph in the bag with a bnode as a name -- since
that is plainly not serialisable with TriG as things stand, and
as an interchange format that's important.
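
A minimal sketch of the kind of check suggested here, assuming graph names are plain strings and blank nodes are written with a leading "_:". The function name check_trig_serialisable and its strict flag are hypothetical and are not part of rdflib; the flag simply borrows the warning-versus-exception idea from earlier in this message.

```python
import warnings


def check_trig_serialisable(graph_names, strict=False):
    """Return True if at most one graph name is a blank node label.

    Blank nodes are modelled as strings starting with "_:"; that is a
    convention of this sketch, not of rdflib.
    """
    bnode_names = [name for name in graph_names if name.startswith("_:")]
    if len(bnode_names) <= 1:
        return True
    message = "%d graphs are named with blank nodes: %s" % (
        len(bnode_names),
        ", ".join(bnode_names),
    )
    if strict:
        # Strict mode: refuse to go on.
        raise ValueError(message)
    # Permissive mode: complain, but let the caller decide what to do.
    warnings.warn(message)
    return False


# One bnode-named graph (e.g. the default graph) is unambiguous.
print(check_trig_serialisable(["_:default", "http://www.example.com/gr"]))

# Two or more are ambiguous; strict mode raises instead of warning.
try:
    check_trig_serialisable(["_:a", "_:b"], strict=True)
except ValueError as err:
    print("refused:", err)
```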

Just my £0.02,
-w

Ivan Herman

Nov 5, 2012, 5:11:57 PM
to William Waites, rdfli...@googlegroups.com, Gunnar Aastrand Grimnes, Niklas Lindström, rdf...@googlecode.com, Dan Brickley, Gregg Kellogg
Hi William!



On Nov 5, 2012, at 16:39 , William Waites wrote:

> On 05/11/12 16:34, Ivan Herman wrote:
>
>> All this is very close to the ConjunctiveGraph class, but there are
>> some differences. In Conjunctive Graphs the default graph is named
>> with a blank node, and blank nodes are accepted as 'contexts'. Also,
>> when listing all the contexts, only the non-empty one are returned, etc.
>>
>> I have implemented a Dataset class as a subclass of the ConjunctiveGraph.
>> It takes care of the issues above and, also, I believe it is closer in its
>> style to the Dataset concept, with an emphasis on constituent graphs rather
>> than quads.
>
> I think that rdflib should permit graphs to be "named" with a
> blank node as it currently does. Perhaps it could emit a warning
> or have a "strict mode" flag to raise an exception (ought not to
> be the default behaviour).

The current ConjunctiveGraph remains valid... where this is allowed. Personally, I would prefer for the Dataset to behave as it is defined in RDF 1.1. After all, I work at W3C:-)

What is the advantage of allowing blank nodes as names? (When people clearly try to keep away from blank nodes...). The current Dataset has the possibility to create a graph with a unique name in the form of a skolem URI. What use cases would the blank nodes provide that the skolem would not?
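
To make the skolem alternative concrete, here is a small sketch of minting a unique graph name as a URI. The base URI and the function name new_graph_name are hypothetical; only the general ".well-known/genid/" shape follows the genid shown in the doctest output earlier in this thread.

```python
import uuid

# Hypothetical base URI, chosen for illustration only.
GENID_BASE = "http://example.org/.well-known/genid/"


def new_graph_name(base=GENID_BASE):
    """Return a fresh, globally unique URI usable as a graph name."""
    return base + "N" + uuid.uuid4().hex


a = new_graph_name()
b = new_graph_name()
print(a)
print(a != b)
```

Unlike a blank node label, such a name is an ordinary URI, so it survives serialization and can be referred to from other graphs.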


>
> rdflib goes some way towards implementing RDF 2 (heh) with things
> like N3 that require "handles" for graphs with no explicit names.
> I don't think we should introduce unnecessary special cases and
> restrictions like this into rdflib.

That is why there are all kinds of graph classes. I am not against having other classes doing more.

>
> I do like the "get a graph from a bag of graphs" API on the Dataset
> class but perhaps this could just be introduced as a method on
> ConjunctiveGraph, or ConjunctiveGraph could be renamed as Dataset
> with this one improvement?
>

Well, there are some others, like the add_quad, remove_graph, remove_quad, which are all 'natural', at least in my view, in a Dataset world. Also, the fact that a Dataset allows for empty graphs means that the current ConjunctiveGraph's contexts call is not o.k. either.

And there is also the terminology. The terminology in the RDF 1.1 world will be, probably, dataset, graph, quad. The term 'context' is probably not used. It is important, in my view, to use the same terms here, which is not the case for the ConjunctiveGraph.

As for renaming: let us not create problems for our users. There is nothing wrong with keeping ConjunctiveGraph as it is; renaming it would break existing applications. Hence my approach of making Dataset a subclass of ConjunctiveGraph.

> The TriG serialiser could be modified to complain if there exists
> more than one graph in the bag with a bnode as a name -- since
> that is plainly not serialisable with TriG as things stand, and
> as an interchange format that's important.
>

The problem is that if a ConjunctiveGraph is serialized as of today, there is no way (that I saw) to find out which of the bnode-named graphs are 'real' graphs and which one is the default graph. With the restriction above if a Dataset is serialized via trig, the problem does not occur...

Ivan

> Just my £0.02,
> -w

William Waites

Nov 6, 2012, 10:24:26 AM
to rdfli...@googlegroups.com, Ivan Herman, Gunnar Aastrand Grimnes, Niklas Lindström, rdf...@googlecode.com, Dan Brickley, Gregg Kellogg
Hi Ivan,

On 05/11/12 22:11, Ivan Herman wrote:

> The current ConjunctiveGraph remains valid... where this is allowed.
> Personally, I would prefer for the Dataset to behave as it is defined
> in RDF 1.1. After all, I work at W3C:-)

I can see your point. I'm motivated by the observation that rdflib
implements what is effectively a permissive superset of "standard"
RDF, and generally doesn't enforce some of the strange special cases
that have crept into the standards [1]. As such it is more useful and
expressive than it would be if it followed the spec rigorously. In
general I would not like to see extra restrictions introduced, so long
as rdflib remains compatible with [2] W3C-spec RDF.

Just my opinion. I would like to hear the other devs' thoughts on the
matter.

Cheers,
-w


[1] A good example is the recent discussion, put most clearly by Pat
Hayes, about the very strange circumstance of inference rules that
must make use of invalid intermediate statements in order to operate.
FuXi, for example, needs to put these somewhere. It would not be good
if the stores and graph implementations started slavishly following
the spec. I know that is not what you are proposing with these changes,
but I am trying to get at the attitude that rdflib is to take to
RDF 1.1.

[2] "compatible with" meaning, "capable of taking kosher RDF as input
and producing kosher RDF as output" mostly for purposes of interchange
with other systems which may or may not follow the specs closely. I
imagine that everybody agrees that this is important.
