After loooong discussions the RDF WG has agreed on a very minimal definition of Datasets. Unfortunately, the latest RDF draft does not include it yet; hopefully a new version of the draft will be available soon.
A Dataset consists of a default graph and a number of graphs with names (a.k.a. 'named graphs'). Datasets may contain empty graphs. Blank nodes cannot be used as graph names; only URIs can. The scope of blank nodes is the whole dataset.
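To make the model concrete, here is a minimal, library-independent sketch in plain Python; it is not RDFLib code. It assumes that terms are plain strings, that blank nodes are written with the N-Triples `_:` prefix, and that the key `None` stands for the default graph; the class name `ToyDataset` is hypothetical.

```python
class ToyDataset:
    """A default graph plus named graphs; each graph is a set of (s, p, o) triples."""

    def __init__(self):
        # None is the key of the default graph (an assumption of this sketch)
        self._graphs = {None: set()}

    def graph(self, name):
        # Graph names must be URIs; blank nodes ("_:" prefix) are rejected
        if name is not None and name.startswith("_:"):
            raise ValueError("blank nodes cannot be used as graph names: %s" % name)
        # Empty graphs are allowed, so asking for a graph registers it immediately
        return self._graphs.setdefault(name, set())

    def names(self):
        return list(self._graphs)

    def blank_nodes(self):
        # Blank node labels are scoped to the whole dataset, so "_:b1" in two
        # different graphs denotes the same node and is collected only once
        return {term
                for triples in self._graphs.values()
                for triple in triples
                for term in triple
                if term.startswith("_:")}
```

Calling `ds.graph("http://www.example.com/gr")` on a fresh `ToyDataset` registers an empty named graph, while `ds.graph("_:g")` raises `ValueError`.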
All this is very close to the ConjunctiveGraph class, but there are some differences. In a ConjunctiveGraph the default graph is named with a blank node, and blank nodes are accepted as 'contexts'. Also, when listing all the contexts, only the non-empty ones are returned, etc.
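The difference in how contexts are listed follows from where the graph names live. A purely quad-based view can only recover names from the quads it holds, so an empty graph disappears; a view that also keeps a registry of graph names can still report it. The functions below are a hypothetical plain-Python illustration of that point, not the actual ConjunctiveGraph or Dataset code.

```python
def contexts_from_quads(quads):
    # Quad-centric view: a graph name exists only if at least one quad uses it
    return {quad[3] for quad in quads}


def contexts_from_registry(registry, quads, empty=True):
    # Graph-centric view: names are registered explicitly, even with no triples;
    # empty=False keeps only the names that actually occur in some quad
    used = contexts_from_quads(quads)
    return {name for name in registry if empty or name in used}
```

With a single quad in `http://www.example.com/gr` and a registry that also contains an empty `http://www.example.com/empty`, the quad-based function returns only the first name, while the registry-based one returns both unless `empty=False` is passed.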
I have implemented a Dataset class as a subclass of ConjunctiveGraph. It takes care of the issues above and, I believe, is also closer in style to the Dataset concept, with an emphasis on constituent graphs rather than quads. Here is the usage documentation I have added to the class definition; it shows the choices I have made.
# Create a new Dataset
>>> ds = Dataset()
# simple triples go to the default graph
>>> ds.add( (URIRef('http://example.org/a'),URIRef('http://www.example.org/b'),Literal('foo')) )
# Create a graph in the dataset
# if the graph name has already been used, the corresponding graph is returned
# (i.e., the Dataset keeps track of the constituent graphs); otherwise a new graph is created in the dataset.
# The special argument Dataset.DEFAULT can be used to return the default graph
>>> g = ds.graph(URIRef('http://www.example.com/gr'))
# add triples to the new graph as usual
>>> g.add( (URIRef('http://example.org/x'),URIRef('http://example.org/y'),Literal('bar')) )
# alternatively: add a quad to the dataset -> it goes to the graph
# in the example below, 'http://www.example.com/gr' could be used in the quad as well.
>>> ds.add_quad( (URIRef('http://example.org/x'),URIRef('http://example.org/z'),Literal('foo-bar'),g) )
# There is also a ds.remove_quad method
# querying triples returns them all, regardless of their graph
>>> for t in ds.triples((None,None,None)) : print t
(rdflib.term.URIRef(u'http://example.org/a'), rdflib.term.URIRef(u'http://www.example.org/b'), rdflib.term.Literal(u'foo'))
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/z'), rdflib.term.Literal(u'foo-bar'))
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/y'), rdflib.term.Literal(u'bar'))
# querying quads returns, well, quads; the fourth argument can be unrestricted or restricted to a graph
>>> for q in ds.quads((None,None,None,None)) : print q
(rdflib.term.URIRef(u'http://example.org/a'), rdflib.term.URIRef(u'http://www.example.org/b'), rdflib.term.Literal(u'foo'), None)
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/y'), rdflib.term.Literal(u'bar'), rdflib.term.URIRef(u'http://www.example.com/gr'))
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/z'), rdflib.term.Literal(u'foo-bar'), rdflib.term.URIRef(u'http://www.example.com/gr'))
>>> for q in ds.quads((None,None,None,g)) : print q
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/y'), rdflib.term.Literal(u'bar'), rdflib.term.URIRef(u'http://www.example.com/gr'))
(rdflib.term.URIRef(u'http://example.org/x'), rdflib.term.URIRef(u'http://example.org/z'), rdflib.term.Literal(u'foo-bar'), rdflib.term.URIRef(u'http://www.example.com/gr'))
# Note that in the call above ds.quads((None,None,None,'http://www.example.com/gr')) would have been accepted, too
# graph names in the dataset can be queried:
>>> for c in ds.graphs() : print c
DEFAULT
http://www.example.com/gr
# A graph can be created without specifying a name; a skolemized genid is created on the fly
>>> h = ds.graph()
>>> for c in ds.graphs() : print c
DEFAULT
http://rdlib.net/.well-known/genid/rdflib/N62d77cceefde41458cccaf34bae5a5f0
http://www.example.com/gr
# Note that the Dataset.graphs() call returns names of empty graphs, too. This can be restricted:
>>> for c in ds.graphs(empty=False) : print c
DEFAULT
http://www.example.com/gr
# A graph can also be removed from a dataset, via
>>> ds.remove_graph(g)
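The graph-centric behaviour walked through above (return the existing graph for a known name, create one otherwise, mint a skolem IRI when no name is given, optionally hide empty graphs) can be summarised in a small stdlib-only sketch. It is not the RDFLib implementation: the `NamedGraph` and `GraphRegistry` classes, the `DEFAULT` sentinel and the `http://example.org` prefix are hypothetical; only the `/.well-known/genid/` path follows the skolem IRI convention used in the genid printed above.

```python
import uuid

DEFAULT = "DEFAULT"  # hypothetical sentinel playing the role of Dataset.DEFAULT
GENID_BASE = "http://example.org/.well-known/genid/"  # hypothetical skolem IRI prefix


class NamedGraph:
    """A graph with an identifier and a set of (s, p, o) triples."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.triples = set()


class GraphRegistry:
    """Graph-centric dataset: the registry of graphs, not the quads, is primary."""

    def __init__(self):
        # The default graph always exists
        self._graphs = {DEFAULT: NamedGraph(DEFAULT)}

    def graph(self, name=None):
        if name is None:
            # No name given: mint a fresh skolem IRI rather than using a blank node
            name = GENID_BASE + "N" + uuid.uuid4().hex
        # A known name returns the existing graph; an unknown one registers a new, empty graph
        if name not in self._graphs:
            self._graphs[name] = NamedGraph(name)
        return self._graphs[name]

    def add_quad(self, quad):
        s, p, o, g = quad
        # The fourth element may be a graph object or just its name
        self.graph(getattr(g, "identifier", g)).triples.add((s, p, o))

    def quads(self, graph=None):
        # graph=None means "any graph"; the default graph is reported as None
        wanted = getattr(graph, "identifier", graph)
        for name, g in self._graphs.items():
            if wanted is not None and name != wanted:
                continue
            for s, p, o in g.triples:
                yield (s, p, o, None if name == DEFAULT else name)

    def graphs(self, empty=True):
        # empty=False leaves out the names of graphs that hold no triples
        return [name for name, g in self._graphs.items() if empty or g.triples]

    def remove_graph(self, graph):
        name = getattr(graph, "identifier", graph)
        # Assumption of this sketch: the default graph itself is never removed
        if name != DEFAULT:
            self._graphs.pop(name, None)
```

After adding one triple to the default graph and one quad to `http://www.example.com/gr`, calling `ds.graph()` adds a third, empty graph: `ds.graphs()` then lists three names, while `ds.graphs(empty=False)` returns `['DEFAULT', 'http://www.example.com/gr']`.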
Changes to the TriG serializer
------------------------------
The TriG serializer had to be modified, too: the '=' sign between a graph name and its graph is not used in the official version of the syntax. Also, the only case in which a graph name is a BNode is the default graph, and that name is filtered out of the output.
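A rough sketch of the two serializer changes: no '=' between a graph name and its block, and no name written out for the default graph. It works on the dictionary layout assumed here (graph name mapped to a set of triples, `None` for the default graph, terms already in N-Triples form) and is not the RDFLib TriG serializer; `to_trig` is a hypothetical name.

```python
def to_trig(graphs):
    """Serialize {graph_name: {(s, p, o), ...}} into TriG-style text.

    Terms are assumed to be already written in N-Triples form
    (e.g. "<http://example.org/a>", '"foo"'); the key None marks the default graph.
    """
    chunks = []
    # Default graph first, then the named graphs in name order
    for name in sorted(graphs, key=lambda n: (n is not None, n or "")):
        body = "".join("    %s %s %s .\n" % triple for triple in sorted(graphs[name]))
        if name is None:
            # Default graph: its internal name is filtered out; only the braces remain
            chunks.append("{\n%s}\n" % body)
        else:
            # Named graph: "<name> { ... }" with no '=' sign
            chunks.append("<%s> {\n%s}\n" % (name, body))
    return "\n".join(chunks)
```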
Unfortunately, I am not really familiar with the parser structures in RDFLib, so I am not sure how to create a TriG parser. One is obviously needed, though, and it should generate a Dataset instance.
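For orientation only, here is a toy reader for a very restricted TriG subset (one triple per line, N-Triples terms, simple literals without escapes or tags, graph blocks opened with `<name> {` or a bare `{` and closed with `}`) that returns a dataset-shaped dictionary. It is a hypothetical illustration, not an RDFLib parser plugin; real TriG (prefixes, Turtle abbreviations, blank node property lists) needs a proper grammar-based parser.

```python
import re

# One term: an IRI, a blank node label, or a simple quoted literal (no escapes)
TERM = r'(<[^>]*>|_:\S+|"[^"]*")'
TRIPLE = re.compile(r"^\s*%s\s+%s\s+%s\s*\.\s*$" % (TERM, TERM, TERM))
# "<name> {" opens a named graph; a bare "{" opens the default graph
OPEN = re.compile(r"^\s*(?:<([^>]*)>\s*)?\{\s*$")


def parse_trig_subset(text):
    """Return {graph_name: set of (s, p, o)}; None is the default graph."""
    dataset = {None: set()}
    current = None  # triples outside any block belong to the default graph
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue
        opened = OPEN.match(line)
        if opened:
            current = opened.group(1)  # None for a bare "{"
            dataset.setdefault(current, set())
            continue
        if line.strip() == "}":
            current = None
            continue
        triple = TRIPLE.match(line)
        if not triple:
            # Also catches blank node graph names such as "_:g {", which are not allowed
            raise ValueError("line %d is outside the supported subset: %r" % (lineno, line))
        dataset[current].add(triple.groups())
    return dataset
```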
Ivan
----
Ivan Herman
4, rue Beauvallon, clos St Joseph
13090 Aix-en-Provence
France
http://www.ivan-herman.net