Dear Stardog developers and users,
We are currently evaluating and benchmarking SPARQL endpoints with respect to
RDF reification, i.e. how they cope with large amounts of per-statement
metadata (e.g. provenance, confidence, geospatial or temporal annotations).
As approaches that use the graph identifier or introduce singleton
properties to simulate a statement identifier do not perform well at large
scale, or have other disadvantages, we focus on comparing them with direct
forms of statement identifiers or reification. We are currently writing a
paper that we plan to submit to the WWW'17 conference, and in that context
I would like to ask you a few questions:
**bulk import option for reified statements**:
According to the documentation, one can use the stardog:identifier() (aka
"reification") function to bind a statement's identifier to a SPARQL
variable. I assume that, using this technique, I can write a SPARQL Update
(SPARUL) query that inserts a triple t_2 whose subject is the statement id
of another triple t_1, and thereby "attach" metadata to t_1 (direct
reification). Is this correct?
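To make the question concrete, here is a minimal sketch of what I have in mind, assuming stardog:identifier() takes the triple's subject, predicate and object and returns its statement id (the ex: names, the metadata property and the exact prefix IRI for stardog: are placeholders, not taken from the docs):

```sparql
# Hedged sketch: attach a confidence value to an existing triple t_1.
# Assumes stardog:identifier(?s, ?p, ?o) binds t_1's statement id as
# described in the Stardog documentation; prefix IRI is a guess.
PREFIX ex:      <http://example.org/>
PREFIX stardog: <tag:stardog:api:>

INSERT { ?id ex:confidence 0.9 }          # t_2: metadata about t_1
WHERE {
  ex:Alice ex:knows ex:Bob .              # t_1 must already exist
  BIND(stardog:identifier(ex:Alice, ex:knows, ex:Bob) AS ?id)
}
```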
However, I think SPARQL Update queries won't scale to importing 2.5 billion triples in a reasonable time.
So is there any (file-based) way to bulk load such reified statements?
It would be fine to assume that all metadata belonging to one statement has
to be grouped within a 'nested' statement (even though, under that
assumption, you cannot reify an already reified statement; that is
acceptable for my use case).
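For illustration, in plain W3C RDF reification such a grouped-per-statement file would look like this (the ex: names and metadata properties are placeholders):

```turtle
# All metadata for the single triple (ex:Alice ex:knows ex:Bob)
# grouped under one rdf:Statement resource.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .

ex:stmt1 a rdf:Statement ;
    rdf:subject   ex:Alice ;
    rdf:predicate ex:knows ;
    rdf:object    ex:Bob ;
    ex:confidence 0.9 ;
    ex:source     ex:SomeDataset .
```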
Is there perhaps a way to do this via TinkerPop/Gremlin using the GraphSON
format? In that case, though, I'm not sure what would happen to the URIs:
in property graphs the 'properties' and their 'values' are more or less
plain strings, whereas I have RDF properties (i.e. URIs) as 'properties'
and RDF objects as 'values' (literals are easy, but URIs could be a
problem).
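Roughly, I imagine ending up with something like the following GraphSON-style sketch (not exact GraphSON syntax; ids and names are placeholders), where the predicate URI has to serve as a plain string property key and a URI-valued object degrades to a string value:

```json
{
  "id": 1,
  "label": "vertex",
  "properties": {
    "http://example.org/confidence": [ { "id": 10, "value": 0.9 } ],
    "http://example.org/knows":      [ { "id": 11, "value": "http://example.org/Bob" } ]
  }
}
```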
I appreciate any information or help you can provide.
Best regards,
Johannes Frey