Error parsing bnode IDs

ajtucker

May 9, 2016, 1:01:25 PM

to Stardog

Hi Folks,

We have a workflow that turns downloaded RDF documents into one big N-Quads file for later loading. Each RDF document we download has its own document URI, which we use as the graph URI.

One issue is that bnodes need to be made unique to each graph. If, for example, there are two RDF documents, each with a triple _:a a foaf:Person, then we need to ensure the bnode IDs are different in the resulting N-Quads dump.

We've been doing this by generating a UUID per document and just adding it on to the bnode ID, e.g. if we have _:genid-1, we end up with something like _:genid-FB3693A0-DA74-423A-A8EB-21F603F1625F-1
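A minimal sketch of that renaming step, assuming the input is already serialized as N-Triples. The function name to_nquads, the regex, and the choice to append the per-document ID at the end of the label are all illustrative assumptions, not our actual pipeline code. The regex is naive: it would also rewrite any "_:" sequence that happens to appear inside a literal, and it does not handle labels containing dots.

```python
import re
import uuid

# Naive matcher for blank node labels (ASCII subset; dots are not handled).
BNODE = re.compile(r"_:([A-Za-z0-9_][A-Za-z0-9_\-]*)")


def to_nquads(ntriples_text, graph_uri, doc_id=None):
    """Rename bnodes with a per-document ID and add the graph IRI to each statement."""
    # One UUID per document, e.g. FB3693A0-DA74-423A-A8EB-21F603F1625F
    doc_id = doc_id or str(uuid.uuid4()).upper()
    quads = []
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # _:genid-1 becomes _:genid-1-<doc_id>
        renamed = BNODE.sub(lambda m: f"_:{m.group(1)}-{doc_id}", line)
        statement = renamed.rstrip()
        if statement.endswith("."):
            statement = statement[:-1].rstrip()  # drop the statement terminator
        quads.append(f"{statement} <{graph_uri}> .")
    return quads
```

Calling it once per downloaded document with that document's URI as graph_uri gives each graph its own bnode namespace, since every call without an explicit doc_id draws a fresh UUID.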

However, it appears that the RIO parser used by Stardog barfs on bnode IDs containing anything other than alphanumeric characters, which we think is wrong; see https://www.w3.org/TR/n-triples/#BNodes
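To make the difference concrete, here is a rough comparison of the two grammars as regular expressions. The first follows the 2004 N-Triples rule for names (a letter followed by letters or digits). The second is a conservative ASCII-only subset of the RDF 1.1 BLANK_NODE_LABEL production; the real production also allows a range of non-ASCII characters, so treat it as an approximation. The names OLD_LABEL, NEW_LABEL and check are made up for this sketch.

```python
import re

# RDF 2004 N-Triples: name ::= [A-Za-z][A-Za-z0-9]*
OLD_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# Conservative ASCII subset of the RDF 1.1 label (the part after "_:"):
# starts with a letter, digit or underscore; may contain '-' and '.';
# must not end with '.'.
NEW_LABEL = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?")


def check(label):
    """Report whether a bnode label (without the '_:' prefix) fits each grammar."""
    return {
        "rdf_2004": bool(OLD_LABEL.fullmatch(label)),
        "rdf_1_1": bool(NEW_LABEL.fullmatch(label)),
    }
```

With these definitions, check("genid-FB3693A0-DA74-423A-A8EB-21F603F1625F-1") reports the label as valid under the RDF 1.1 subset but invalid under the 2004 rule, which matches the behaviour we are seeing.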

Looking at https://bitbucket.org/openrdf/sesame/src/69dcca77110d55618b10a62bcf85c2d6d8cff6d2/core/rio/ntriples/src/main/java/org/openrdf/rio/ntriples/NTriplesParser.java?at=master&fileviewer=file-view-default#NTriplesParser.java-500 it appears the RIO parser is probably to blame.

Should I raise this with Sesame?

Cheers,

Alex.

Michael Grove

May 9, 2016, 1:10:12 PM

to stardog

Yeah, probably. If you look at NTriplesUtil [1], you can see that it definitely does not match what's in the spec.

Cheers,

Mike

[1] https://bitbucket.org/openrdf/sesame/src/69dcca77110d55618b10a62bcf85c2d6d8cff6d2/core/rio/ntriples/src/main/java/org/openrdf/rio/ntriples/NTriplesUtil.java?at=master&fileviewer=file-view-default#NTriplesUtil.java-469:488


--
-- --
You received this message because you are subscribed to the C&P "Stardog" group.
To post to this group, send email to sta...@clarkparsia.com
To unsubscribe from this group, send email to
stardog+u...@clarkparsia.com
For more options, visit this group at
http://groups.google.com/a/clarkparsia.com/group/stardog?hl=en

Zachary Whitley

May 9, 2016, 1:40:43 PM

to Stardog

You could also try setting preserve.bnode.ids [1] to false in the database and letting Stardog handle the bnode naming for you.

From the docs:

Determines how the Stardog parser handles bnode identifiers that may be present in RDF input. If this property is enabled (i.e., TRUE), parsing and data loading performance are improved; but the other effect is that if distinct input files use (randomly or intentionally) the same bnode identifier, that bnode will point to one and the same node in the database. If you have input files that use explicit bnode identifiers, and multiple files may use the same bnode identifiers, and you don’t want those bnodes to be smushed into a single node in the database, then this configuration option should be disabled (set to FALSE).

[1] http://docs.stardog.com/#_configuration_options
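As a toy illustration of the behaviour the docs describe (this is not Stardog's implementation, and the load function is invented for the example), the difference between the two settings can be modelled like this: with the option enabled, labels are kept as-is, so equal labels from different files collapse into one node; with it disabled, each file gets its own mapping to fresh internal IDs.

```python
import itertools


def load(files, preserve_bnode_ids):
    """Toy loader: return the set of distinct bnodes after loading all files.

    `files` is a list of files, each given as a list of bnode labels.
    """
    fresh = itertools.count()
    nodes = set()
    for labels in files:
        mapping = {}  # per-file: input label -> fresh internal ID
        for label in labels:
            if preserve_bnode_ids:
                # Label kept verbatim: same label in two files is the same node.
                nodes.add(label)
            else:
                # Label scoped to the file it came from.
                if label not in mapping:
                    mapping[label] = f"b{next(fresh)}"
                nodes.add(mapping[label])
    return nodes
```

Loading two files that each mention _:a yields a single node with preserve_bnode_ids=True and two distinct nodes with preserve_bnode_ids=False, while repeated uses of _:a inside one file still refer to the same node in both cases.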


Alex Tucker

May 10, 2016, 4:58:46 AM

to sta...@clarkparsia.com

Thanks folks.

Digging a little more, it looks as though Sesame/RIO 2.7.x is based on the old RDF spec where only alphanumeric is allowed, while 2.8.x uses the new spec, see:

https://openrdf.atlassian.net/browse/SES-2151

I can't easily figure out what versions of what are used by Stardog.

I can work around the issue easily enough, but was just a bit surprised at Stardog not being compliant with the up-to-date RDF spec :)
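For anyone else hitting this, one possible workaround (a sketch only; safe_bnode_label and its label layout are assumptions for illustration) is to keep the generated labels strictly alphanumeric: use the hyphen-free hex form of the UUID and hex-encode the original label so that distinct input labels can never collide.

```python
import re
import uuid

# RDF 2004 N-Triples: name ::= [A-Za-z][A-Za-z0-9]*
OLD_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def safe_bnode_label(original, doc_id=None):
    """Build a letters-and-digits-only bnode label that is unique per document."""
    # uuid4().hex is 32 hex characters with no hyphens.
    doc_id = doc_id or uuid.uuid4().hex
    # Hex-encoding keeps the mapping collision-free ("genid-1" vs "genid1").
    encoded = original.encode("utf-8").hex()
    # Leading "b" guarantees the label starts with a letter; "x" is not a
    # hex digit, so it cleanly separates the two hex runs.
    return f"b{doc_id}x{encoded}"
```

The resulting labels are ugly but pass even the stricter 2004 grammar, so they should load regardless of which spec the parser follows.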

Cheers,

Alex.

Michael Grove

May 10, 2016, 7:36:27 AM

to stardog

On Tue, May 10, 2016 at 4:58 AM, Alex Tucker <al...@floop.org.uk> wrote:

> Thanks folks.
>
> Digging a little more, it looks as though Sesame/RIO 2.7.x is based on the old RDF spec where only alphanumeric is allowed, while 2.8.x uses the new spec, see:
>
> https://openrdf.atlassian.net/browse/SES-2151

Right. I'm not sure why that ticket was closed as "Cant Reproduce"; I used the exact data from the ticket and got a parse error, easily reproducing the bug. While they claim to support RDF 1.1, Rio's NTriplesUtil does not appear to have been updated correctly.

> I can't easily figure out what versions of what are used by Stardog.

We're using 4.0, where the problem still persists, as seen in the Sesame code we referenced in the earlier email.

> I can work around the issue easily enough, but was just a bit surprised at Stardog not being compliant with the up-to-date RDF spec :)

Yep, we do our best here, but there's obviously at least one corner case that RIO doesn't handle well, which we unfortunately inherit.