Merged blank node identifiers

I know they are in different graphs, but that's just a workaround for those writing queries. Blank nodes are the same node across graphs within a dataset, and each .nq file represents a (sub)dataset.

https://www.w3.org/TR/rdf11-concepts/#section-dataset

> Blank nodes can be shared between graphs in an RDF dataset.

Robert Meusel

Jan 21, 2017, 11:58:46 AM

to Web Data Commons

Hi Stian,

Your observation is totally right. The issue comes from how the extraction is done (in parallel), and it exists for all the kinds of structured data we extract, not only for embedded JSON-LD. It also affects the former extractions.

To make the blank nodes unique across and within the files, you need to combine them with the URL (the fourth part of the quad).
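A minimal sketch of this workaround in Python, assuming one quad per line and an IRI in the fourth position. The function name `scope_bnodes`, the regex-based term splitting, and the truncated SHA-1 of the URL used as a suffix are illustrative assumptions, not part of the Web Data Commons code; a full N-Quads parser would be more robust.

```python
import hashlib
import re

# Matches one N-Quads term: an IRI, a blank node label, or a literal
# (optionally followed by a language tag or a datatype IRI).
TERM = re.compile(
    r'<[^>]*>'
    r'|_:[^\s]+'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
)

def scope_bnodes(line):
    """Append a suffix derived from the graph URL to every blank node label."""
    terms = TERM.findall(line)
    # Leave anything that is not a quad with an IRI graph name untouched.
    if len(terms) != 4 or not terms[3].startswith('<'):
        return line
    graph_url = terms[3][1:-1]
    # Assumption: 12 hex characters of SHA-1 are enough to tell documents apart.
    suffix = hashlib.sha1(graph_url.encode('utf-8')).hexdigest()[:12]
    scoped = [t + 'x' + suffix if t.startswith('_:') else t for t in terms]
    return ' '.join(scoped) + ' .'

quad = '_:node1 <http://schema.org/name> "Example" <http://example.com/page> .'
print(scope_bnodes(quad))
```

The same label under two different page URLs then yields two different blank node labels, while literals that happen to contain `_:` are left alone. The rewritten line is re-serialised with single spaces between terms.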

I hope this helps,

Robert

Stian Soiland-Reyes

Jan 21, 2017, 3:19:46 PM

to web-data...@googlegroups.com

Thanks. Presumably this will be corrected for the next release?

Perhaps an improvement to the extraction would be to embed something like a UUID per parsed document in the bnode identifier? You could use a UUID v3/v5 hash of the graph URI, so that you get consistent bnode identifiers from the same document.

(As a side note, this would also allow you to arbitrarily split/join the NQ files, as long as you keep each graph's statements together.)
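A rough sketch of the UUID v5 idea in Python, using the standard `uuid` module. The function name `document_bnode_label` and the `b<uuid>_<label>` layout are hypothetical choices for illustration, and `uuid.NAMESPACE_URL` is assumed to be a suitable namespace for graph URIs.

```python
import uuid

def document_bnode_label(graph_uri, local_label):
    """Build a blank node label that is stable per document.

    The UUID v5 is a name-based hash, so the same graph URI always
    gives the same UUID, and different URIs give different ones.
    """
    doc_id = uuid.uuid5(uuid.NAMESPACE_URL, graph_uri)
    # Leading 'b' keeps the label starting with a letter; hex digits and
    # underscores are allowed inside N-Quads blank node labels.
    return "_:b" + doc_id.hex + "_" + local_label

first = document_bnode_label("http://example.com/page1", "node1")
again = document_bnode_label("http://example.com/page1", "node1")
other = document_bnode_label("http://example.com/page2", "node1")
print(first == again, first == other)
```

Because the identifier depends only on the graph URI and the local label, re-extracting the same document reproduces the same labels, whichever NQ file the quads end up in.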

Btw, what happens if the source document uses a named JSON-LD @graph?

--
You received this message because you are subscribed to the Google Groups "Web Data Commons" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web-data-commons+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert Meusel

Jan 22, 2017, 5:54:12 AM

to Web Data Commons, soilan...@cs.manchester.ac.uk

This might be something we should consider for the next extraction, but it should not stop you from applying the workaround just discussed. If you want to contribute, you can always create a branch and open a pull request in the repo. The code is freely available.

If a named JSON-LD graph is used, e.g.:

{
  "@context": "http://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/Robert_Millar",
  "@type": "Person",
  "name": "Robert Millar"
}

The context is resolved, and the vocabulary it defines is used to create the quads, e.g. http://xmlns.com/foaf/0.1/Person

Cheers,

Robert
