Identity of Blank Nodes

Yutaka Mitsuishi

Dec 10, 2014, 12:34:16 PM

to web-data...@googlegroups.com

I have found that identical blank node labels are used in different contexts. For example, in "ccrdf.html-microdata.0.nq.gz", "_:node0516c3ec212ede18cd3eca711b9ac4" is used in two contexts. The blank node refers to an airport in one context and to an article in the other, so the two are apparently different entities.
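A check along the following lines can list such labels. This is only a minimal sketch: the `TERM` regular expression is a simplified tokenizer written for illustration (ASCII labels only, not a full N-Quads parser), and the function names are hypothetical rather than part of any library.

```python
import re
from collections import defaultdict

# Simplified N-Quads term tokenizer (assumption: ASCII-only blank node labels).
TERM = re.compile(
    r'<[^>]*>'                                              # IRI
    r'|_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?'     # blank node label
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'    # literal
)


def parse_quad(line):
    """Return (subject, predicate, object, graph) or None if not a quad."""
    if line.lstrip().startswith("#"):
        return None  # comment line
    terms = TERM.findall(line)
    if len(terms) == 4:
        return tuple(terms)
    if len(terms) == 3:
        return (*terms, None)  # statement in the default graph
    return None


def labels_in_multiple_contexts(lines):
    """Map each blank node label to its contexts, keeping only labels seen in more than one."""
    contexts = defaultdict(set)
    for line in lines:
        quad = parse_quad(line)
        if quad is None:
            continue
        subject, _predicate, obj, graph = quad
        for term in (subject, obj):
            if term.startswith("_:"):
                contexts[term].add(graph)
    return {label: ctxs for label, ctxs in contexts.items() if len(ctxs) > 1}
```

For the real dump, the lines could be read with `gzip.open(path, "rt", encoding="utf-8")` and passed to `labels_in_multiple_contexts`.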

Is this correct? Section 2.4, "RDF Blank Nodes", of the W3C document about N-Quads [1] states: "A fresh RDF blank node is allocated for each unique blank node label in a document. Repeated use of the same blank node label identifies the same RDF blank node." That statement seems incompatible with the data I mentioned.

[1] "RDF-1.1 N-Quads", http://www.w3.org/TR/n-quads/

--

Yutaka Mitsuishi

Robert Meusel

Dec 11, 2014, 11:42:38 AM

to web-data...@googlegroups.com

Hello Yutaka,

Thanks for pointing this out. As we are using a distributed framework to extract the data in parallel, it can happen that Any23 returns the same blank node identifier for different URLs.

So each blank node is only valid and unique within one URL, not across different URLs.

A quick fix would be to append the URL to the blank node identifier (then everything is fine).
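As a rough illustration of that quick fix, here is a minimal Python sketch. It makes some assumptions that are not part of our extraction code: a raw URL cannot be appended literally because characters such as "/" and ":" are not allowed in blank node labels, so the sketch appends a short SHA-1 hash of the context URL instead. The `TERM` tokenizer is simplified (ASCII labels only), and the function names are hypothetical.

```python
import hashlib
import re

# Simplified N-Quads term tokenizer (assumption: ASCII-only blank node labels).
TERM = re.compile(
    r'<[^>]*>'                                              # IRI
    r'|_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?'     # blank node label
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'    # literal
)


def scoped_label(label, context):
    """Append a short hash of the context URL to a blank node label."""
    digest = hashlib.sha1(context.encode("utf-8")).hexdigest()[:12]
    return f"{label}x{digest}"


def scope_blank_nodes(line):
    """Rewrite one N-Quads line so its blank nodes are unique per context.

    Lines that are comments or do not have exactly four terms are returned unchanged.
    """
    if line.lstrip().startswith("#"):
        return line
    terms = TERM.findall(line)
    if len(terms) != 4:
        return line
    subject, predicate, obj, graph = terms
    subject, obj = (
        scoped_label(term, graph) if term.startswith("_:") else term
        for term in (subject, obj)
    )
    return f"{subject} {predicate} {obj} {graph} ."
```

Because the suffix depends only on the context URL, the same label within one URL still maps to the same node, while equal labels from different URLs become distinct. The hash is truncated to 12 hex characters for readability, so a collision between two URLs is unlikely but not impossible; a longer digest can be used if that matters.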

For our next extraction we will keep this in mind and try to generate blank nodes that are unique across the whole document.

Cheers,

Robert

Yutaka Mitsuishi

Dec 22, 2014, 4:00:41 PM

to web-data...@googlegroups.com

Hi Robert,

Thank you very much for your reply, and I am sorry for my late response.

I am going to use the quick fix you recommend, or a similar one.

Best regards,

Yutaka

On Thursday, December 11, 2014 at 16:42:38 UTC, Robert Meusel wrote:
