Reconciliation service from RDF Extension


bpen...@gmail.com

Jun 25, 2022, 6:00:49 PM
to OpenRefine
Hello,

I have the following situation: a SPARQL endpoint with a graph of cities and their properties (including a unique identifier) and a CSV with the city identifiers (the identifiers are the same). 

I would like to reconcile the CSV file with the SPARQL endpoint using the city id so that I can add the graph data to my original CSV. 

Is this sort of join between the CSV and the graph possible?

I am using OpenRefine 3.5.1 (Windows 8.1) with RDF Extension 1.3.1 installed.

Example: 

CSV structure:
ID
1100015

RDF triples:
     rdfs:label "São Paulo" ;
     :id 1100015 .

Best,
Bruno

Keven L. Ates

Jun 26, 2022, 7:58:02 PM
to OpenRefine
See the RDF Transform extension (shameless advertisement).  You will need Java 11 or better; using OpenRefine 3.6 Beta 2 provides that.  Anyway...

Either extension transforms the data to an RDF format, i.e., converts the CSV to a graph.  I'm assuming the SPARQL service you use for reconciliation is a local service rather than a remote SPARQL endpoint, but I'll show both.

I would do this by simply converting the CSV to RDF.  The first and third triples can be created directly from the CSV if you are confident about the CSV ID type.  To get the labels, you can load the RDF file into your own "RDF graph system of choice" and run a SPARQL CONSTRUCT or INSERT that resolves the label against the id, using the SERVICE keyword.  If you are unsure of the type (the first triple), you can resolve that with SPARQL directly as well.
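As a concrete sketch of that CSV-to-RDF conversion, each CSV row becomes a subject IRI plus an ":id" triple.  This uses only the Python standard library; the base IRI and the vocabulary prefix are illustrative assumptions, not anything fixed by the thread:

```python
# Stdlib-only sketch: turn the CSV into Turtle triples.
# The base IRI and the vocab prefix for :id are assumed values.
import csv
import io

BASE = "http://localhost:2020/resource/municipios/"  # assumed base IRI
csv_text = "ID\n1100015\n"  # same shape as the CSV in the question

lines = ["@prefix : <http://example.org/vocab/> ."]  # assumed prefix for :id
for row in csv.DictReader(io.StringIO(csv_text)):
    # Each row becomes one subject IRI (base + ID) with :id as an integer literal.
    lines.append(f"<{BASE}{row['ID']}> :id {int(row['ID'])} .")

turtle = "\n".join(lines)
print(turtle)
```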

If the reconciliation is done on your localhost, then do the simplest conversion possible for the CSV and resolve everything else with the local SPARQL endpoint.  For the ID column, create a root subject IRI using GREL with "http://localhost:2020/resource/municipios/" + value, and a triple with a constant ":id" property and the ID as a Literal object.  Load it into your local service as a separate graph (like ".../my_municipios" shown below) and use the SPARQL pattern:
    INSERT { GRAPH <graph_to> { triple_patterns_to } }
    WHERE { GRAPH <graph_from> { triple_patterns_from } }

For example:
    INSERT {
      GRAPH cities:my_municipios {
        ?city a ?theType ;
              rdfs:label ?theLabel .
      } }
    WHERE {
      GRAPH cities:my_municipios {  # <-- the newly loaded graph
        ?city :id ?id
      }
      GRAPH cities:municipios {
        ?city a ?theType ;
              rdfs:label ?theLabel .
      } }
But this is just copying triples, which is not necessary.  Or, for a remote site:
    INSERT {
      GRAPH cities:my_municipios {
        ?city a ?theType ;
              rdfs:label ?theLabel .
      } }
    WHERE {
      GRAPH cities:my_municipios {  # <-- the newly loaded graph
        ?city :id ?id
      }
      SERVICE <http://remote/sparql/site> {
        BIND( IRI( CONCAT("http://remote/city/prefix/", ?id) ) AS ?remCity )
        ?remCity a ?theType ;
                 rdfs:label ?theLabel .
      } }

YMMV.  I suppose I could make RDF Transform export a CSV of the RDF but that's rather odd.

If you end up copying triples, avoid it and just run the same query when needed.  You should be able to use your service to export a graph or query results as a CSV if needed.  If you just need a complete CSV from the local service, skip all of the above and create the CSV export directly.
