Getty vocabularies. LODrefine


Jennifer B Young

May 9, 2018, 7:46:23 PM
to openr...@googlegroups.com

Hi

 

I’m trying to set up a reconciliation service for the Getty vocabularies as documented here: http://vocab.getty.edu/queries#OpenRefine_Reconciliation_Service. However, I only have the RDF extension installed. I’m trying to install and use LODrefine and I am stymied. I’ve cloned the repo locally but I don’t see any executable file. Also, the links are broken, and so are the DERI links. I feel that I am missing something obvious. Thanks for any help! I am on a 64-bit Windows 10 machine, if that matters.

 

-Jennifer

 

 

Jennifer B. Young

Metadata Coordinator

Northwestern University Libraries

Northwestern University

www.library.northwestern.edu

j-yo...@northwestern.edu

847.491.8978

 

Ettore Rizza

May 10, 2018, 5:20:59 AM
to OpenRefine
Hi Jennifer,

If you do not see an exe, you probably downloaded the Linux version. The Windows version is here (click on the zip file, not on the tar.gz for Linux or the dmg for macOS).

LODrefine is an older version of OpenRefine. Its main advantage is that it is still compatible with DERI's (outdated) RDF extension. But in the case of the Getty vocabularies, your link says that their SPARQL endpoint can NOT be queried with the RDF extension:

The DERI extension includes a "SPARQL full-text search-based Reconciliation" that unfortunately cannot be used, because there's no way to specify that the luc:term index should be used (see issue/33). Nevertheless, one can use the GVP SPARQL service by querying for a fixed label (similar to Find Subject by Exact English PrefLabel), getting JSON format and parsing the result.

The workaround they describe can be applied with OpenRefine 2.8, so there is no need for LODrefine.

Hope this helps,

Ettore

Stuart Kenny

May 10, 2018, 6:57:07 AM
to OpenRefine
Hi, in case it helps, these builds of a recent version of the OpenRefine master branch include the RDF extension. The .zip is the Windows build.


Regards,
Stuart.

Ettore Rizza

May 10, 2018, 8:00:32 AM
to OpenRefine
@Stuart: You've just released a new version? Thanks, it's pretty useful for DBpedia reconciliation!

FYI, OpenRefine 3.0 will probably be released in the next few weeks.

Stuart Kenny

May 10, 2018, 9:42:01 AM
to OpenRefine


On Thursday, 10 May 2018 13:00:32 UTC+1, Ettore Rizza wrote:
@Stuart: You've just released a new version? Thanks, it's pretty useful for DBpedia reconciliation!

Yes, needed to pull in an OpenRefine bug fix. I didn't need to fork and modify OpenRefine this time, so if you are using the OpenRefine master branch you could just take the extension from here instead:

Jennifer B Young

May 11, 2018, 7:14:22 PM
to openr...@googlegroups.com

Thanks for your responses. I was misreading the instructions and assumed I needed LODrefine installed. I already have OpenRefine with the RDF extension installed.

 

However, the SPARQL query in the instructions doesn’t seem to work, or I’m not following it correctly. I added a column by fetching URLs based on a column:

 

 

All I get back is a JSON file containing this:

 

 

Any help most appreciated!

 

Thank you!

Owen Stephens

May 11, 2018, 10:16:45 PM
to OpenRefine
I think there are a couple of problems.

1. To search English headings you need to change the part of the expression reading '@nl'. This sets the language of the labels to search to Dutch, but you need English, so it needs to be '@en' instead.
2. This search looks for an exact match, and I think it is case sensitive. The heading "Picture postcards" doesn't exist, but 'picture postcard' or 'picture postcards' will work.

So if you try:


You should find it successfully finds one heading with the JSON:

{
  "head" : {
    "vars" : [ "x" ]
  },
  "results" : {
    "bindings" : [ {
      "x" : {
        "type" : "uri",
        "value" : "http://vocab.getty.edu/aat/300026819"
      }
    } ]
  }
}
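
If you ever need to pull the URI out of a response like this outside OpenRefine, a minimal Python sketch follows. It assumes the response has exactly the shape shown above, with a single variable named "x"; the function name `extract_uris` is made up for this illustration and is not part of OpenRefine or the Getty service.

```python
import json
from typing import List

# The response shown above, copied verbatim.
SPARQL_JSON = """
{
  "head" : { "vars" : [ "x" ] },
  "results" : {
    "bindings" : [ {
      "x" : {
        "type" : "uri",
        "value" : "http://vocab.getty.edu/aat/300026819"
      }
    } ]
  }
}
"""


def extract_uris(sparql_json: str, var: str = "x") -> List[str]:
    """Return the URI bound to `var` in each result row (hypothetical helper)."""
    data = json.loads(sparql_json)
    bindings = data.get("results", {}).get("bindings", [])
    # An empty "bindings" list means the exact-label search found nothing.
    return [
        row[var]["value"]
        for row in bindings
        if var in row and row[var].get("type") == "uri"
    ]


print(extract_uris(SPARQL_JSON))
```

An empty list back from this helper corresponds to the "no match" case, for example when the label's case or language tag does not match.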

Good luck!

Owen

Ettore Rizza

May 11, 2018, 10:28:59 PM
to OpenRefine
You can also use their web service instead of SPARQL. It's easier to use. It returns an XML file that you can parse with OpenRefine.

1 Create the URLs

"http://vocabsservices.getty.edu/AATService.asmx/AATGetTermMatch?term=" + value.escape('url') + "&logop=and&notes="

2 Fetch the URLs in a new column named XML.

3 Parse the XML column

   3.1 Adding a new column named "ID"

forEach(value.parseHtml().select("Subject_ID"), e, e.ownText()).join('||')

   3.2 Adding a new column named "Term"

forEach(value.parseHtml().select("Term"), e, e.ownText()).join('||')

4 Split both new columns created in step 3 using "Split multi-valued cells", with || as the separator.

5 Find a method to select the best result among the candidates
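
For comparison, here is a rough Python sketch of the same five steps outside OpenRefine. Several things in it are assumptions rather than facts about the Getty service: the sample XML is invented for illustration (only the element names Subject_ID and Term come from the expressions above), IDs and terms are paired purely by position, `quote_plus` is used as an approximation of `value.escape('url')`, and ranking by `difflib` string similarity is just one possible answer to step 5. The helper names are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import quote_plus

BASE = "http://vocabsservices.getty.edu/AATService.asmx/AATGetTermMatch"

# Invented sample response; the real service output may be shaped differently.
SAMPLE_XML = """<Vocabulary>
  <Subject>
    <Subject_ID>300026819</Subject_ID>
    <Term>picture postcards</Term>
  </Subject>
  <Subject>
    <Subject_ID>999999999</Subject_ID>
    <Term>postcard albums</Term>
  </Subject>
</Vocabulary>"""


def build_url(term: str) -> str:
    """Step 1: build the request URL for one cell value."""
    return BASE + "?term=" + quote_plus(term) + "&logop=and&notes="


def _texts(root: ET.Element, name: str) -> List[str]:
    # Compare local names so an XML namespace, if present, does not matter.
    return [
        (el.text or "").strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == name
    ]


def parse_matches(xml_text: str) -> List[Tuple[str, str]]:
    """Steps 3 and 4: collect (ID, term) pairs, paired by position (assumption)."""
    root = ET.fromstring(xml_text)
    return list(zip(_texts(root, "Subject_ID"), _texts(root, "Term")))


def best_match(query: str, matches: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Step 5, one option: keep the term most similar to the query, ignoring case."""
    if not matches:
        return None
    return max(
        matches,
        key=lambda m: difflib.SequenceMatcher(None, query.lower(), m[1].lower()).ratio(),
    )


print(build_url("picture postcards"))
print(best_match("Picture postcards", parse_matches(SAMPLE_XML)))
```

Step 2 (actually fetching the URL) is left out so the sketch runs without network access; in practice the text returned by the service would replace SAMPLE_XML.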