Source of Journal metadata for potential use with OpenRefine

Owen Stephens

Jun 8, 2015, 7:16:12 AM
to openr...@googlegroups.com
Dear all,

This may be a little over-specialised, but I'm working on a project where I think the data we are building could be of use to some OpenRefine users, so I hope you don't mind me posting this.

GOKb, the Global Open Knowledgebase, is a community-managed project that aims to describe electronic journals and books, publisher packages, and the platforms that host these resources. I’ve been working on the project since it started, working with others to gather requirements, develop the underlying data models and specify functionality for the system. We use OpenRefine (with a specially designed extension) as our main mechanism for getting data into GOKb, exploiting its ability to clean up the data (which tends to come from publishers and can be of variable quality) and to re-apply the same changes to future data from the same publisher or supplier.

GOKb opened to ‘public preview’ in January 2015, and you can sign up for an account and access the service at https://gokb.kuali.org/gokb/

Several hundred ejournal packages, and associated information about the ejournal titles, platforms and organisations have been added to the knowledgebase over the past few months. OpenRefine is used to do much of the work to get data ready for loading into GOKb.

Alongside this work of adding content, we have also opened up APIs for interacting with the service, and we suspect these could be useful to others using OpenRefine to work with journal data. In particular, the ‘Coreference service’ allows you to look up identifiers (such as ISSNs) and get back journal title information and other IDs associated with that title (as JSON or XML).

We are interested in:

* Talking to people who use OpenRefine and would like to make use of such a service
* If there is some interest, what support/documentation people would like to see
* Understanding if we can offer different/better services for OpenRefine based on the GOKb data (e.g. would other data GOKb holds be of interest? Would a reconciliation service for journal titles be useful? etc.)

The current APIs we support are:

The ‘Coreference’ service
The main aim of this API is to return a list of identifiers associated with a title. You provide a journal identifier (such as an ISSN) and get back basic information about the journal, including its title and the other identifiers associated with it (other ISSNs, DOIs, publisher identifiers etc.).
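Because lookups like this are keyed on identifiers, it can pay to tidy ISSNs before sending them: values in publisher files may lack the hyphen, carry a lowercase check character, or contain typos. The Python sketch below is not part of GOKb; it assumes nothing about the service itself and simply normalises an ISSN to the NNNN-NNNC form, validating it with the standard ISSN check-digit rule (first seven digits weighted 8 down to 2, modulo 11, with 10 written as X). The function name normalise_issn is hypothetical.

```python
import re
from typing import Optional

def normalise_issn(raw: str) -> Optional[str]:
    """Return raw as a hyphenated ISSN (NNNN-NNNC) if it has eight
    characters of the right shape and a valid check character, else None."""
    # Drop spaces, hyphens, "ISSN" prefixes and other noise; upper-case an 'x'.
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    # Weight the first seven digits 8, 7, ..., 2 and sum them.
    total = sum(int(d) * w for d, w in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return f"{compact[:4]}-{compact[4:]}" if compact[7] == expected else None
```

For example, normalise_issn("03785955") returns "0378-5955", while the mistyped "0378-5954" returns None, so bad rows can be flagged for review rather than sent to a lookup service.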


OAI Interfaces
The main aim of this API is to enable other services to obtain data from GOKb on an ongoing basis. Information about ejournal packages, titles and organisations can be obtained via this service.
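The thread doesn't give GOKb's OAI base URL or the metadata formats it offers, so the following is only a generic OAI-PMH paging sketch in Python. The base URL is a placeholder, and the metadata prefix has to be one the endpoint advertises (the ListMetadataFormats verb reports these). It shows how a harvester builds ListRecords requests and follows resumptionToken values until the list is exhausted; all function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple, Union

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url: str, metadata_prefix: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords URL. The first request names a
    metadataPrefix; later requests send only the resumptionToken, which
    OAI-PMH treats as an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    elif metadata_prefix:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        raise ValueError("need metadata_prefix or resumption_token")
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_list_records(payload: Union[str, bytes]) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one page and the token for the next
    page (None when the token is missing or empty, i.e. the last page)."""
    root = ET.fromstring(payload)
    ids = [el.text or "" for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token

def harvest_ids(base_url: str, metadata_prefix: str) -> Iterator[str]:
    """Yield identifiers from every page of a harvest (needs network access)."""
    url: Optional[str] = list_records_url(base_url, metadata_prefix=metadata_prefix)
    while url:
        with urllib.request.urlopen(url, timeout=60) as response:
            ids, token = parse_list_records(response.read())
        yield from ids
        url = list_records_url(base_url, resumption_token=token) if token else None
```

Keeping URL building and XML parsing separate from the network loop makes it easy to test the paging logic against saved responses before pointing it at a live endpoint.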


Add/Update API
This API supports adding and updating data in GOKb. You can add new, or update existing, Organisations and Platforms. You can add additional identifiers to Journal titles.


We also have a SPARQL endpoint available on our test service (which contains test data only). The SPARQL endpoint is at http://test-gokb.kuali.org/sparql, and a set of example queries are given at https://github.com/k-int/gokb-phase1/wiki/Sample-SPARQL.
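As a rough Python sketch of talking to an endpoint like this one: the SPARQL 1.1 Protocol lets you send a query as the query parameter of a GET request and ask for the standard SPARQL JSON results format, which can then be flattened into rows suitable for OpenRefine. The endpoint URL is the one above (it held test data only and may no longer be available). The helper names are hypothetical, and real GOKb queries would use the vocabulary shown on the Sample-SPARQL wiki page rather than a generic placeholder query.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Test endpoint quoted in the post; it held test data only and may be offline.
SPARQL_ENDPOINT = "http://test-gokb.kuali.org/sparql"

def sparql_get_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Build a SPARQL 1.1 Protocol 'query via GET' URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

def bindings_to_rows(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one dict per solution, keeping only
    the value of each bound variable (IRI or literal text)."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Variables left unbound (e.g. by OPTIONAL) are absent from a binding.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows

def run_query(query: str, endpoint: str = SPARQL_ENDPOINT) -> List[Dict[str, str]]:
    """Send the query over HTTP and return flattened rows (needs network)."""
    request = urllib.request.Request(
        sparql_get_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return bindings_to_rows(response.read().decode("utf-8"))
```

For example, run_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10") would return up to ten dicts keyed by s, p and o, provided the endpoint is reachable.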

If anyone is already using GOKb data in OpenRefine, or is interested in using it, please do get in touch (ow...@ostephens.com).
 (I've been playing around with the SPARQL endpoint and the DERI RDF extension for OpenRefine to reconcile organisation names against the Organisation data we have in GOKb. This seems to work reasonably well overall, although we've noted problems with speed and low confidence scores even on exact string matches - we are doing some further investigation into this at the moment)

Owen

Thad Guidry

Jun 8, 2015, 11:45:10 AM
to openrefine

> (I've been playing around with the SPARQL endpoint and the DERI RDF extension for OpenRefine to reconcile organisation names against the Organisation data we have in GOKb. This seems to work reasonably well overall, although we've noted problems with speed and low confidence scores even on exact string matches - we are doing some further investigation into this at the moment)
>
> Owen

Owen, the strings might not be 100% exact matches (encoding issues, non-breaking spaces, etc.). Try:

unicode(value)
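The same kind of inspection can be done outside OpenRefine. GREL's unicode(value) exposes the character codes in a cell; the Python sketch below does something similar, reporting every character outside printable ASCII with its code point and Unicode name, and builds a normalised match key (NFKC, zero-width spaces removed, whitespace collapsed, case folded) so that visually identical names compare equal. This illustrates common causes of "exact" strings failing to match; it is not a diagnosis of the GOKb scoring issue, and the function names are hypothetical.

```python
import unicodedata
from typing import List, Tuple

def suspicious_chars(s: str) -> List[Tuple[int, str, str]]:
    """List (position, U+XXXX, Unicode name) for characters outside
    printable ASCII, e.g. non-breaking or zero-width spaces."""
    found = []
    for i, ch in enumerate(s):
        if not 32 <= ord(ch) <= 126:
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found

def match_key(s: str) -> str:
    """Normalised comparison key: NFKC turns U+00A0 into a plain space,
    zero-width spaces (untouched by NFKC) are dropped explicitly,
    runs of whitespace collapse to one space, and case is folded."""
    s = unicodedata.normalize("NFKC", s).replace("\u200b", "")
    return " ".join(s.split()).casefold()
```

For example, suspicious_chars("Example\u00a0Press") returns [(7, "U+00A0", "NO-BREAK SPACE")], and match_key gives "example press" for both "Example\u00a0Press" and "example press".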

Here are a few recipes to help with inspection:

Thad

Owen Stephens

Jun 11, 2015, 5:21:07 AM
to openr...@googlegroups.com
Thanks Thad - I'll have a look at those

Tom Morris

Jun 17, 2015, 1:36:14 PM
to openr...@googlegroups.com
That sounds interesting, Owen. I've forwarded it to a list of data
savvy librarians at Harvard and neighboring institutions. You might
also want to try other librarian oriented venues such as code4lib.

On Mon, Jun 8, 2015 at 7:16 AM, Owen Stephens <ow...@ostephens.com> wrote:

> GOKb opened to ‘public preview’ in January 2015, and you can signup for an
> account and access the service at https://gokb.kuali.org/gokb/

Chrome is unhappy with the kuali.org SSL cert for some reason. It's
mostly a cosmetic issue (it displays a red X'd padlock even though it
says the cert is valid), but might be worth looking into when you have
time.

This seems to have been renamed since your post:

> Documentation:
> https://github.com/k-int/gokb-phase1/wiki/Integration---Telling-GOKb-about-new-or-corresponding-resources-and-local-identifiers

It's now:

https://github.com/k-int/gokb-phase1/wiki/Integration-APIs:-Telling-GOKb-about-new-or-corresponding-resources-and-local-identifiers

> (I've been playing around with the SPARQL endpoint and the DERI RDF
> extension for OpenRefine to reconcile organisation names against the
> Organisation data we have in GOKb. This seems to work reasonably well
> overall, although we've noted problems with speed and low confidence scores
> even on exact string matches - we are doing some further investigation into
> this at the moment)

I'd be interested in hearing the results of your investigation. I've
noticed scoring anomalies in the past, but never had the time to look
into them in detail.

Owen Stephens

Jun 18, 2015, 2:07:34 AM
to openr...@googlegroups.com
Thanks very much Tom

I've posted a similar message to Code4Lib, and thanks for posting to the DST4L list.

I'll pass on the cert issue on the Kuali.org domain, and if we get anywhere with the scoring anomalies issue I'll report back.

Best wishes

Owen

Kathryn Knight

May 4, 2016, 11:59:01 AM
to OpenRefine
Hello, 

I'm trying to make use of the GOKb Extension for OpenRefine. I've tried to download this file: https://gokb.k-int.com/gokb/api/downloadUpdate (per instructions on this site: https://openlibraryenvironment.atlassian.net/wiki/display/OLE/Install+the+GOKb+Extension), but I get a sad "service unavailable" message.

Alternatively, I found the files via the gokb-phase1 GitHub site, but it looks like I need to do some extra fanciness to get the extension to work in OpenRefine... I'm a far cry from a developer, so if this is in fact what I need to do, I would love some guidance. Or, if there's another place I might look to make use of GOKb in OpenRefine, I'd love to hear about it.

Thanks!

Katie

Owen Stephens

May 5, 2016, 2:03:17 PM
to OpenRefine
Hi Katie

If you want to use GOKb to look up journal data, you don't need the GOKb extension. Instead you can use one of the GOKb APIs (interfaces to GOKb designed for software to interact with) to look up and retrieve data, and bring it into OpenRefine.

The GOKb Extension is for contributing data to the GOKb system, in which case you'll need to get in touch with the GOKb editor team to find out if this is possible.

Can you say a bit more about what you are hoping to do? Then I can post instructions or send you the contact information for the relevant people at GOKb.

Best wishes

Owen