Dear all,
This may be a little over-specialised, but I'm working on a project where I think the data we are building could be of use to some OpenRefine users, so I hope you don't mind me posting this.
GOKb, the Global Open Knowledgebase, is a community-managed project that aims to describe electronic journals and books, publisher packages, and the platforms which host these resources. I’ve been working on the project since it started, working with others to gather requirements, develop the underlying data models and specify functionality for the system. We use OpenRefine (with a specially designed extension) as our main mechanism for getting data into GOKb, exploiting its ability to clean up the data (which tends to come from publishers and can be of variable quality) and to re-apply changes to future data from the same publisher/supplier.
Several hundred ejournal packages, and associated information about the ejournal titles, platforms and organisations, have been added to the knowledgebase over the past few months. OpenRefine is used to do much of the work to get data ready for loading into GOKb.
Alongside this work of adding content we have also opened up APIs to interact with the service, and we suspect that these could be useful to others using OpenRefine to work with data relating to journals. In particular the ‘Coreference service’ allows you to look up identifiers (such as ISSNs) and get back journal title information and other IDs associated with that title (as JSON or XML).
We are interested in:
* Talking to people who use OpenRefine and would like to make use of such a service
* If there is some interest, what support/documentation people would like to see
* Understanding if we can offer different/better services based on the GOKb data for OpenRefine (e.g. would other data that GOKb holds be of interest? Would a reconciliation service for journal titles be useful?)
The current APIs we support are:
The ‘Coreference’ service
The main aim of this API is to return a list of identifiers associated with a title. You provide a journal identifier (such as an ISSN) and get back basic information about the journal, including its title and other identifiers associated with it (other ISSNs, DOIs, publisher identifiers etc.).
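As a rough illustration of the lookup described above, here is a minimal Python sketch. The endpoint URL, the query parameter names (`id`, `format`) and the shape of the JSON response are all assumptions made for the example, not the documented GOKb API, so check them against the real service before use.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the real Coreference service URL.
COREFERENCE_URL = "https://gokb.example.org/api/coreference"


def build_lookup_url(identifier, fmt="json", base_url=COREFERENCE_URL):
    """Build a lookup URL for one identifier (parameter names are assumed)."""
    return base_url + "?" + urlencode({"id": identifier, "format": fmt})


def other_identifiers(response_text, queried_id):
    """Return the title and the identifiers other than the one queried.

    Assumes a JSON object with a "title" string and an "identifiers" list
    of {"namespace": ..., "value": ...} objects (a made-up shape).
    """
    record = json.loads(response_text)
    others = [
        (ident["namespace"], ident["value"])
        for ident in record.get("identifiers", [])
        if ident["value"] != queried_id
    ]
    return record.get("title"), others


# Made-up response, used only to exercise the parser without a network call.
SAMPLE_RESPONSE = json.dumps({
    "title": "Example Journal",
    "identifiers": [
        {"namespace": "issn", "value": "1234-5678"},
        {"namespace": "eissn", "value": "8765-4321"},
    ],
})
```

In OpenRefine the same kind of lookup could be done with "Add column by fetching URLs" on a column of ISSNs, then parsing the returned JSON into new columns.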
OAI Interfaces
The main aim of this API is to enable other services to obtain data from GOKb on an ongoing basis. Information about ejournal packages, titles and organisations can be obtained via this service.
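For anyone wanting to harvest this data, the following Python sketch shows the general pattern, assuming the OAI interfaces follow the standard OAI-PMH protocol (the `ListRecords` verb, `metadataPrefix` and `resumptionToken` come from that standard). The base URL and the sample response are invented for the example, and GOKb may offer metadata formats other than `oai_dc`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical base URL: substitute the real GOKb OAI endpoint.
OAI_BASE_URL = "https://gokb.example.org/oai/packages"


def list_records_url(base_url=OAI_BASE_URL, metadata_prefix="oai_dc",
                     resumption_token=None):
    """Build a ListRecords request URL for the first or a follow-up page."""
    if resumption_token:
        # In OAI-PMH a resumptionToken is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Invented response page, used only to exercise the parser offline.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>example:1</identifier></header></record>
    <record><header><identifier>example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would fetch `list_records_url()`, pass the response to `parse_page`, and keep requesting `list_records_url(resumption_token=token)` until no token is returned.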
Add/Update API
This API supports adding and updating data in GOKb. You can add new, or update existing, Organisations and Platforms. You can add additional identifiers to Journal titles.
If anyone is already using GOKb data in OpenRefine, or is interested in using it, please do get in touch (ow...@ostephens.com).
(I've been playing around with the SPARQL endpoint and the DERI RDF extension for OpenRefine to reconcile organisation names against the Organisation data we have in GOKb. This seems to work reasonably well overall, although we've noted problems with speed and with low confidence scores even on exact string matches. We are investigating this further at the moment.)
Owen