Bulk exporting datasets from Pure and importing to Eprints

Leigh Stork

Oct 16, 2024, 9:19:10 AM

to EPrints UK User Group

Hello!

At Strathclyde we're in the process of setting up an Eprints data repository as a replacement for the current workflow where research data is deposited in Pure.

We're exploring options for getting both the metadata and files out of Pure, with one suggestion being that we use the Pure API. However, Pure support have advised that Eprints doesn't understand the Pure API, so it's not a straightforward export from one and ingest into the other.

Has anyone had any experience of using the Pure API with Eprints, and could you offer any suggestions or alternatives?

Thanks!

Leigh Stork

Open Repositories Manager

University of Strathclyde

Alan Exelby (LIB - Staff)

Oct 18, 2024, 2:21:51 AM

to Leigh Stork, EPrints UK User Group, Alan Exelby (LIB - Staff)

Leigh,

We have the same Pure/E-prints relationship. It has been running for many years and was revised in 2022 when our E-prints moved to a hosted service. Everything in E-prints comes from Pure, except theses, which are entered ‘manually’ by Library staff, but not everything in Pure goes to E-prints. Our system works by sending files. I do not have any direct contact with the process, which is under the control of our brilliant SA for Pure; he is very busy at the moment, so I hesitate to ask whether I can put you in contact with him. However, here are the most relevant notes that we have summarising the process (written in 2022 and slightly tweaked for clarity):

1. There is an automated job in Pure (the ‘Preserved Content Update’) which pushes metadata and (where present) associated files from Pure to Eprints. This runs on an hourly basis. The job only actually records a log once per day, and that log couldn’t be more lacking in detail (it basically just says ‘the job ran successfully’).

2. In a separate part of Pure we can control the parameters of that feed – e.g. the API that Pure uses to connect to Eprints, the Mods it uses when converting the metadata to xml, and a few other odds and ends. We don’t edit this stuff – we’ve always left it to Elsevier and/or ITCS.

3. To get a view of what is actually happening we have to switch to Eprints. There we can see the record of actual data which has come in from Pure for each item….

(The third point, where I have omitted specific details, is about seeing the outcome in the History tab of a record within E-prints; it doesn’t really say anything about the process.)

I do know that the content of what is put in the file for E-prints is controlled by what is called a ‘MODS file’ in Pure, which always looks like an XSL style sheet to me, but I gather is not technically a file; I could pass on more about that from our documentation if that would help.

If this doesn’t explain, I will have to find out if our Pure SA can help…

Best wishes,

Alan

==============================
Mr A.V. Exelby,
Systems Manager.
The Library,
University of East Anglia,
Norwich Research Park,
Norwich, NR4 7TJ

Tel.: 01603 592432 (mobile 07736 093516, but only in office hours and landline always preferable)
E-mail: a.ex...@uea.ac.uk

From 1.8.23 to 31.7.24, working normally Monday to Thursday
================================
"Man, who'd have thought being a librarian could be so tough"
Seamus Harper, in 'Harper 2.0', "Andromeda".

Alan Exelby (LIB - Staff)

Oct 18, 2024, 2:21:55 AM

to Leigh Stork, EPrints UK User Group, Alan Exelby (LIB - Staff)

Sorry, I realised after sending that you are asking about ‘research data’. We only have experience with the usual records plus text files; we looked briefly at datasets earlier this year but could not act on it because of the substantial cost of increased storage in E-prints.

Alan

John Salter

Oct 22, 2024, 4:15:42 AM

to Leigh Stork, EPrints UK User Group, Alan Exelby (LIB - Staff)

Hi Leigh,
I haven't used the Pure API in anger, but I don't think it would be too tricky to create an EPrints import format for the Pure dataset metadata.

The newer Pure API has extensive documentation available at https://[your_pure_instance]/ws/api/, which includes the schema for a dataset record and an example record.

The data is returned in a JSON format.

The latest version of EPrints doesn't include any JSON-based import formats, although JSON import is possible (see: https://files.eprints.org/753/, which could be used as a starting point for part of a solution).

The key would be the mapping between Pure fields and your EPrint fields.

It sounds like you are creating a new repository for datasets – so you could base your EPrints schema around the data available within the Pure API.
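
As a rough illustration of the mapping step, here is a minimal Python sketch that turns one Pure dataset record (as JSON) into a flat set of EPrints-style fields. Everything named in it is an assumption made for illustration: the Pure paths (`uuid`, `title.value`, `description.value`, `doi`), the EPrints-side names (`pure_uuid`, `title`, `abstract`, `id_number`) and the `type` value are hypothetical, and should be replaced with what the API documentation on your own Pure instance and your own EPrints schema actually define.

```python
import json

# Hypothetical mapping: EPrints field name -> path of keys into the Pure record.
# Both sides are illustrative assumptions, not the real Pure or EPrints schema.
FIELD_MAP = {
    "pure_uuid": ("uuid",),
    "title": ("title", "value"),
    "abstract": ("description", "value"),
    "id_number": ("doi",),
}


def dig(record, path):
    """Follow a path of keys into nested dicts; return None if any key is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def pure_to_eprint(pure_json):
    """Convert one Pure dataset record (a JSON string) into a flat dict of fields."""
    record = json.loads(pure_json)
    eprint = {"type": "dataset"}  # assumed item type for the new repository
    for eprint_field, path in FIELD_MAP.items():
        value = dig(record, path)
        if value is not None:  # skip fields that the Pure record does not carry
            eprint[eprint_field] = value
    return eprint


if __name__ == "__main__":
    # Made-up sample record, not real Pure output.
    sample = json.dumps({
        "uuid": "00000000-0000-0000-0000-000000000000",
        "title": {"value": "Example dataset"},
        "doi": "10.0000/example",
    })
    print(pure_to_eprint(sample))
```

Keeping the mapping in a single table like `FIELD_MAP` means the Pure-to-EPrints decisions can be reviewed and changed without touching the conversion logic.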

Happy to provide some more musings if you'd like.

One thing worth mentioning, not just in relation to datasets, is that Pure seems to be moving towards the use of the record 'uuid' identifier rather than the older PureID.
Storing the UUID in the EPrint records would be useful for datasets and research outputs.
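
One possible use of a stored UUID, sketched here in Python, is deciding whether an incoming Pure record is new or already present in the repository. The field names `pure_uuid` and `eprintid` and the function `plan_import` are hypothetical, chosen only for this example; the real field names would depend on how the EPrints schema is set up.

```python
def plan_import(existing_eprints, incoming_records):
    """Split incoming Pure records into ones to create and ones to update.

    existing_eprints: list of dicts with 'eprintid' and (optionally) 'pure_uuid'.
    incoming_records: list of dicts that each carry a 'pure_uuid'.
    """
    # Index the records already in the repository by their stored Pure UUID.
    by_uuid = {
        e["pure_uuid"]: e["eprintid"]
        for e in existing_eprints
        if e.get("pure_uuid")
    }
    to_create, to_update = [], []
    for record in incoming_records:
        eprintid = by_uuid.get(record["pure_uuid"])
        if eprintid is None:
            to_create.append(record)  # UUID not seen before
        else:
            to_update.append((eprintid, record))  # UUID matches an existing item
    return to_create, to_update
```

With the UUID held on each record, a repeat import can be matched against existing items rather than creating duplicates.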

Cheers,
John

PS Whilst Elsevier's statement that 'EPrints doesn't understand the Pure API' is accurate, the same can be said about most other systems!
