Import API with PID ignores restricted files in JSON


Kaitlin Newson

Mar 18, 2021, 10:39:18 AM
to Dataverse Users Community
Hi DV Community,

I'm running some test imports (DV version 5.1.1) using the API for importing with PIDs, and I'm noticing that the API seems to ignore files marked as restricted in the metadata. Does anyone know if this is expected behaviour? I assumed it would work, since the example in the docs includes this in the metadata.

The files with their storage IDs are on the server, and the datasets are being imported with "release=no".

Here's an example of the files section in the metadata JSON being imported:

"files": [
   {
   "description": "Data for nests monitored in Bashaw study area  in 2015",
   "label": "Bashaw2015_nests.tab",
   "restricted": true,
   "version": 3,
   "datasetVersionId": 1435,
  ....
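For reference, the import call I'm making looks roughly like this (a sketch in Python; the server, collection alias, and DOI are placeholders, and the endpoint shape follows the Native API guide):

```python
import urllib.parse

SERVER = "https://demo.dataverse.org"   # placeholder installation
ALIAS = "root"                          # placeholder collection alias
PID = "doi:10.5072/FK2/EXAMPLE"         # placeholder persistent identifier

# Import an existing dataset under its existing PID, without publishing.
params = urllib.parse.urlencode({"pid": PID, "release": "no"})
url = f"{SERVER}/api/dataverses/{ALIAS}/datasets/:import?{params}"
# POST the dataset JSON to this URL with an X-Dataverse-key header, e.g.
# curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$url" --upload-file dataset.json
print(url)
```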

Julian Gautier

Mar 18, 2021, 11:44:05 AM
to Dataverse Users Community
Hi Kaitlin,

I relied on that bit of information in the Dataverse JSON recently and had to learn a bit about when and how it was added. Hope this is helpful:

That bit of information in the dataset export about each file's restriction status was added in Dataverse version 4.8 (https://github.com/IQSS/dataverse/issues/4645). My understanding from that GitHub issue and PR is that the addition to the JSON was only informational: a Dataverse repository won't restrict a file based on the value of that "restricted" key being "true". Making it do so makes sense to me, though, and seems worth a feature request.

It looks like an endpoint for restricting a file was added in the same Dataverse version: https://github.com/IQSS/dataverse/issues/3873.
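That per-file restrict endpoint could be called after import. A minimal sketch (the server and file id are placeholders; the endpoint shape is from the Native API guide):

```python
SERVER = "https://demo.dataverse.org"  # placeholder installation
FILE_ID = 42                           # hypothetical database id of the file

# Restrict one already-ingested file: PUT /api/files/{id}/restrict with body "true".
url = f"{SERVER}/api/files/{FILE_ID}/restrict"
# curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -d true "$url"
print(url)
```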


Julian Gautier

Mar 18, 2021, 11:49:39 AM
to Dataverse Users Community
In the meantime, would it be better if the JSON example on that page of the guides didn't include any information that a Dataverse repository actually ignores, like the "restricted" status? Or in other words, should the JSON example include only information that would be used? Are the "version" and "datasetVersionId" keys also ignored?

Jim Myers

Mar 18, 2021, 12:47:15 PM
to Dataverse Users Community
Another potential workaround: rather than adding the files in the import call, add them through the dataset add-file API, using the 'direct upload' variant to specify the existing location of the file rather than uploading the bytes. The parameters for that call do let you set the restrict flag and other metadata. Further, while that API is currently one file at a time, the Scholar's Portal work on Globus integration is extending it so you can submit metadata for multiple files at once. With that, you'd just do an import and then an add-all-files call, so two API calls instead of one.
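A sketch of the jsonData for that direct-upload add-file call, assuming the field names from the direct upload docs (note "restrict", not "restricted"); the storage identifier and checksum values here are hypothetical:

```python
import json

# jsonData for the add-file call on an existing dataset:
#   POST /api/datasets/:persistentId/add?persistentId=$PID
# Direct-upload variant: the bytes are already on the store, so the request
# carries a storageIdentifier instead of a file part.
json_data = {
    "description": "Data for nests monitored in Bashaw study area in 2015",
    "fileName": "Bashaw2015_nests.tab",
    "mimeType": "text/tab-separated-values",
    "restrict": "true",  # 'restrict' in requests, unlike 'restricted' in exports
    "storageIdentifier": "s3://demo-bucket:178dcf2b51a-a1b2c3d4e5f6",  # hypothetical
    "checksum": {"@type": "MD5", "@value": "d41d8cd98f00b204e9800998ecf8427e"},
}
# curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
#   "$SERVER/api/datasets/:persistentId/add?persistentId=$PID" \
#   -F "jsonData=<payload above>"
print(json.dumps(json_data, indent=2))
```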

-- Jim

Kaitlin Newson

Mar 18, 2021, 1:45:28 PM
to Dataverse Users Community
Is it possible to use the import dataset API (with PIDs) but then add the files separately? I assumed, based on the docs, that the files had to be on the filesystem in the proper location with their storage identifiers.

James Myers

Mar 18, 2021, 5:37:02 PM
to dataverse...@googlegroups.com

That was my suggestion, but I hadn't tried it. Looking a bit further, I think it may only work if the imported PID is one Dataverse can manage (i.e., Dataverse is configured to have access for the protocol/authority/shoulder you're using).

TLDR: So what I suggested is probably not a viable work-around unless you have access to manage the imported PIDs.

 

Details:

 

In terms of the files themselves: with import, I think the requirement to pre-position the files only applies if you are setting the release=yes flag (such that the dataset is immediately published and you can't add files later). I think adding files in import was originally a convenience as well; there was no other mechanism to move the files yourself and then update the dataset to reference them without editing the database. However, the direct upload mechanism can now be used the same way.

The issue I see is that calling the publish API, after doing an import with no files (release=no) and using the direct-upload option of the add-file API to register the files you've previously placed, differs from using release=yes. The files aren't the issue, and I expect the direct-upload option to work, but I think the normal publish command will try to update the metadata for your imported PID and fail if it can't. The :import release=yes mechanism disables contacting the PID provider. If you can configure Dataverse to update the PID, I think my suggested route ending with publish would work (and would update the DOI at DataCite to point to your new landing page).
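For completeness, a sketch of that final publish call (server and PID are placeholders; the endpoint and type parameter follow the Native API guide):

```python
import urllib.parse

SERVER = "https://demo.dataverse.org"  # placeholder installation
PID = "doi:10.5072/FK2/EXAMPLE"        # the imported persistent identifier

# Publish the dataset: this is the step that contacts the PID provider and
# will fail if Dataverse can't manage the imported PID.
params = urllib.parse.urlencode({"persistentId": PID, "type": "major"})
url = f"{SERVER}/api/datasets/:persistentId/actions/:publish?{params}"
# curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$url"
print(url)
```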

FWIW: PR #7504 adds another migration mechanism that can preserve the original publication date and version number as well. It is not yet accepted/merged, though, and it was still designed for the case where you can update the PID/DOI.

-- Jim


Stefan Kasberger

Mar 19, 2021, 8:17:39 AM
to Dataverse Users Community
Maybe you have to use "restrict" instead of "restricted" in the request. The API behaves a bit differently here: if I remember correctly, it expects the field as "restrict" in the request, but returns the state as "restricted" in the response.
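In other words (a sketch; the key names are as I remember them, so worth verifying against your installation):

```python
# The key you send differs from the key you read back, as I remember it:
request_fragment = {"restrict": "true"}  # what the file API accepts in requests
export_fragment = {"restricted": True}   # what the dataset export JSON reports
print(request_fragment, export_fragment)
```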