OAI-PMH - clarification on 'only unrestricted published datasets can be harvested'


Janet McDougall - Australian Data Archive

Apr 11, 2019, 2:37:54 AM
to Dataverse Users Community
Hi All
I am seeking some clarification on the harvesting feature as documented. Last year I set up a test OAI set for a remote client (a national data service) to test against. ADA holds mostly restricted data files, but the client can still harvest the metadata.

The user guide says 'Only the published, unrestricted datasets in your Dataverse can be made harvestable.' I am not sure what a "restricted dataset" means here. The metadata is always available, regardless of the access settings (restricted or unrestricted) on individual data files within a dataset. I may be misunderstanding the documentation...

Also, the doco says:
"Note that it is only the metadata that are harvested. Remote harvesters will generally not attempt to download the data files associated with the harvested datasets." I am hoping this means the data files themselves are not available to metadata harvesting clients.
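For context, a harvesting client pulls records from the OAI-PMH endpoint with requests like the one sketched below; what comes back is metadata records, not data files. This is a minimal sketch assuming the installation exposes OAI-PMH at `<site>/oai` and that the set is named `ada_test`; both the host and the set name are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; check the real OAI base URL for your installation.
BASE_URL = "https://dataverse.example.edu/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: when it is
    present, only 'verb' may accompany it (used to fetch the next page).
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec  # restrict the harvest to one OAI set
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # First page of the (hypothetical) test set, in Dublin Core
    print(list_records_url(BASE_URL, set_spec="ada_test"))
```

`oai_dc` is used because every OAI-PMH repository is required to support it; other metadata formats depend on the installation.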


Thanks
Janet

Philip Durbin

Apr 11, 2019, 6:53:37 AM
to dataverse...@googlegroups.com
Hi Janet,

You're right, in Dataverse, a dataset cannot be restricted. Only files can be restricted. As you say, when a dataset is published the metadata is always available.
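The distinction can be expressed as a toy model: restriction is a property of each file, while metadata visibility depends only on publication. This is a deliberately simplified, hypothetical sketch (the class and function names are illustrative, not Dataverse code) and it ignores drafts, embargoes, and other permission rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str
    restricted: bool = False  # restriction lives on the file, not the dataset


@dataclass
class Dataset:
    title: str
    published: bool
    # Note: no 'restricted' attribute at the dataset level in this model
    files: List[DataFile] = field(default_factory=list)


def metadata_visible(ds: Dataset) -> bool:
    """Published metadata is visible (and harvestable) regardless of file restrictions."""
    return ds.published


def can_download(f: DataFile, has_file_permission: bool) -> bool:
    """Unrestricted files are open; restricted files need an authorized user/token."""
    return (not f.restricted) or has_file_permission


if __name__ == "__main__":
    ds = Dataset("Example survey", published=True,
                 files=[DataFile("data.csv", restricted=True), DataFile("readme.txt")])
    print(metadata_visible(ds))                     # metadata still visible
    print(can_download(ds.files[0], False))         # restricted file blocked
```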

Can you please open an issue at https://github.com/IQSS/dataverse/issues to indicate that this documentation should be revisited? If anyone out there would like to take a crack at editing that page, it can be found in the source tree at doc/sphinx-guides/source/admin/harvestserver.rst

As you are hoping, the data files themselves are not available for public download when they are restricted. The URLs to those restricted files are available in the metadata, but you would need an API token with the proper permissions to download them.
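As an illustration of the token requirement, a download through the Data Access API (`/api/access/datafile/{id}`) passes the API token in the `X-Dataverse-key` header. A minimal sketch using only the standard library; the server URL, file id, and token below are hypothetical placeholders. Without a token that has permission on a restricted file, `urlopen` raises `urllib.error.HTTPError` instead of returning the file.

```python
import urllib.request


def build_download_request(server_url, file_id, api_token=None):
    """Build a GET request for a file via the Dataverse Data Access API."""
    url = f"{server_url.rstrip('/')}/api/access/datafile/{file_id}"
    headers = {}
    if api_token:
        # The token identifies the user whose permissions are checked
        headers["X-Dataverse-key"] = api_token
    return urllib.request.Request(url, headers=headers, method="GET")


def download(server_url, file_id, dest, api_token=None):
    """Save the file to 'dest'; raises urllib.error.HTTPError if access is refused."""
    req = build_download_request(server_url, file_id, api_token)
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        out.write(resp.read())


if __name__ == "__main__":
    # Hypothetical values; replace with a real server, file id, and token.
    req = build_download_request("https://dataverse.example.edu", 42, "xxxxxxxx-token")
    print(req.full_url)
```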

On a related note, there's a relatively new JVM option, `dataverse.files.hide-schema-dot-org-download-urls`, documented at http://guides.dataverse.org/en/4.12/installation/config.html#dataverse-files-hide-schema-dot-org-download-urls . It was introduced out of concern that search engines might put heavy load on the server by downloading many files at once.
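One way to see what that option affects is to inspect the schema.org JSON-LD that Dataverse embeds in dataset pages and list any download URLs it exposes. The sketch below assumes the files appear under `distribution` entries (schema.org `DataDownload`) carrying a `contentUrl` property; that layout, and the sample document, are assumptions for illustration. With the option enabled, the expectation is that no `contentUrl` values remain.

```python
import json


def download_urls(jsonld):
    """Return the contentUrl values from a schema.org Dataset's 'distribution'."""
    dist = jsonld.get("distribution", [])
    if isinstance(dist, dict):  # schema.org allows a single object instead of a list
        dist = [dist]
    return [d["contentUrl"] for d in dist if isinstance(d, dict) and "contentUrl" in d]


# Hypothetical sample: one file with a download URL, one without.
sample = json.loads("""
{"@type": "Dataset",
 "distribution": [
   {"@type": "DataDownload", "name": "a.csv",
    "contentUrl": "https://dataverse.example.edu/api/access/datafile/42"},
   {"@type": "DataDownload", "name": "b.csv"}
 ]}
""")

if __name__ == "__main__":
    print(download_urls(sample))
```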

I hope this helps,

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/f3b02542-94b7-4c55-965a-e58ee99ba5f4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Janet McDougall - Australian Data Archive

Apr 11, 2019, 9:27:24 PM
to Dataverse Users Community

Hi Phil
Thanks for your response and the extra info. It is as I expected, but I just wanted to clarify. I will open an issue to have the doco revised.
Janet

Philip Durbin

May 13, 2019, 5:55:02 AM
to dataverse...@googlegroups.com
Thanks for opening https://github.com/IQSS/dataverse/issues/5840 to fix the documentation.
