Link Datasets across Dataverse installations


Sebastian Karcher

Jul 17, 2019, 5:04:19 PM
to Dataverse Users Community
Here's the motivation: We're working with a group that deposits all their data in the Harvard Dataverse. We're helping with the curation of some of their projects and would like to see those datasets displayed in QDR. We'd like to avoid duplicating them.

Idea: Dataverse already has the ability to show linked datasets via harvesting, such as https://dataverse.harvard.edu/dataverse/qdr . As I understand it, though, this only works if the specific subset of projects is defined as a set on the source's harvesting server, which in turn requires superuser status.
I'd like to be able to specify a dataset for linking without the need to create a harvesting set on the source dataverse. Since the data are all open anyway, I don't think there are any structural reasons this shouldn't work, but is this actually possible? If so, how? If not, how difficult would it be to allow this?

Thanks!
Sebastian

Julian Gautier

Jul 18, 2019, 11:11:20 PM
to Dataverse Users Community
Hey Sebastian. I don't think this is possible now. Issue https://github.com/IQSS/dataverse/issues/5402 seems very related.

I was emailing someone else recently who wants to do the same thing, and I wondered if the following would be a workaround that's a little scalable (but not much):
  • An admin of one Dataverse repository (the Receiving Dataverse) could create a dataverse in the repository that stores the datasets of interest (the Source Dataverse).
  • That person would use the dataset linking feature to collect the datasets she wants into the dataverse she's created on the Source Dataverse.
  • Then admins of the Source Dataverse would create an OAI-PMH set that includes the linked datasets in that dataverse.
  • And the admin of the Receiving Dataverse would set up a harvesting client with a schedule to harvest the dataset metadata in that OAI-PMH set.
This gives the person who owns the dataverse in the Source Dataverse control over which of its datasets get harvested.
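
To make the last two steps a bit more concrete, here is a minimal Python sketch of how someone on the Receiving side might check which records an OAI-PMH set exposes before setting up a harvesting client. It uses only the standard OAI-PMH `ListIdentifiers` verb. The base URL `https://source.example.edu/oai`, the set name `qdr_linked`, and the helper names are all made up for illustration; whether a set of linked datasets can be defined at all is the open question listed below.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListIdentifiers request URL for one set.

    base_url and set_spec are placeholders supplied by the caller;
    oai_dc is the metadata format every OAI-PMH server must support.
    """
    query = urlencode({
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


def parse_identifiers(xml_text):
    """Return identifiers of non-deleted records in a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    identifiers = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        # Servers mark withdrawn records with status="deleted"; skip those.
        if header.get("status") == "deleted":
            continue
        ident = header.find("oai:identifier", OAI_NS)
        if ident is not None and ident.text:
            identifiers.append(ident.text.strip())
    return identifiers


# Hypothetical endpoint and set name, for illustration only.
url = list_identifiers_url("https://source.example.edu/oai", "qdr_linked")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) and passing the response body to `parse_identifiers` would list what the Receiving Dataverse's harvesting client would pick up from the set. Large sets are paged with a `resumptionToken`, which this sketch does not handle.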

Some blockers or drawbacks:
  • I don't know if in Dataverse it's possible to create an OAI-PMH set of datasets that are linked in a dataverse. Can't think of a query that would do that. If anyone can, let me know :)
  • This assumes that admins of the Source Dataverse will allow people to publish dataverses and use the dataset linking feature to add datasets. Not all Dataverse repositories allow people to publish dataverses.
  • There's still overhead for admins of both repositories (one has to create a harvesting set and the other has to create a harvesting client to harvest the datasets in that set), so it's less scalable.