Hi Manual,
Dataverse can also harvest from a repository without a harvesting set being specified, in which case all of that repository's published datasets are harvested. Harvard Dataverse harvests from 5 other Dataverse-based repositories, such as DataverseNL, without specifying a set.
So if we use the "harvesting_sets" column in the data that Phil mentioned to see which Dataverse repositories make their datasets harvestable, we would miss the cases where repositories have enabled harvesting but haven't created any harvesting sets.
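To make the difference concrete, here is a minimal sketch of how an OAI-PMH ListIdentifiers request looks with and without a set. The verb and the metadataPrefix and set parameters come from the OAI-PMH protocol itself; the function name, the example host, the "/oai" path, and the set name are only assumptions for illustration, not taken from any particular repository.

```python
from urllib.parse import urlencode


def list_identifiers_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListIdentifiers URL (hypothetical helper).

    When set_spec is None, the "set" parameter is left out entirely,
    which asks the server for every record it exposes.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Restrict the request to a single harvesting set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Assumed endpoint and set name, for illustration only.
base = "https://dataverse.example.org/oai"
print(list_identifiers_url(base))                        # no set: all exposed records
print(list_identifiers_url(base, set_spec="my_set"))     # limited to one set
```

A repository that only ever receives the first kind of request could have an empty "harvesting_sets" value and still be harvestable.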
About testing harvesting: Harvard Dataverse tries to harvest from 21 other repositories that use Dataverse, and we've been working through technical issues that prevent it from harvesting many or all of the datasets from about half of those repositories.
We've also been considering policies to help us mitigate these technical issues, such as harvesting less rich metadata formats, like oai_dc, from repositories that are running older versions of Dataverse.
These policy discussions reminded me of cases where managers of Harvard Dataverse either entered into more formal agreements to harvest metadata, like the Data-PASS project, or were encouraged to enter into a more formal agreement. For example, the folks who manage the Survey Research Data Archive asked us about signing a memorandum of understanding where we would agree to maintain the technologies that let us share each other's metadata. We thought that wasn't necessary, but I've started to see the merit in it.
Julian
Julian Gautier (he/him)
Product Research Specialist,
IQSS