Thanks for sharing the proposal for designing and implementing support for research objects in Dataverse! I have a couple of comments and questions:
1. Terminology: I think we should consider alternative names for Dataset Type, e.g., Research Object Type, Resource Type, or Digital Object Type. I suggest Research Object Type.
2. One single PID: The proposal implies that each dataset should ideally contain only objects of the same Research Object Type. So, if your study involves data, workflows, and software/code, you should archive these in (at least) three different datasets: one for the data, one for the workflows, and one for the software/code. This means you will end up with (at least) three different PIDs. Of course, links can be established between these datasets using, e.g., the Related Publication metadata field. However, I don't understand how this multiple-PID approach aligns with the statement in the Purpose section of the proposal: "This proposal describes a process for designing and implementing support for research objects in Dataverse software. Research objects are mechanisms for associating 'related resources about a scientific investigation so that they can be shared using a single identifier.' (Wikipedia)" In the example above, would a fourth object need to be created to link the three datasets together?
3. Levels of description: Essential parts of the discussion in the proposal, in particular those about licensing, demonstrate the need for file-level support for richer metadata and terms of use / licenses (see most recently this discussion in the Dataverse Google group). This makes me wonder whether the discussion about support for multiple Research Object Types should start at the file level. At file level, there is less doubt as to which Research Object Type (data, software/code, workflow, ...) a given file represents and which Terms of Use should apply. This suggests that information about Research Object Type and license should be applied at file level. In fact, this level could be called the Research Object level.
At the next level, currently called dataset, multiple files can be collected; these may be of different Research Object Types (e.g., documentation, data, and workflow) and have different Terms of Use. At this level, the metadata and licensing information could simply be an aggregation of the file-level information, plus the metadata and license of the documentation file(s) that document the contained files collectively. We could call the object at this level a Research Object collection. This level could cover the use case described above, i.e., using one PID to share a collection of different Research Object Types.
One level further up, we have repository collections, also called sub-dataverses (in some Dataverse installations, they might have the status of repositories). This level is optional. At the top level, we have the repository. The figure below summarizes the four levels:
Repository
  (--- Repository collection)
      --- Research Object collection (documentation, Research Objects)
        --- Research Object (data, software/code, workflow, ...)
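To make the four levels a bit more concrete, here is a rough Python sketch of how they could nest and how type and license information set at file level might be aggregated at the collection level. This is only an illustration of the idea, not a description of how Dataverse works or should be implemented: all class and field names are hypothetical, and the PID and license identifiers are placeholder values.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ResearchObject:
    """File level: a single file with its own type and terms of use."""
    name: str
    object_type: str  # e.g. "data", "software/code", "workflow", "documentation"
    license: str


@dataclass
class ResearchObjectCollection:
    """What is currently called a dataset: one PID, possibly mixed types."""
    pid: str
    objects: List[ResearchObject] = field(default_factory=list)

    def object_types(self) -> Set[str]:
        # Aggregated from the file level rather than set on the collection.
        return {obj.object_type for obj in self.objects}

    def licenses(self) -> Set[str]:
        # Likewise aggregated from the individual files.
        return {obj.license for obj in self.objects}


@dataclass
class RepositoryCollection:
    """Optional level, e.g. a sub-dataverse."""
    name: str
    collections: List[ResearchObjectCollection] = field(default_factory=list)


@dataclass
class Repository:
    """Top level; may hold collections directly or via repository collections."""
    name: str
    repository_collections: List[RepositoryCollection] = field(default_factory=list)
    collections: List[ResearchObjectCollection] = field(default_factory=list)


# Hypothetical study with data, code, a workflow, and documentation under ONE PID.
study = ResearchObjectCollection(
    pid="doi:10.1234/EXAMPLE",  # placeholder, not a real identifier
    objects=[
        ResearchObject("README.txt", "documentation", "CC-BY-4.0"),
        ResearchObject("survey.csv", "data", "CC0-1.0"),
        ResearchObject("analysis.py", "software/code", "MIT"),
        ResearchObject("pipeline.cwl", "workflow", "MIT"),
    ],
)
repo = Repository(name="Example repository", collections=[study])

print(sorted(study.object_types()))
print(sorted(study.licenses()))
```

In this sketch, the collection does not carry a Research Object Type or license of its own; both are derived from the contained files, which is the aggregation I have in mind for the Research Object collection level.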
I'd be happy to discuss this further with the proposal team and the larger community.
Best, Philipp