Hello
Two weeks ago we had a DataCite strategy brainstorming session.
Here is a bit more information about one idea that I was perhaps not able to explain clearly enough
in that meeting.
The idea, or vision, is that in the future DataCite should support machine-actionable PID metadata.
I do not know whether that is exactly the correct name for the idea, but the basic idea is given below.
Why?
The current metadata schema is mostly defined for humans, not for machines.
Vision:
The metadata schema should support recreating a dataset with the help of
a) the source datasets and
b) a workflow description.
What would be needed?
In the future, I would like to be able to define in my DOI metadata a direct link to the source dataset;
a link to the landing page is not enough.
In addition, I would like to define a processing workflow. This workflow could perhaps be
in the same dataset, but I also need to have a direct link to that workflow file.
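As a rough illustration only, a record carrying these two kinds of direct links could look like the sketch below. The class and field names (DirectLink, PidRecord, source_datasets, workflow) and all DOIs and URLs are made up for this example; they are not part of the current DataCite schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirectLink:
    """A link a machine can fetch directly (hypothetical structure)."""
    url: str          # points at the file itself, not at a landing page
    media_type: str   # lets a client decide how to read the file


@dataclass
class PidRecord:
    """Hypothetical PID metadata record; field names are illustrative only."""
    doi: str
    landing_page: str  # still useful for human readers
    source_datasets: List[DirectLink] = field(default_factory=list)
    workflow: Optional[DirectLink] = None


def missing_for_recreation(record: PidRecord) -> List[str]:
    """List which of the two ingredients for recreation are absent."""
    missing = []
    if not record.source_datasets:
        missing.append("source_datasets")
    if record.workflow is None:
        missing.append("workflow")
    return missing


# Placeholder values, not real identifiers.
record = PidRecord(
    doi="10.1234/example-derived",
    landing_page="https://example.org/datasets/derived",
    source_datasets=[DirectLink("https://example.org/files/source.csv", "text/csv")],
    workflow=DirectLink("https://example.org/files/workflow.yaml", "application/yaml"),
)
print(missing_for_recreation(record))  # prints []
```

A record that only has a landing page would report both "source_datasets" and "workflow" as missing, which is roughly the situation today.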
Then - the difficult part.
Given a direct link to the workflow file, and using it together with the workflow attributes, I should be able
to find an existing environment, or recreate one, that can run my workflow.
This approach would greatly enhance data provenance.
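One very simplified way to picture the environment selection step is a plain attribute match, sketched below. The attribute names ("engine", "engine_version") and the environment names are assumptions invented for this example; a real solution would need an agreed vocabulary for workflow attributes.

```python
from typing import Dict, List


def matching_environments(
    workflow_attrs: Dict[str, str],
    environments: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return names of environments whose properties match every workflow attribute."""
    return sorted(
        name
        for name, props in environments.items()
        if all(props.get(key) == value for key, value in workflow_attrs.items())
    )


# Hypothetical attributes read from the workflow metadata.
workflow_attrs = {"engine": "example-engine", "engine_version": "2"}

# Hypothetical catalogue of environments and what they offer.
environments = {
    "cloud-a": {"engine": "example-engine", "engine_version": "2", "region": "eu"},
    "cluster-b": {"engine": "example-engine", "engine_version": "1"},
    "laptop-c": {"engine": "other-engine", "engine_version": "2"},
}

print(matching_environments(workflow_attrs, environments))  # prints ['cloud-a']
```

If no environment matches, the list is empty, and that is the case where an environment would have to be recreated rather than found.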
The difficult part:
I understand that this is not easy because, in principle, in the optimal scenario
services should be able to respond in the same way as they responded before.
This is perhaps almost possible in some cases in cloud environments, but it is
a rather demanding requirement for generic services as a whole.
I hope that this mail clarifies what I meant by "machine-actionable PIDs".