Janet,
A few more notes…
-- Jim
As Phil says, the core workflow mechanism itself is quite general. It can include multiple steps and even make calls out to other web services and then pause/wait for a response before continuing. I didn’t do much to change it aside from allowing customization of which settings a workflow can access and fixing a problem with workflows being unable to make some database updates.
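To make that concrete, a workflow is just a JSON description of the steps plus a couple of admin API calls to register it and make it the default. A rough sketch is below – the step types, parameters, and localhost URLs are illustrative only, so check the workflows section of the guides for the real options:

  # workflow.json – a two-step example: log a message, then pause and
  # wait to be resumed before publication continues.
  # (Step types/parameters here are illustrative.)
  {
    "name": "Example pre-publication workflow",
    "steps": [
      { "provider": ":internal", "stepType": "log",
        "parameters": { "message": "About to publish" } },
      { "provider": ":internal", "stepType": "pause", "parameters": {} }
    ]
  }

  # Register it, then make it the default PrePublishDataset workflow
  # (use the numeric id returned by the first call):
  curl -X POST -H 'Content-type: application/json' -d @workflow.json \
    http://localhost:8080/api/admin/workflows
  curl -X PUT -d 1 http://localhost:8080/api/admin/workflows/default/PrePublishDataset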
I used that mechanism to create an ‘archiver framework’ – a set of classes that package the data and metadata from a dataset into a single ~RDA-conformant zipped BagIt file with an OAI-ORE metadata map and then send that to an archive. That was originally somewhat monolithic, and through the discussion with Gustavo/IQSS I ended up making changes to make it easier to target other repositories and to allow the same archiving code to be called through an API, e.g. to allow an administrator to archive older versions that have already been published. (I also added the ORE map as one of the available metadata export formats…) It’s this stuff that Gustavo was calling “Jim’s framework.”
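If you want to try that API route for an existing published version, it ends up being a single POST to an admin endpoint – something like the sketch below, though the exact path is from memory, so double-check the archiving section of the installation guide:

  # Archive an already-published version by dataset database id and
  # version number (endpoint path from memory – verify in the guides):
  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  curl -X POST -H "X-Dataverse-key: $API_TOKEN" \
    http://localhost:8080/api/admin/submitDatasetVersionToArchive/1234/1.0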
The core of that framework is edu.harvard.iq.dataverse.engine.command.impl.AbstractSubmitToArchiveCommand plus some related Dataverse settings that let you specify which class to invoke and which repository-specific settings it should have access to. The first example was the DuraCloudSubmitToArchiveCommand, which is what’s in the documentation – it can be sent a host, port, and context, and gets a username/password from JVM settings.
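For reference, wiring that up is just a handful of settings and JVM options – roughly the following (setting names as I remember them from the docs, with localhost and duracloud.example.org as stand-ins for your installation and DuraCloud host):

  # Which archiver command class to run:
  curl -X PUT -d edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand \
    http://localhost:8080/api/admin/settings/:ArchiverClassName

  # Which repository-specific settings that class may read:
  curl -X PUT -d ":DuraCloudHost, :DuraCloudPort, :DuraCloudContext" \
    http://localhost:8080/api/admin/settings/:ArchiverSettings
  curl -X PUT -d duracloud.example.org \
    http://localhost:8080/api/admin/settings/:DuraCloudHost

  # Credentials go into JVM options rather than database settings:
  ./asadmin create-jvm-options '-Dduracloud.username=YOUR_USERNAME'
  ./asadmin create-jvm-options '-Dduracloud.password=YOUR_PASSWORD'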
I’ve just recently created a GoogleCloudSubmitToArchiveCommand that uses the same abstract command class and just directs content to Google Cloud Storage instead (will share this at some point too). This was very easy to create since the abstract class already makes it available as both a workflow and an API call – I just had to rewrite one method, calling the same code to create the zipped Bag and replacing the DuraCloud API calls with Google’s. It should take minimal programming to send things to Amazon or Microsoft, or, hopefully, to any other archive that can read RDA-conformant Bags.
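Once that’s shared, pointing an installation at Google instead should just be a matter of swapping those settings – something along these lines (the Google-specific setting names below are placeholders, since that command isn’t documented yet):

  # Swap the archiver class; :GoogleCloudBucket/:GoogleCloudProject are
  # placeholder setting names until the command is shared/documented.
  curl -X PUT -d edu.harvard.iq.dataverse.engine.command.impl.GoogleCloudSubmitToArchiveCommand \
    http://localhost:8080/api/admin/settings/:ArchiverClassName
  curl -X PUT -d ":GoogleCloudBucket, :GoogleCloudProject" \
    http://localhost:8080/api/admin/settings/:ArchiverSettings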
Conceptually, the archiver mechanism focuses on packaging a dataset for external storage (versus coordinating with a service that’s going to make further changes and potentially interact with Dataverse over time). For Archivematica, depending on whether you’re thinking of the integration as a one-time transfer of data/metadata or an ongoing interaction between Archivematica and Dataverse, you might want to consider designs based on a basic workflow, the abstract archiver class I’ve made, or even an external tool at the dataset level (open issue #5028). The benefit of the archiver class is that it handles creating both a workflow and an API call automatically. Further, the ability to create an ORE map file and/or a zipped Bag would save you from having to write calls to retrieve all the files and metadata (at the cost of having to read the Bag and ORE file).

The archiver mechanism would probably not be as useful if you want any sort of interaction over time – for Archivematica to pull files/metadata incrementally, for Archivematica to push new metadata/provenance back to Dataverse, etc. For those, either a workflow that calls an external service and waits for a callback, or an external tool design where Archivematica could be called and given an apiToken to call back to the Dataverse API as needed, would probably be better. That said, you might still be able to leverage the ORE map file and/or zipped Bag in a basic workflow or external tool design, without using the archiver class itself, to simplify data/metadata transfer.
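For what it’s worth, today’s file-level external tool manifests already show how the apiToken handoff works, and a dataset-level variant (per #5028) would presumably look similar. Roughly – the tool name, URL, and query parameter names below are made up for illustration:

  # archivematica-tool.json – illustrative only; current tools are
  # file-level, so {fileId} would become some dataset identifier in a
  # dataset-level version.
  {
    "displayName": "Send to Archivematica",
    "description": "Illustrative external tool manifest",
    "type": "explore",
    "toolUrl": "https://archivematica.example.edu/dataverse-transfer",
    "toolParameters": {
      "queryParameters": [
        { "fileId": "{fileId}" },
        { "key": "{apiToken}" }
      ]
    }
  }

  # Register it with a Dataverse installation:
  curl -X POST -H 'Content-type: application/json' -d @archivematica-tool.json \
    http://localhost:8080/api/admin/externalTools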