Re: [OpenScienceFramework] OSF todo? enable/foster scriptable transfer to data repositories

Philip Durbin

Feb 27, 2014, 1:36:23 PM
to openscienc...@googlegroups.com, dataverse...@googlegroups.com
Hi Tom,

Dataverse provides a scriptable "Data Deposit API" based on the
SWORDv2 protocol. Here are some examples with curl:

http://thedata.harvard.edu/guides/dataverse-api-main.html#data-deposit-api
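For anyone who would rather script this from Python than from curl, here is a minimal sketch of the kind of request a SWORDv2 binary deposit involves. It is not taken from the guide: the function names and the example.org URL are made up for illustration, and the header set (Content-Type, Content-Disposition, Packaging with the SimpleZip identifier, HTTP Basic auth) is my reading of the SWORDv2 profile. Check the guide above for the exact URL and headers a given server expects. The sketch only builds the request; it does not send anything.

```python
import base64
import io
import urllib.request
import zipfile

# Packaging identifier for a plain zip of files, as named in the SWORDv2 profile.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def zip_in_memory(files):
    """Zip a {name: bytes} mapping into a single in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def build_deposit_request(url, zip_bytes, filename, user, password):
    """Build (but do not send) a SWORDv2-style binary deposit request.

    `url` is whatever deposit URL the server hands out; the one used in the
    usage note below is hypothetical.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": SIMPLE_ZIP,
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, data=zip_bytes, headers=headers, method="POST")
```

Usage would look roughly like `req = build_deposit_request("https://example.org/sword/edit-media/study/1", zip_in_memory({"a.nc": data}), "example.zip", user, password)` followed by `urllib.request.urlopen(req)` once the URL and credentials are real.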

I wrote the API implementation so any bugs are my fault. :)

Our primary use case when developing the API was integration between
Open Journal Systems (OJS) and Dataverse:
http://projects.iq.harvard.edu/ojs-dvn

COS generously hosted me and my boss back in September (hi, everyone!)
and we're working on an integration between OSF and Dataverse that
makes use of the API:
https://github.com/CenterForOpenScience/openscienceframework.org/issues/112

Actually, COS is even helping to develop a Python library to talk to
the Dataverse API (which I really, really appreciate, not being much
of a Pythonista): https://github.com/IQSS/dvn-client-python

But enough about Dataverse. Lots of other repositories support SWORD.
There's an official list at
http://swordapp.org/sword-v2/sword-v2-implementations/ and my
(slightly longer) list at
https://github.com/dvn/dvn-devguide-src/blob/master/features/api/data-deposit.mdwn#sword-v2-server-implementations

But enough about SWORD. Are there other protocols for this? Let me
know! I ask because some of what we want to do is not covered by the
spec: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html

Hope this helps,

Phil

p.s. Cc'ing the Dataverse community list on this.

On Thu, Feb 27, 2014 at 12:21 PM, Tom Roche <Tom_...@pobox.com> wrote:
>
> https://groups.google.com/d/msg/openscienceframework/2nnzi5nqYTA/f3wohCyIGGEJ
>> [Tom Roche Thu, 27 Feb 2014 10:48:23 -0500]
>> Morpho (at least, the previous version) was fairly time-consuming (manually inputting metadata, not to mention data transfer)
>
> Dunno if this is already in-plan, but one thing I'd like to see OSF tool up (working with providers to enable as necessary) is CLI/scriptable data transfer, esp metadata transfer, to repositories. When attempting to repositorize hundreds (daily for a year, plus spinup) of often-multi-GB netCDF files
>
> 1. interacting with a GUI or web UI is painful and slow.
>
> 2. `tar` seems unattractive, since (I suspect)
>
> * probability of transfer abend grows with {file size, transfer time}, for both uploaders (i.e., me) and downloaders (i.e., collaborators, replicators).
>
> * downloaders will likely want subsets of the data
>
> 3. .tar.gz does not help here, since netCDF are already fairly compact binaries.
>
> Implementation-wise, I'd favor HTTP APIs similar to those already used by BitBucket and GitHub, but only because the clusters on which I work only allow HTTP and SSL out.
>
> Again, this may require work with the repos to provide necessary plumbing on their side. Along those lines (dunno if this is too off-topic), if anyone has pointers to currently-transfer-scriptable repositories, please pass. I have a proposed question about this @ the proposed Open Science Stack Exchange
>
> http://area51.stackexchange.com/proposals/65426/open-science/
>
> FWIW, Tom Roche <Tom_...@pobox.com>
>



--
Philip Durbin
Software Developer for http://thedata.org
http://www.iq.harvard.edu/people/philip-durbin