Hello!
I would like to publicly thank our developers for finding time to keep working towards a release: it has been a rough few weeks for the DataFlow team, with changed priorities for us all. Thanks for keeping at it, guys.
We're getting there... please see this message on the dataflow-devel mailing list for a progress update from Thursday’s hackathon.
In the meantime, the irreplaceable Ben O’Steen has written in with an update (and some tips) on SWORD-related questions.
Thanks,
Katherine… and now over to Ben:
I'm not sure what the root of the current SWORD errors is. I know that there are issues with the authentication libraries, but these may have nothing to do with the errors you are experiencing. Richard or Anusha (when she is back) may be able to tell you more.
My first task for Oxford is to migrate the DataBank codebase from Pylons to Django. I've hit a dead end experimenting with handling large uploads in Pylons and can't see a real fix. Add in that the current community and momentum around Django are comparatively huge, and it makes a lot of sense to do this. As part of this migration, I'll be (internally) re-plumbing the SWORD interface. I've allocated days ahead to this (around 0.6 FTE, as I have other work on at the moment), and project that in around 5 weeks I'll be done with both these tasks. It should be ready for testing in 3 to 4 weeks, depending on how things go.
As for the use of the SWORD2 interface, you might find some of the HTTP tests useful, at least as examples rather than as explanations of why certain URIs are used [1]. I'll try to keep it short:
1 - https://github.com/swordapp/python-client-sword2/blob/master/tests/http/test_sss.py
Basics: A SWORD interface exposes a service document that lists the workspaces the service manages and the collections within those workspaces. Given acceptable authentication/authorisation, a user can create, amend, read and delete containers (think 'labelled bags of content') in these collections. A container is represented by an Atom document: essentially a list of resources, with some top-level metadata (title, attributions, provenance, etc.) about the container. You can add, read, delete and update the resources held by this container. In the Python client these resources are typically called payloads; they are the files in your dataset, for example.
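To make the hierarchy above concrete, here is a minimal sketch of it as plain Python data classes. The class names (Workspace, Collection, Container) are purely illustrative and are not the classes used by the sword2 client or by DataBank; the example values are borrowed from the snippets in this message.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A 'labelled bag of content': top-level metadata plus its payloads."""
    title: str
    payloads: list = field(default_factory=list)  # names of the files held


@dataclass
class Collection:
    name: str
    containers: dict = field(default_factory=dict)  # container id -> Container


@dataclass
class Workspace:
    name: str
    collections: dict = field(default_factory=dict)  # collection name -> Collection


# Conceptually, a service document lists workspaces and their collections.
service = {
    "Main Site": Workspace("Main Site", {"MyCollection": Collection("MyCollection")}),
}

# Create a container in a collection, then add and delete payloads.
col = service["Main Site"].collections["MyCollection"]
col.containers["IMPORTANTID"] = Container(title="My Dataset Title")
col.containers["IMPORTANTID"].payloads.append("data.csv")
col.containers["IMPORTANTID"].payloads.append("notes.txt")
col.containers["IMPORTANTID"].payloads.remove("notes.txt")
```

In a real deposit each of these operations is an HTTP request against the server rather than an in-memory update; the sketch only shows how the pieces nest.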
So, long story short (w/ python code):
- get the service document from the server
from sword2 import Connection, Entry
conn = Connection("http://example.org/service-doc")
conn.get_service_document()  # fetch and parse the service document
- Create an Entry (the Atom entry for the container)
e = Entry(title="My Dataset Title", id="IMPORTANTID", dcterms_appendix="blah blah", dcterms_title="Dataset Title")
(NB parameters here correspond to the Atom namespace - title, author, id, etc. Those prefixed with 'dcterms_' are put into the dcterms XML namespace as part of the Entry document.)
Additional fields can be added later, and from additional namespaces (using the same underscore syntax as before):
e.register_namespace("oxds", "http://databank.ox.ac.uk/terms/")
e.add_field("oxds_whatever", "whatever")
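To illustrate the underscore convention described above, here is a rough sketch of how a 'prefix_name' keyword could be mapped to a namespaced XML element. This is not the sword2 client's actual implementation; the helper names (qualified_name, build_entry) are made up for the example. The oxds URI is the one registered above, and the Atom and dcterms URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI. "atom" is used for unprefixed fields.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "dcterms": "http://purl.org/dc/terms/",
    "oxds": "http://databank.ox.ac.uk/terms/",
}


def qualified_name(key):
    """Turn 'dcterms_title' into '{http://purl.org/dc/terms/}title'."""
    prefix, sep, local = key.partition("_")
    if sep and prefix in NAMESPACES:
        return "{%s}%s" % (NAMESPACES[prefix], local)
    # No known prefix: treat the whole key as an Atom element name.
    return "{%s}%s" % (NAMESPACES["atom"], key)


def build_entry(**fields):
    """Build an Atom <entry> element with one child element per field."""
    entry = ET.Element(qualified_name("entry"))
    for key, value in fields.items():
        ET.SubElement(entry, qualified_name(key)).text = value
    return entry


entry = build_entry(title="My Dataset Title", id="IMPORTANTID",
                    dcterms_title="Dataset Title", oxds_whatever="whatever")
print(ET.tostring(entry, encoding="unicode"))
```

The point is only that the part before the first underscore selects the namespace and the rest becomes the element name; the real Entry class may handle this differently in detail.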
- Upload the Entry with an attached payload (your dataset for example)
with open(PACKAGE, "rb") as pkg:  # binary mode: the payload is sent as raw bytes
    resp = conn.create(payload=pkg,
                       metadata_entry=e,
                       mimetype="MIMETYPE/HERE+PLEASE",
                       filename="FILENAME_HERE.PLEASE",
                       packaging='http://purl.org/net/sword/package/Binary',
                       workspace='Main Site',
                       collection='MyCollection',
                       in_progress=True)
assert resp.code == 201  # 201 Created: a new resource has been created by this deposit
For a zipped package, the mimetype would be 'application/zip' and I believe the packaging would be 'http://purl.org/net/sword/package/SimpleZip', although I do not know if this is the correct form for the DataStage->Databank 'BagIt' upload. Anusha or Richard will have to clarify this step.
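As a small sketch of the choice just described, the helper below picks the mimetype and packaging arguments from a filename. The function name deposit_params is hypothetical, the 'application/octet-stream' fallback for non-zip files is an assumption, and whether SimpleZip is right for BagIt uploads is still the open question mentioned above.

```python
# Packaging URIs quoted in this message.
BINARY = "http://purl.org/net/sword/package/Binary"
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def deposit_params(filename):
    """Return the mimetype/packaging/filename keywords for a deposit."""
    if filename.lower().endswith(".zip"):
        return {"mimetype": "application/zip",
                "packaging": SIMPLE_ZIP,
                "filename": filename}
    # Assumption: anything that is not a zip is sent as an opaque binary file.
    return {"mimetype": "application/octet-stream",
            "packaging": BINARY,
            "filename": filename}


params = deposit_params("dataset.zip")
```

These could then be passed along with the other arguments, for example conn.create(payload=pkg, metadata_entry=e, workspace='Main Site', collection='MyCollection', in_progress=True, **params).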
Hi,
To all those on the list, the message included in the previous email was from me to a private group of developers. The section on SWORD2 was off the cuff and not checked by others who are actively working on that side of the development. As such, it really should be taken with a *large* pinch of salt until Richard Jones or Anusha reviews it to see if it tallies with reality.
Also, the message included a number of out-of-context replies by me, and its formatting was completely broken. I have a draft post ready; it will be blogged once I get it checked, and that post will be the preferred version.
Ben
PS I have not yet begun the previously mentioned migration from Pylons to Django, and I do not like to talk publicly about code before I've started writing it (which I am due to do tomorrow). I'll still post to the list, as I was going to, once I'm underway and have a better handle on how and when (testable) features might appear.
tl;dr I do not like to "announce vapourware"