progress update... and some suggestions re. SWORD


Katherine Fletcher

Sep 16, 2012, 5:21:47 PM
to dataflo...@googlegroups.com, dataflo...@googlegroups.com

Hello!

 

I would like to publicly thank our developers for finding time to keep working towards a release: it has been a rough few weeks for the DataFlow team, with changed priorities for us all.  Thanks for keeping at it, guys.

 

We're getting there... please see this message on the dataflow-devel mailing list for a progress update from Thursday’s hackathon. 

 

In the meantime, the irreplaceable Ben O’Steen has written in with an update (and some tips) on SWORD-related questions.

 

Thanks,

 

Katherine… and now over to Ben:

 

 

I'm not sure what the root of the current SWORD errors is. I know that there are issues with the authentication libraries, but these may have nothing to do with the errors you are experiencing. Richard or Anusha (when she is back) may be able to tell you more.

 

My first task for Oxford is to migrate the DataBank codebase from Pylons to Django. I've hit a dead end experimenting with handling large uploads in Pylons and can't see a real fix. Add in that the community and momentum around Django are comparatively huge, and the move makes a lot of sense. As part of this migration, I'll be (internally) re-plumbing the SWORD interface. I've allocated days ahead to this (around 0.6 FTE, as I have other work on at the moment), and project that I'll be done with both tasks in around 5 weeks. It should be ready for testing in 3 to 4 weeks, depending on progress.

 

As for the use of the SWORD2 interface, you might find some of the HTTP tests useful, at least as examples rather than as explanations of why certain URIs are used [1]. I'll try to keep it short:

 

1 - https://github.com/swordapp/python-client-sword2/blob/master/tests/http/test_sss.py

 

Basics: A SWORD interface exposes a service document that lists the workspaces the service manages and the collections within those workspaces. Given acceptable authentication/authorisation, a user can create, amend, read and delete containers (think 'labelled bags of content') in these collections. A container is represented by an Atom document: essentially a list of resources, with some top-level metadata (title, attributions, provenance, etc.) about the container. You can add, read, delete and update the resources held by this container; the Python client typically calls these resources payloads. For example, these are the files in your dataset.
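
To make the workspace/collection layout concrete, here is a minimal sketch that reads a service document using only the Python standard library. The XML below is a hand-written, hypothetical example (the titles and the collection URL are placeholders), and list_collections is an illustrative helper, not part of the sword2 client:

```python
import xml.etree.ElementTree as ET

# Hypothetical service document, written by hand for illustration only.
SERVICE_DOC = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Main Site</atom:title>
    <collection href="http://example.org/col/MyCollection">
      <atom:title>MyCollection</atom:title>
    </collection>
  </workspace>
</service>"""

# AtomPub and Atom XML namespaces used by service documents.
NS = {"app": "http://www.w3.org/2007/app", "atom": "http://www.w3.org/2005/Atom"}


def list_collections(xml_text):
    """Return {workspace title: [(collection title, collection href), ...]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for ws in root.findall("app:workspace", NS):
        ws_title = ws.findtext("atom:title", default="", namespaces=NS)
        result[ws_title] = [
            (col.findtext("atom:title", default="", namespaces=NS), col.get("href"))
            for col in ws.findall("app:collection", NS)
        ]
    return result


print(list_collections(SERVICE_DOC))
```

A real service document will usually carry more detail per collection (accepted mimetypes, packaging formats and so on), which this sketch ignores.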

 

So, long story short (w/ python code):

 

- Get the service document from the server

from sword2 import Connection, Entry

conn = Connection("http://example.org/service-doc")
conn.get_service_document()  # fetch and parse the service document

 

- Create an Entry (the Atom entry for the container)

 

e = Entry(title="My Dataset Title", id="IMPORTANTID", dcterms_appendix="blah blah", dcterms_title="Dataset Title")

 

(NB parameters here correspond to the Atom namespace - title, author, id, etc. Those prefixed with 'dcterms_' are put into the dcterms XML namespace as part of the Entry document.)

 

Additional fields can be added later, and from additional namespaces (using the same underscore syntax as before):

 

 
 
e.register_namespace("oxds", "http://databank.ox.ac.uk/terms/")
e.add_field("oxds_whatever", "whatever")
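
To illustrate the underscore convention, the following is a rough sketch of how prefixed field names could end up as namespaced XML elements. It uses only the standard library and is not the sword2 client's actual implementation; build_entry and the NAMESPACES table are hypothetical names introduced for this example:

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI. The oxds URI is the one registered in the message above.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "dcterms": "http://purl.org/dc/terms/",
    "oxds": "http://databank.ox.ac.uk/terms/",
}


def build_entry(**fields):
    """Build an Atom entry; 'prefix_name' keys go into that prefix's namespace."""
    entry = ET.Element("{%s}entry" % NAMESPACES["atom"])
    for key, value in fields.items():
        prefix, sep, local = key.partition("_")
        if sep and prefix in NAMESPACES:
            tag = "{%s}%s" % (NAMESPACES[prefix], local)
        else:
            # No recognised prefix: treat the key as a plain Atom element.
            tag = "{%s}%s" % (NAMESPACES["atom"], key)
        ET.SubElement(entry, tag).text = value
    return entry


entry = build_entry(title="My Dataset Title", id="IMPORTANTID",
                    dcterms_title="Dataset Title", oxds_whatever="whatever")
print(ET.tostring(entry, encoding="unicode"))
```

The point is only that 'title' and 'dcterms_title' become two different elements, one in the Atom namespace and one in the dcterms namespace.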
 

 

- Upload the Entry with an attached payload (your dataset for example)

 

with open(PACKAGE, "rb") as pkg:  # binary mode, since the payload is sent as raw bytes
    resp = conn.create(payload=pkg,
                       metadata_entry=e,
                       mimetype="MIMETYPE/HERE+PLEASE",
                       filename="FILENAME_HERE.PLEASE",
                       packaging='http://purl.org/net/sword/package/Binary',
                       workspace='Main Site',
                       collection='MyCollection',
                       in_progress=True)

assert resp.code == 201  # check that a new resource has been created by this request
 

 

For a zipped package, the mimetype would be 'application/zip' and, I believe, the packaging would be 'http://purl.org/net/sword/package/SimpleZip', although I do not know if this is the correct form for the DataStage->Databank 'BagIt' upload. Anusha or Richard will have to clarify this step.
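
As a rough sketch of the zipped case, the snippet below builds a zip archive with the standard library and collects the matching upload parameters. The file name and contents are made up, zip_files is a hypothetical helper, and whether the client accepts an in-memory buffer as the payload is an assumption; writing the archive to disk and opening it as in the upload step above is the safer route. As noted, the SimpleZip packaging may not be the right form for the BagIt upload.

```python
import io
import zipfile

# Packaging URI quoted in the message above; may not apply to BagIt uploads.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def zip_files(files):
    """Zip {archive name: bytes} into an in-memory buffer, rewound to the start."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


# Made-up dataset content, for illustration only.
payload = zip_files({"data/readings.csv": b"t,value\n0,1.5\n"})

# Keyword arguments that would replace the corresponding ones in the upload call.
deposit_kwargs = dict(payload=payload, mimetype="application/zip",
                      filename="dataset.zip", packaging=SIMPLE_ZIP)
```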

 

 

 

 
