Re: MyTardis with Swift (OpenStack Object Store)?


James Wettenhall

Aug 21, 2014, 9:06:46 PM
to tardis...@googlegroups.com
Hi,

I've been playing with this, and I've now succeeded in uploading a real dataset to a Swift-backed MyTardis instance using Grischa's latest storage code.  I thought I would list a few issues I have come across along the way.

1. I wasn't able to authenticate to NeCTAR Swift storage using this code: https://github.com/wecreatepixels/django-storage-swift/, until I added "SWIFT_TENANT_ID".  (The original code only had SWIFT_TENANT_NAME).  I should fork this on GitHub to make the change clearer, but it was just a simple change - see the "tenant_id" key I added to the "os_options" dictionary below in django-storage-swift's swift/storage.py:

tenant_id = setting('SWIFT_TENANT_ID')       # NEW
...
        # Get authentication token
        self.storage_url, self.token = swiftclient.get_auth(
            self.api_auth_url,
            self.api_username,
            self.api_key,
            auth_version=self.auth_version,
            os_options={"tenant_id": self.tenant_id,      #  NEW
                        "tenant_name": self.tenant_name},
            insecure=True
        )

2. api.py was checking for the "tardis_portal.add_datafile" permission, but my users (added via the Django admin interface) had the "tardis_portal.add_dataset_file" permission instead.
 
3. I found that even with DEFAULT_FILE_STORAGE = 'swift.storage.SwiftStorage' in settings.py, new datasets didn't automatically get assigned a default storage box.  When creating a dataset through the web interface, I had to click on my preferred storage box ("swift"), which I manually created in the Storage Boxes section of the Django Admin interface.  And when creating a dataset through the TastyPie API, the dataset ended up with a useless DummyStorageBox until I modified the API a bit:

 class DatasetResource(MyTardisModelResource):
     experiments = fields.ToManyField(
         ExperimentResource, 'experiments', related_name='datasets')
+    storage_boxes = fields.ToManyField(
+        LocationResource, 'storage_boxes', related_name='datasets')

and specified a storage box in my JSON data when creating the dataset:

        dataset_dict = {
            u'description': description,
            u'experiments': experiments_list,
            u'immutable': immutable,
            u'parameter_sets': parameter_sets_list,
            u'storage_boxes': ['/api/v1/location/3/'],
        }

4. Now I have some data uploaded into MyTardis with a Swift (single container) backend.  But if I try to download an individual datafile, I get "Sorry, not implemented yet. Please append "?format=json" to your URL.".  That should be an easy fix, but I haven't yet figured out where to apply the fix.  And if I try to download multiple datafiles, I get an Internal Server Error with the following traceback:

Traceback (most recent call last):

  File "/opt/mytardis/develop/eggs/Django-1.5.5-py2.7.egg/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/opt/mytardis/current/tardis/tardis_portal/download.py", line 503, in streaming_download_datafiles
    comptype, organization)

  File "/opt/mytardis/current/tardis/tardis_portal/download.py", line 370, in _streaming_downloader
    do_gzip=comptype != 'tar')

  File "/opt/mytardis/current/tardis/tardis_portal/download.py", line 219, in __init__
    self.tar_size = self.compute_size()

  File "/opt/mytardis/current/tardis/tardis_portal/download.py", line 226, in compute_size
    size = os.fstat(the_file.fileno()).st_size

  File "/opt/mytardis/develop/eggs/Django-1.5.5-py2.7.egg/django/core/files/utils.py", line 12, in <lambda>
    fileno = property(lambda self: self.file.fileno)

AttributeError: StringIO instance has no attribute 'fileno'

5. I started with Steve Androulakis's https://github.com/steveandroulakis/mytardis-uploader script, and I had to make various changes to get it working for our requirements.  I'll continue refining my fork of this script, and then share it on GitHub.

Cheers,
James



On 7 July 2014 16:51, Andy Tseng <andy....@unimelb.edu.au> wrote:
Thanks Grischa for your quick reply. 

I'll have a look at the link you provided and try to set up a test server to see how it works.

Ideally, I'd like to use multiple Swift containers with MyTardis to facilitate different projects.

I'll keep you posted on the progress.

Thanks,

Andy
--
Dr Andy Tseng | Data Infrastructure Architect
Research Services | Information Technology Services
Level 3, Room 326, Doug McDonell Building (168), University of Melbourne, 3010, VIC, Australia
Telephone +61 3 9035 3313 | Mobile +61 410 115 047 | Email andy....@unimelb.edu.au


On 7 July 2014 at 15:08:53, Grischa (gri...@gmail.com) wrote:
> Hi Andy
>
> This is sort of done using my newest branch
> https://github.com/grischa/mytardis/tree/enhanced_storage_backend which has
> a new storage backend.
> To use a single Swift container for your whole installation, you can just
> use this: https://github.com/wecreatepixels/django-storage-swift/
> For different containers you'd need to do some further development. Happy
> to guide you if you want to do that.
>
> Beware this is pretty new storage code and may have issues, but I'll fix
> them immediately if you find any.
>
> Let us know how you go. We are also interested in having this working, so
> happy to help out.
>
> Cheers
>
>
> On Monday, July 7, 2014 1:49:53 PM UTC+10, Andy Tseng wrote:
> >
> > Hi everyone,
> >
> > I'm wondering what is the current status regarding the use of OpenStack's
> > Swift with MyTardis?
> > I found a post
> >
> > (dated in 2012) where Steve A. had suggested that object storage support is
> > being considered for development and I would like to find out if it is
> > possible to do so now?
> >
> > Cheers,
> >
> > Andy
> > --
> > Dr Andy Tseng | Data Infrastructure Architect
> > Research Services | Information Technology Services
> > Level 3, Room 326, Doug McDonell Building (168), University of Melbourne,
> > 3010, VIC, Australia
> > Telephone +61 3 9035 3313 | Mobile +61 410 115 047 | Email
> > andy....@unimelb.edu.au
> >
> >
> >
> >
>
> --
> You received this message because you are subscribed to a topic in the Google Groups "tardis-devel"
> group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/tardis-devel/J20QjQoKgH0/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to tardis-devel...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


Grischa Meyer

Aug 21, 2014, 9:11:06 PM
to tardis...@googlegroups.com
Thanks James,

Your testing is much appreciated!
I'll incorporate fixes for the issues soon. Send pull requests, if you have fixed some already.

Cheers

James Wettenhall

Aug 21, 2014, 9:12:29 PM
to tardis...@googlegroups.com
Hi again,

On 22 August 2014 11:06, James Wettenhall <james.we...@monash.edu> wrote:
> 4. Now I have some data uploaded into MyTardis with a Swift (single container) backend.  But if I try to download an individual datafile, I get "Sorry, not implemented yet. Please append "?format=json" to your URL.".  That should be an easy fix, but I haven't yet figured out where to apply the fix.  And if I try to download multiple datafiles, I get an Internal Server Error with the following traceback:

Actually this error: "Sorry, not implemented yet. Please append "?format=json" to your URL." when downloading a single datafile may have been a red herring.  I had a syntax error in one of my logging statements in api.py.  After fixing my syntax error, I'm now able to download a single datafile from my Swift-backed MyTardis instance.  But the issue with downloading multiple datafiles from Swift is still in play.

Cheers,
James

Carlo Hamalainen

Aug 22, 2014, 1:00:15 AM
to tardis...@googlegroups.com
Hi everyone,

Just a small question: will the Swift storage back-end be usable via the REST API?

At the moment I'm using the REST API for my DICOM uploader utility. I do the thing where I get a location and then copy/move the file there locally.

Cheers,

-- Carlo

Grischa Meyer

Aug 22, 2014, 1:13:38 AM
to tardis...@googlegroups.com
Hi Carlo

Yeah, of course it will be supported in the REST interface. The REST interface has turned out to be one of the most important aspects of MyTardis.
I still have to investigate the issues James brought up about datasets and defaults, but the goal was to leave API v1 mostly/completely unchanged and use the location name to refer to StorageBoxes.
API v2 will then have a more sophisticated interface to the storage backends.

Cheers

Carlo Hamalainen

Aug 22, 2014, 1:31:50 AM
to tardis...@googlegroups.com
On 22/08/14 07:13, Grischa Meyer wrote:
> Yeah, of course it will be supported in the REST interface. The REST
> interface has turned out to be one of the most important aspects of
> MyTardis.
> I still have to investigate the issues James brought up about datasets
> and defaults, but the goal was to leave API v1 mostly/completely
> unchanged and use the location name to refer to StorageBoxes.
> API v2 will then have a more sophisticated interface to the storage
> backends.

Great.

At UQ we are going ahead with a large NFS mount but other NIF sites may
prefer object storage so this makes us happy.

============================================
Dr Carlo Hamalainen
Senior Software Engineer
Centre for Advanced Imaging
University of Queensland
St Lucia, QLD, 4072. AUSTRALIA

E: c.hama...@uq.edu.au
============================================

James Wettenhall

Aug 31, 2014, 7:48:24 PM
to tardis...@googlegroups.com
Hi,

I thought I'd update my progress with using Swift as a back-end for MyTardis.

While using MyTardis's develop branch, I came across a few minor issues for which I have now submitted pull requests.  Hopefully they are self-explanatory.

There are some less minor issues I've encountered, like the MyTardis server running out of memory while uploading a series of 450 MB datafiles by HTTP POSTing to the API, but that requires further investigation.  Also I've noticed some temporary datafile upload files not being cleaned up promptly, e.g. "tmpPcMZsR.upload".

I've documented the minor change I made to the django-storage-swift Python module here: https://github.com/wettenhj/django-storage-swift/commit/e3393714167ae2b5c4388d8f42a48a6768f94b25
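
For anyone patching their own copy, here is a minimal sketch of the idea behind that change: hand swiftclient only the tenant keys that are actually configured.  The helper name build_os_options is hypothetical and not part of django-storage-swift, and the example values are made up.  The resulting dictionary is what would be passed as os_options to swiftclient.get_auth.

```python
def build_os_options(tenant_id=None, tenant_name=None):
    """Return an os_options dict holding only the tenant keys that are set.

    Hypothetical helper for illustration; not part of django-storage-swift.
    """
    candidates = {"tenant_id": tenant_id, "tenant_name": tenant_name}
    # Drop keys whose setting is missing or empty.
    return {key: value for key, value in candidates.items() if value}


# Made-up example value; a real deployment would read these from settings.py.
options = build_os_options(tenant_id="0123abcd", tenant_name=None)
print(options)
```

Leaving out unset keys is only a design choice for the sketch; the change I actually made simply passes both keys.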

I had a problem with a mixture of "add_dataset_file" permissions and "add_datafile" permissions, but now I can't reproduce this.  I guess it could have been a problem with the South migration from my old MyTardis database, because I can't find any problem in my latest testing.
 
I'm still using some tweaks I made to the MyTardis API to allow me to specify a storage box when creating a dataset, but I haven't submitted a pull request, because I think Grischa may have a different solution in mind.
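
In case it helps anyone trying the same thing, here is a rough sketch of building the dataset-creation request on the client side.  It assumes my patched DatasetResource (the stock API will not accept "storage_boxes"), HTTP basic authentication, and a dataset endpoint at /api/v1/dataset/; the function name, host name and credentials are all made up.  It is written for Python 3 and only builds the request, it does not send it.

```python
import base64
import json
from urllib.request import Request


def build_dataset_request(base_url, username, password, description,
                          experiment_uris, storage_box_uris):
    """Build (but do not send) a POST request that creates a dataset.

    Hypothetical helper; the 'storage_boxes' key is only understood by an
    API patched to expose that field.
    """
    payload = {
        "description": description,
        "experiments": experiment_uris,
        "immutable": False,
        "storage_boxes": storage_box_uris,
    }
    credentials = "{}:{}".format(username, password).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
    }
    return Request(base_url.rstrip("/") + "/api/v1/dataset/",
                   data=json.dumps(payload).encode("utf-8"),
                   headers=headers, method="POST")


request = build_dataset_request(
    "https://mytardis.example.org", "testuser", "secret",
    "Test dataset", ["/api/v1/experiment/1/"], ["/api/v1/location/3/"])
print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would send it.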
 
And if I try to download multiple datafiles, I get an Internal Server Error with the following traceback: 
... 
AttributeError: StringIO instance has no attribute 'fileno'

I haven't looked further at this issue, but I can download individual datafiles.
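
The traceback suggests the size calculation assumes a real file descriptor, which an in-memory object fetched from Swift doesn't have.  A minimal sketch of a fallback, assuming the object is at least seekable (the function name file_size is hypothetical, and this is not the fix in download.py):

```python
import io
import os


def file_size(file_obj):
    """Return the size in bytes of a file-like object.

    Tries the file descriptor first and falls back to seeking to the end
    for in-memory objects that have no usable fileno().
    """
    try:
        return os.fstat(file_obj.fileno()).st_size
    except (AttributeError, OSError, ValueError):
        # No descriptor available: measure by seeking, then restore position.
        position = file_obj.tell()
        file_obj.seek(0, os.SEEK_END)
        size = file_obj.tell()
        file_obj.seek(position)
        return size


buffer = io.BytesIO(b"hello world")
print(file_size(buffer))
```

If the storage backend can report the object's size directly, asking it would avoid reading or seeking at all.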
 
I'm still tidying up my upload script, based on Steve A's mytardis-uploader.  I'll be happy to share it on GitHub if anyone's interested.

Cheers,
James

Grischa Meyer

Sep 1, 2014, 2:35:40 AM
to tardis...@googlegroups.com
I've been working on fixes and I incorporated James' fixes in the develop branch already.

Regarding permissions, there is an update_permissions management command (as far as I know it comes from django-extensions rather than Django core). Run 'bin/django update_permissions' when you encounter issues with stale permission entries.




James Wettenhall

Sep 1, 2014, 9:43:42 PM
to tardis...@googlegroups.com
Hi,

Regarding the issue I raised with temporary uploads not getting cleaned up, this line:

bundle.data['file_object'].close()

in Grischa's recent commit:


definitely seems to help with that. :-)  I think that particular leak is fixed.  I still have a few lingering temporary upload files, but presumably those are left behind when a request dies with a memory error, in which case the upload never gets closed.
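
To illustrate why the close() matters, here is a small sketch, assuming the upload is backed by a tempfile.NamedTemporaryFile (which is how Django's TemporaryUploadedFile works, as far as I know).  The function name process_upload is made up, and the behaviour shown is for a POSIX system.

```python
import os
import tempfile


def process_upload(data):
    """Write data to a temporary '.upload' file and always close it.

    Returns the temporary path and the size seen while the file was open.
    """
    upload = tempfile.NamedTemporaryFile(suffix=".upload")
    path = upload.name
    try:
        upload.write(data)
        upload.flush()
        size = os.path.getsize(path)
    finally:
        # Closing a NamedTemporaryFile deletes it from disk.
        upload.close()
    return path, size


path, size = process_upload(b"x" * 1024)
print(size, os.path.exists(path))
```

The try/finally means the file is removed even if processing raises part-way through, although nothing can help if the whole process is killed.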

Cheers,
James

James Wettenhall

Oct 17, 2014, 2:21:09 AM
to tardis...@googlegroups.com
Hi,

For our Swift-backed MyTardis, I finally got around to switching from primarily uploading via HTTP POST (using MyTardis's TastyPie API) to uploading using Python's SwiftClient, and then registering the datafiles in MyTardis using the TastyPie API.  As expected, this has significantly reduced the strain on the MyTardis server and eliminated the Django out of memory errors I was getting when POSTing a series of large files to MyTardis.  The SwiftClient upload method is not perfect though - I can't seem to upload filenames including spaces when using SwiftClient's command-line interface, although I probably could get it to work if I imported the SwiftClient Python module and worked directly with its Python objects, instead of using the CLI.  For now, I'm falling back to the HTTP POST method for filenames with spaces.
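
For the filenames-with-spaces case, this is roughly what I have in mind for the Python route.  It is an untested sketch: upload_file is a hypothetical helper, the connection is expected to behave like swiftclient.client.Connection (whose put_object takes the container, the object name and the contents), and FakeConnection is only a stand-in so the example runs without a Swift endpoint.

```python
import io


def upload_file(connection, container, object_name, file_obj):
    """Upload file_obj under object_name, which may contain spaces.

    'connection' is assumed to provide put_object(container, obj, contents),
    as swiftclient.client.Connection does.
    """
    return connection.put_object(container, object_name, contents=file_obj)


class FakeConnection(object):
    """Stand-in for a real Swift connection, for illustration only."""

    def __init__(self):
        self.objects = {}

    def put_object(self, container, obj, contents, **kwargs):
        self.objects[(container, obj)] = contents.read()
        return "fake-etag"


fake = FakeConnection()
upload_file(fake, "mytardis", "scan 001 final.tif", io.BytesIO(b"data"))
print(sorted(fake.objects))
```

Because the object name is passed as a plain Python string, there is no shell quoting to get wrong, which is my guess at where the CLI attempt went astray.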

So the memory errors I encountered can certainly be avoided by using the "shared permanent location" ingestion method, described at https://mytardis.readthedocs.org/en/latest/api.html
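
When registering a datafile that was uploaded out of band, the checksum and size have to be computed on the client.  A minimal sketch that reads the file in chunks so it never sits in memory in one piece; the function name describe_datafile is hypothetical, and the dictionary keys are my assumption about what the registration request needs, so check them against the API documentation.

```python
import hashlib
import io


def describe_datafile(file_obj, filename, dataset_uri, chunk_size=1024 * 1024):
    """Compute the MD5 sum and size of file_obj one chunk at a time.

    Returns a dict of metadata; the key names are assumptions, not a
    confirmed MyTardis API schema.
    """
    md5 = hashlib.md5()
    size = 0
    for chunk in iter(lambda: file_obj.read(chunk_size), b""):
        md5.update(chunk)
        size += len(chunk)
    return {
        "dataset": dataset_uri,
        "filename": filename,
        "md5sum": md5.hexdigest(),
        "size": size,
    }


metadata = describe_datafile(io.BytesIO(b"hello world"),
                             "hello.txt", "/api/v1/dataset/1/")
print(metadata["size"], metadata["md5sum"])
```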

I'm running SwiftClient on a machine closely connected to the data-collection instrument, i.e. not "my" machine, and I don't really trust SwiftClient to leave enough available I/O bandwidth to keep my clients happy during business hours, so I schedule the uploads to run overnight and over the weekend.  Datasets are typically around 200 GB.
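
The scheduling itself is simple.  A sketch of the check, assuming business hours of 08:00 to 18:00 on weekdays (those hours and the function name in_upload_window are made up for illustration):

```python
from datetime import datetime, time

# Assumed business hours; adjust to suit the instrument's users.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def in_upload_window(now):
    """Return True on weekends and outside weekday business hours."""
    if now.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return True
    return not (BUSINESS_START <= now.time() < BUSINESS_END)


print(in_upload_window(datetime.now()))
```

An upload script can call this before starting each file and sleep until the window opens again.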

Cheers,
James
