Experiments with S3


Buddha Buck

May 10, 2017, 4:01:52 PM
to archivematica
In my testing of Archivematica 1.6, I have been attempting to use an S3 bucket as a "Space" to use as a transfer source and as an AIP store. The idea is that we can give our archivists who need to process stuff through Archivematica access to the S3 buckets without having to give them access to the storage space on the Archivematica server itself. None of the options for remote storage seemed to fit our needs.

I have had some success, which I'd like to share with the community. Any feedback, questions, or suggestions for improvement would be appreciated.

My setup is an AWS instance running CentOS, with newly installed and configured Archivematica 1.6 and Archivematica Storage Service 0.10.

I have about 150 GB of local disk split across two file systems (/ and /data). I already have two spaces configured for testing, one pointing to /data/secure (for testing/simulating eventual workflows involving archived data containing personally identifying information) and one pointing to /data/common (for testing non-sensitive data). Both spaces are configured to use /data/staging for staging, and each has transfer_source, transfer_backlog, aip_store, and dip_store directories with locations configured to point to them.
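
For reference, here is a rough sketch of how that local layout can be set up (the paths are the ones above; the archivematica ownership is my assumption about what the Storage Service needs in order to read and write there):

# local spaces plus the shared staging area (adjust ownership to your install)
sudo mkdir -p /data/staging
sudo mkdir -p /data/{secure,common}/{transfer_source,transfer_backlog,aip_store,dip_store}
sudo chown -R archivematica:archivematica /data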

Since Archivematica does not support S3 directly, my goal was to find a way to mount an S3 bucket as a filesystem, then create a space in Archivematica that referred to that filesystem, and try to use it as a transfer source and AIP store to test the round-trip aspects of it.

With that as the basic plan, here's what I did:

Mount S3 Bucket as a filesystem:

The tool I found for this is a package called s3fs-fuse (https://github.com/s3fs-fuse/s3fs-fuse), which seemed like the best of a limited set of choices. Since my expected usage involves only writing a file once and never changing it, the limitations of the tool seem acceptable to me.

I followed the instructions on the linked page for downloading and installing it. The instructions are easy to follow and have no surprises.
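
For anyone following along, it's the usual autotools build described in the project README; the CentOS package list below is from memory, so double-check it against the linked instructions:

# build dependencies on CentOS
sudo yum install automake fuse fuse-devel gcc-c++ git libcurl-devel libxml2-devel make openssl-devel

git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse
./autogen.sh
./configure
make
sudo make install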

I created a ~/.passwd-s3fs file with my S3 access key/secret access key pair. This is the default location where s3fs looks for S3 credentials.
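
The file is just the key pair separated by a colon, and s3fs refuses to use it unless it is readable only by its owner:

# substitute your own key pair; s3fs requires the file not be group- or world-readable
echo 'AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY' > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs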

I created a ~/s3mountpoint/ directory to use as a mountpoint.

I issued the command
s3fs -o allow_other,uid=1001,gid=1001,umask=220 s3fs-fuse-testing s3mountpoint
where 1001 is the user and group ID of the archivematica user and group, s3fs-fuse-testing is the name of my S3 bucket, and s3mountpoint is the mountpoint I created.

I found out the hard way that the uid=1001,gid=1001 options were important. Without them, s3fs wouldn't let me read any files I added to the bucket via an external tool (like running "aws s3 cp localfile s3://s3fs-fuse-testing/" on a different machine).
The allow_other option didn't seem to do much, but supposedly it allows users other than the one who mounted the filesystem to access it. The umask option sets the file permissions of the files on the s3fs filesystem. With the 220 mask, the owner and group of each file and directory get read and execute permission, while other users get full access. I should probably change that to something that makes a bit more sense.
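
If I do tighten it up, something like umask=022 (owner gets full access, everyone else read and execute only) is probably closer to what I actually want. An untested variation on the mount command above:

s3fs -o allow_other,uid=1001,gid=1001,umask=022 s3fs-fuse-testing s3mountpoint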

With those settings, the Archivematica Storage Service was able to see the mountpoint and treat it as a space. I added transfer_source, transfer_backlog, aip_store, and dip_store directories and configured locations pointing to them.
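
For what it's worth, those directories can be created straight through the mountpoint like any other directory (or from elsewhere with the aws CLI); s3fs stores each one as a placeholder object in the bucket:

cd ~/s3mountpoint
mkdir transfer_source transfer_backlog aip_store dip_store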

For a final test, I used the "aws s3" tool to transfer a directory of files into the s3://s3fs-fuse-testing/transfer_source directory and attempted to transfer and ingest them in Archivematica. That was successful.
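
For the record, that transfer looked roughly like the command below, run from my workstation with the AWS CLI configured (the local directory name is just an example):

aws s3 cp ./sample-transfer s3://s3fs-fuse-testing/transfer_source/sample-transfer --recursive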

Before I call this test completely done, I will need to configure the S3 bucket to be mounted via /etc/fstab, so it'll be mounted at startup. Right now, however, it looks like a good test.
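
The fstab line I plan to try looks something like this (not tested yet; the home-directory paths are guesses at my own setup, passwd_file points s3fs at the credentials file when root does the mounting, and _netdev delays the mount until the network is up):

# /etc/fstab
s3fs-fuse-testing /home/centos/s3mountpoint fuse.s3fs _netdev,allow_other,uid=1001,gid=1001,umask=220,passwd_file=/home/centos/.passwd-s3fs 0 0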

Has anyone else done something like this? Does anyone have suggestions for anything I might have missed?
