Hi all,
We've recently set up two separate installs of Archivematica 1.1 for testing: both are running Ubuntu 12.04.4, we installed using the packages, and in both cases Archivematica and the Storage Service are running on the same machine. One is running in VirtualBox on a Mac running Mavericks, the other is on a dedicated server (I can get more details on this configuration if necessary).
In both installations, everything seems to be working fine when we try to ingest smaller SIPs (small in terms of total SIP size, not necessarily number of files). However, we're running into a problem where SIPs above a certain size fail at the "Store the AIP" job with a Server Error 504. Has anyone else run into this?
More, possibly excessive, details below:
The problem: When we try to ingest SIPs that have a total size of around 1.5 GB (uncompressed) or larger, they consistently fail at the "Store the AIP" job during the Store AIP microservice. Archivematica then puts the SIP into the failed directory and stops processing that SIP. According to the Dashboard, at least, no AIP was created.
The strange thing is that what appears to be a valid AIP corresponding to the failed SIP - named transfername-UUID.7z - actually does get put into the AIP store at some point in this process. It does not appear on the list of AIPs in the Archival Storage tab, but it's in /var/archivematica/sharedDirectory/www/AIPsStore/ and it's in the list of packages in the Storage Service. It's also possible to find it by searching the AIPs in the Archival Storage tab, but the search results list shows "(data missing)" where the AIP name would be. The UUID listed in the search result matches the SIP UUID and the package can be downloaded (by clicking on "data missing") just like any other AIP, although the "view raw" output is essentially blank:
{
"_id": "None",
"_index": "aips",
"_type": "aip",
"exists": false
}
What this means is that, for a given failed SIP, a package named transfername-UUID.7z is placed in the AIP store and a failed SIP directory is created with the following structure:
sharedDirectory/failed/transfername-UUID/
├── METS.UUID.xml.validatorTester.html
├── pointer.xml
├── processingMCP.xml
└── transfername-UUID.7z
I've spot-checked a few of these and the md5 checksum on the 7zip package in the failed SIP directory always matches the checksum on the corresponding 7zip package in the AIP Store. I've also extracted a few of the AIP Store packages and run checksums for the files in /data/objects and they match the checksums for the original files along with the checksums in the BagIt manifests (as I would expect since they all pass the Verify AIP job earlier in the micro-service chain).
We've looked through the error messages/logs and here is what we've found (here I'm using the output from our VirtualBox installation):
The output of STDERR for the Store the AIP job is the same each time:
AIP creation failed. See Storage Service logs for more details
There doesn't appear to be anything wrong in /tmp/storage-service.log
The /tmp/archivematica.log shows a 504 error, which suggests it's a problem with nginx:
INFO:root:Creating file with {'size': 1203858511, 'package_type': 'AIP', 'current_path': 'soundtest-a0a73c9c-fa20-4f71-a711-e45af6d95957.7z', 'origin_pipeline': '/api/v2/pipeline/9ae836f6-7e6f-454a-b950-650e8d1208e4/', 'uuid': 'a0a73c9c-fa20-4f71-a711-e45af6d95957', 'current_location': '/api/v2/location/45ea03e3-85c2-4262-b9a4-d9dba13a8d32/', 'origin_path': '/var/archivematica/sharedDirectory/currentlyProcessing/soundtest-a0a73c9c-fa20-4f71-a711-e45af6d95957/soundtest-a0a73c9c-fa20-4f71-a711-e45af6d95957.7z', 'origin_location': '/api/v2/location/04fed302-0938-44a0-b0ca-c35b37b1e700/'}
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): localhost
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.1.19</center>
</body>
</html>
)
The log at /var/log/nginx/error.log shows an error during a POST request that corresponds to the time of the failed AIP:
2014/07/15 15:21:38 [error] 2634#0: *64 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 127.0.0.1, server: , request: "POST /api/v2/file/ HTTP/1.1", upstream: "uwsgi://unix:/tmp/storage.uwsgi.sock:", host: "localhost:8000"
Finally, the log at /var/log/uwsgi/app/storage.log shows closed/broken pipe errors at around the same time:
Tue Jul 15 15:22:13 2014 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /api/v2/file/ (ip 127.0.0.1) !!!
Tue Jul 15 15:22:13 2014 - writev(): Broken pipe [proto/uwsgi.c line 120] during POST /api/v2/file/ (127.0.0.1)
Tue Jul 15 15:22:13 2014 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /api/v2/file/ (ip 127.0.0.1) !!!
Tue Jul 15 15:22:13 2014 - write(): Broken pipe [proto/uwsgi.c line 138] during POST /api/v2/file/ (127.0.0.1)
This is about as far as we've gotten investigating the problem. I've tried upgrading uwsgi to a newer version on the VM installation but still ran into the same issue. I also tried upgrading nginx as well, but haven't been able to get Archivematica working at all after the upgrade. Any advice would be appreciated.
Thanks,
Andrew