Problem with files larger than ~600 GB

227 views

olivier...@gmail.com

Dec 8, 2020, 11:21:28 AM
to archivematica
Hi everybody,

We've now used Archivematica to store our video files for months. Everything works fine, most of the time. We've stored hundreds of files successfully.

I now have a problem with 6 particularly large files (uncompressed video files). They all have a size >= 600 GB approximately. They're stored successfully on our S3 storage and replicated successfully on another S3 bucket as well. Both packages are in "Uploaded" status, but the "Store the AIP" task still fails.

This is strange, because:

- this doesn't happen with files >= 500 GB, but < 600 GB
- I don't see anything in the logs (SS, dashboard, client, server)
- it seems to happen consistently (i.e. if I retry with the same problematic file over and over)

Does anybody have any hint for me? What could I try?

Thanks in advance,
Olivier

Ross Spencer

Dec 9, 2020, 1:23:04 PM
to archivematica
Hi Olivier, 

Just a quick question - does fixity/a bag check pass for the successfully stored objects? 

Ross

olivier...@gmail.com

Dec 14, 2020, 10:13:54 AM
to archivematica
Hi Ross,

I've checked the MOV file itself (583 GiB) by downloading it from the S3 storage and computing its hash. It's correct. So the S3 object is stored correctly. The rest of the bag looks fine.
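For anyone wanting to verify a large download the same way, a chunked hash check along these lines avoids loading the whole file into memory (a generic Python sketch; the function name and chunk size are my own choices, not anything from Archivematica):

```python
import hashlib

def file_sha256(path, chunk_size=64 * 1024 * 1024):
    """Compute the SHA-256 of a file in 64 MiB chunks, so even a
    583 GiB MOV never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the result against the checksum recorded in the bag manifest confirms the object survived the round trip.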

Olivier

Ross Spencer

Dec 15, 2020, 2:19:31 PM
to archivematica
That's positive, thank you Olivier.

If you are doing this testing in a test environment, or have the ability to try this in one, I have a draft pull request open that adds extra S3 logging. Are you able to deploy and try new code, and potentially make the same change yourself? https://github.com/artefactual/archivematica-storage-service/pull/550 - I have targeted storage service 0.16.x, so you might need to cherry-pick into another branch. This is not something we'd normally suggest others do - it's very much with developer goggles on, and it should only be tried in a development environment.

We are aware that we might not be getting enough logging out of S3 for other clients. We have also seen a problem with a smaller file on another third-party S3 implementation.

On the one hand 599 GiB is a positive size to be achieving - on the other I can see the frustration with this error-state happening beyond that.

Ross

olivier...@gmail.com

Dec 21, 2020, 8:58:44 AM
to archivematica
Thanks, Ross!

I've added the extra line. I'll run some tests and check if the logs contain anything interesting.

Olivier

olivier...@gmail.com

Dec 23, 2020, 4:54:43 AM
to archivematica
Still nothing, I'm afraid. It looks like the problem appears after the files have been successfully uploaded to the S3 buckets.

Main bucket:

DEBUG     2020-12-22 04:52:53  locations.models.async_manager:async_manager:_watchdog_loop:123:  Watchdog sees 0 tasks running
INFO      2020-12-22 04:53:33  locations.models.package:package:replicate:582:  Replicating package 287f6c30-1fdc-4d4a-8ab0-5e4b290565f9 (type is file: False), to replicator location 486a81dc-6209-420c-b560-428ac150d36c
DEBUG     2020-12-22 04:53:33  common.utils:utils:uuid_to_path:579:  path 8fb8/1f43/169d/4999/ad12/b96d/0699/3297
DEBUG     2020-12-22 04:53:33  locations.models.space:space:move_to_storage_service:344:  TO: src: aip-store/287f/6c30/1fdc/4d4a/8ab0/5e4b/2905/65f9/285-287f6c30-1fdc-4d4a-8ab0-5e4b290565f9/
DEBUG     2020-12-22 04:53:33  locations.models.space:space:move_to_storage_service:345:  TO: dst: 8fb8/1f43/169d/4999/ad12/b96d/0699/3297/285-8fb81f43-169d-4999-ad12-b96d06993297
DEBUG     2020-12-22 04:53:33  locations.models.space:space:move_to_storage_service:346:  TO: staging: /var/archivematica/storage_service
DEBUG     2020-12-22 04:53:33  locations.models.s3:s3:_ensure_bucket_exists:116:  Test the S3 bucket 'nicollier' exists
DEBUG     2020-12-22 04:53:34  boto3.resources.factory:factory:load_from_definition:66:  Loading s3:s3
DEBUG     2020-12-22 04:53:34  locations.models.async_manager:async_manager:_watchdog_loop:123:  Watchdog sees 0 tasks running
DEBUG     2020-12-22 04:53:34  locations.models.s3:s3:_ensure_bucket_exists:121:  S3 bucket's response: {'LocationConstraint': None, 'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '107', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'server': 'HGST-HGST Active Archive System/42.4.0-97a930e7c3601ab1b966578481102c82967c0202', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'date': 'Tue, 22 Dec 2020 12:53:34 GMT', 'content-type': 'application/xml'}}}
DEBUG     2020-12-22 04:53:34  boto3.resources.factory:factory:load_from_definition:66:  Loading s3:Bucket
DEBUG     2020-12-22 04:53:34  boto3.resources.model:model:_load_name_with_category:358:  Renaming Bucket attribute name
DEBUG     2020-12-22 04:53:34  boto3.resources.factory:factory:load_from_definition:66:  Loading s3:Bucket
DEBUG     2020-12-22 04:53:34  boto3.resources.model:model:_load_name_with_category:358:  Renaming Bucket attribute name
DEBUG     2020-12-22 04:53:34  boto3.resources.collection:collection:pages:152:  Calling paginated s3:list_objects with {'Prefix': u'aip-store/287f/6c30/1fdc/4d4a/8ab0/5e4b/2905/65f9/285-287f6c30-1fdc-4d4a-8ab0-5e4b290565f9/', u'Bucket': u'nicollier'}
DEBUG     2020-12-22 04:53:34  boto3.resources.factory:factory:load_from_definition:66:  Loading s3:ObjectSummary
DEBUG     2020-12-22 04:53:34  boto3.resources.model:model:_load_name_with_category:358:  Renaming ObjectSummary attribute key
DEBUG     2020-12-22 04:53:39  locations.models.async_manager:async_manager:_watchdog_loop:123:  Watchdog sees 0 tasks running

Mirror bucket:

DEBUG     2020-12-22 07:46:00  locations.models.async_manager:async_manager:_watchdog_loop:123:  Watchdog sees 0 tasks running
INFO      2020-12-22 07:46:01  locations.models.package:package:should_have_pointer_file:752:  Package should not have a pointer file because aip-store/287f/6c30/1fdc/4d4a/8ab0/5e4b/2905/65f9/285-287f6c30-1fdc-4d4a-8ab0-5e4b290565f9 is not a file
DEBUG     2020-12-22 07:46:01  locations.models.space:space:move_from_storage_service:421:  FROM: src: 8fb8/1f43/169d/4999/ad12/b96d/0699/3297/285-8fb81f43-169d-4999-ad12-b96d06993297
DEBUG     2020-12-22 07:46:01  locations.models.space:space:move_from_storage_service:422:  FROM: dst: aip-store/8fb8/1f43/169d/4999/ad12/b96d/0699/3297/285-8fb81f43-169d-4999-ad12-b96d06993297/
DEBUG     2020-12-22 07:46:01  locations.models.s3:s3:_ensure_bucket_exists:116:  Test the S3 bucket 'nicollier-secondary' exists
DEBUG     2020-12-22 07:46:01  boto3.resources.factory:factory:load_from_definition:66:  Loading s3:s3
DEBUG     2020-12-22 07:46:01  locations.models.s3:s3:_ensure_bucket_exists:121:  S3 bucket's response: {'LocationConstraint': None, 'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '107', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'server': 'HGST-HGST Active Archive System/42.4.0-97a930e7c3601ab1b966578481102c82967c0202', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'date': 'Tue, 22 Dec 2020 15:46:01 GMT', 'content-type': 'application/xml'}}}
DEBUG     2020-12-22 07:46:01  boto3.resources.factory:factory:load_from_definition:66:  Loading s3:Bucket
DEBUG     2020-12-22 07:46:01  boto3.resources.model:model:_load_name_with_category:358:  Renaming Bucket attribute name
DEBUG     2020-12-22 08:50:29  locations.models.async_manager:async_manager:_watchdog_loop:123:  Watchdog sees 0 tasks running

Another bit of log here:

DEBUG     2020-12-22 11:00:17  locations.models.async_manager:async_manager:_watchdog_loop:123:  Watchdog sees 0 tasks running
INFO      2020-12-22 11:01:04  locations.models.package:package:replicate:716:  Finished replicating package 287f6c30-1fdc-4d4a-8ab0-5e4b290565f9 as replica package 8fb81f43-169d-4999-ad12-b96d06993297
DEBUG     2020-12-22 11:01:04  locations.models.async_manager:async_manager:_watchdog_loop:123:  Watchdog sees 0 tasks running
DEBUG     2020-12-22 11:01:07  locations.api.resources:resources:hydrate_current_location:945:  `current_location` was not matched by `default_location_regex`
DEBUG     2020-12-22 11:01:09  locations.models.async_manager:async_manager:_watchdog_loop:123:  Watchdog sees 0 tasks running

I don't see any error or anything suspicious after that.

Olivier
Screenshot 2020-12-23 at 10.48.33.png
Screenshot 2020-12-23 at 10.43.46.png
Screenshot 2020-12-23 at 10.43.23.png

Ross Spencer

Dec 23, 2020, 11:10:22 AM
to archivematica
Ah, thanks Olivier. That PR is for a related issue which I am thinking about for another client. If I get more information or manage to get more out of S3/Boto/AM I'll be sure to pop back in. They are seeing this far below 600GB right now. 

My work there has yet to be fruitful. There are two parts to it: our simple use case hasn't required us to control the API much, so we use only a bare minimum of its expressiveness to achieve quite a lot, and I am trying to understand how best to change some of the defaults. Additionally, I am not easily able to raise exceptions out of the S3 adapter in the storage service, and I want to understand our exception handling there and whether we're masking something we'd otherwise want to report on for an issue like this. I'll likely be back onto this for the client after the holidays.
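The masking concern can be illustrated generically (this is not the storage service's actual code; `ReadTimeoutError` here is a local stand-in for a botocore timeout exception):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("s3-demo")

class ReadTimeoutError(Exception):
    """Local stand-in for a botocore read-timeout exception."""

def upload():
    # Simulates an S3 transfer that times out partway through.
    raise ReadTimeoutError("read timed out after 60s")

def store_aip_masked():
    # Over-broad handler: the timeout never surfaces, so the operator
    # sees neither an exception nor a useful log line.
    try:
        upload()
    except Exception:
        return False

def store_aip_logged():
    # Narrower handling preserves the evidence and re-raises.
    try:
        upload()
    except ReadTimeoutError:
        log.exception("S3 upload failed")
        raise
```

Here `store_aip_masked()` quietly returns False, while `store_aip_logged()` leaves a traceback in the logs before failing - which would explain why nothing shows up in the SS logs.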

A quick question - Is this an Amazon S3 or another vendor? 

Olivier Bruchez

Dec 23, 2020, 11:26:59 AM
to archiv...@googlegroups.com
Thanks, Ross.

We use an HGST (now Quantum) Active Archive for S3 storage.

Olivier

--
You received this message because you are subscribed to a topic in the Google Groups "archivematica" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/archivematica/W4VFBmfNWQE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to archivematic...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/archivematica/e4a02843-6bde-4dde-9a2b-1a51c7180410n%40googlegroups.com.


Ross Spencer

Jan 20, 2021, 5:13:16 PM
to archivematica
Hi Olivier,

I have set up a standalone script that you might want to try against your S3 instance with some test data over 600GB to see if it helps with your work. The script does a few things:

* Wraps the Storage Service work to create an executable. 
* Provides standalone logging plus boto3/botocore logging, which seems to have triggered the type of logging I had hoped for previously. (There's a lot, so the README recommends piping it to a file.)
* Provides a better way to configure the boto3.Resource() to supply a config object with custom connection and read timeout. 


Have a look at the README.md, download requirements.txt and s3.py, and give it a whirl in a virtual environment. I am hoping that the increase in timeout I suggest, from 60 seconds (the default) to 900 (a value I found in relation to another S3 service), will do the trick for your 600GB object. But you can play with that.

If you have access to the HGST docs (I looked for some online but couldn't find what I needed) they might describe some timeout defaults. If they do, it'd be great if you can share them just so we can get an idea about variability between services. 

I'm going to write up an issue for this, and in it I should be able to describe a little more about what I think was happening. If you do try recreating the error with the defaults again, have a look at the last 1,500 or so lines of the log; it should tell the story. What I anticipate you'll see is a timeout, then an abort signal sent, followed by S3 failing to delete the content and a NoSuchUpload error. But it might be a different error, which would be great to learn about if you can share it.

I'm going to work on the premise that changing the timeout will be the basis for a configurable fix/update to the storage service, so keep an eye out for that.

Good luck, and no worries if you don't get around to testing this. I appreciate that testing it may be a luxury if you only have a production S3 instance to work with.
Best,
Ross

olivier...@gmail.com

Jan 21, 2021, 5:01:05 AM
to archivematica
Thanks, Ross!

I will try this in the next few days.

Olivier

Ross Spencer

Feb 24, 2021, 8:50:28 AM
to archivematica
Hi again Olivier,

The test scripts above were developed in collaboration with our client, and we were pretty happy with the results on their test server. The changes have been merged into our development branch and, once they're put through acceptance and regression testing, should be available with the release of Archivematica 1.13. Sharing the commit here in case it helps with your work: https://github.com/artefactual/archivematica-storage-service/commit/8928cf0dd744ff0dcf2c0dcff49055a0a8226f21 - you could potentially back-port it to storage service 0.16.x, but that takes a bit of wrangling because of some newer components in the config modules. Still, it might serve as a useful reference.

All the best,
Ross

olivier...@gmail.com

Mar 3, 2021, 8:21:58 AM
to archivematica
Hi Ross,

Thanks a lot for the information! And sorry for the delay.

I'm currently pretty busy with another project (not using Archivematica). Do you have any idea when 1.13 will be released? That might help me plan how I will prioritize my tasks/projects.

Best,
Olivier

Sarah Romkey

Mar 9, 2021, 10:34:57 AM
to archiv...@googlegroups.com
Hi Olivier,

We don't have an official release target date yet for Archivematica 1.13 but my guess at this point would be sometime in June or July.

Cheers,

Sarah

Sarah Romkey, MAS,MLIS
Archivematica Program Manager
@archivematica / @accesstomemory





olivier...@gmail.com

Mar 9, 2021, 10:48:13 AM
to archivematica
Thanks, Sarah and Ross!

Olivier

olivier...@gmail.com

Aug 30, 2021, 4:41:29 AM
to archivematica
Hi,

So we've finally upgraded from 1.11.2 to 1.13.0.

Unfortunately, we still have this bug with big (i.e. >= 500 GB) files. Same symptoms (packages in "Uploaded" state, SIP in failed state, etc.).

I was really hoping the upgrade would fix the problem.

Any idea what I could try next?

Thanks,
Olivier