Unable to send large files to S3 resource in cacheless mode


Carsten Grzemba

May 16, 2024, 5:08:41 AM
to iRODS-Chat
I am trying to copy files >8 GB to the S3 resource in cacheless mode. Below approximately 8 GB it works, but larger files fail with:

ERROR: rcPartialDataPut: toWrite 4194304, bytesWritten 1561416, errno = 104 status = -27104 SYS_COPY_LEN_ERR, Connection reset by peer

The S3 resource has the settings:
S3_PROTO=HTTPS;
S3_WAIT_TIME_SECONDS=10;
S3_ENABLE_MD5=1;
ARCHIVE_NAMING_POLICY=consistent;
S3_REGIONNAME=us-east-1;
S3_MPU_CHUNK=5000;
HOST_MODE=cacheless_attached;
S3_CACHE_DIR=/data-s3cache;
CIRCULAR_BUFFER_SIZE=20;
S3_MPU_THREADS=1

I tested with different parameters:
S3_MPU_CHUNK=50, 500, 5000 and
S3_MPU_THREADS=1, 4

I did my tests locally on the server which hosts the S3 resource. It always fails.

The S3 storage is a Ceph cluster. My files are 20 GB to 200 GB in size.
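For what it's worth, this is how I apply that context string (a sketch; "s3Resc" is a placeholder for my resource name):

```shell
# Set the context string on the S3 resource (resource name is a placeholder).
iadmin modresc s3Resc context "S3_PROTO=HTTPS;S3_WAIT_TIME_SECONDS=10;S3_ENABLE_MD5=1;ARCHIVE_NAMING_POLICY=consistent;S3_REGIONNAME=us-east-1;S3_MPU_CHUNK=5000;HOST_MODE=cacheless_attached;S3_CACHE_DIR=/data-s3cache;CIRCULAR_BUFFER_SIZE=20;S3_MPU_THREADS=1"
```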

How can I fix this? Any help is much appreciated.

Francisco Morales

May 16, 2024, 5:26:01 AM
to iRODS-Chat
Hi, 

I've faced similar issues when trying to copy files to an S3 resource. In particular, there seems to be a bug when the number of threads is > 1 and the verifyChecksum flag is enabled. What has worked for me is the following:

irods: 4.2.12

S3 resc context config vars:

S3_RETRY_COUNT=1
S3_WAIT_TIME_SEC=3
S3_PROTO=HTTPS
S3_ENABLE_MD5=1
ARCHIVE_NAMING_POLICY=decoupled
HOST_MODE=cacheless_attached
S3_NON_DATA_TRANSFER_TIMEOUT_SECONDS=3600
S3_ENABLE_COPYOBJECT=0
S3_MPU_CHUNK=50
CIRCULAR_BUFFER_SIZE=6

and the put command:

iput -R <s3_resc> -N 1 <largefile>

Doing so, I've managed to upload files as large as 400GB to S3 (also ceph cluster) in one go.

Best regards, 
Francisco

joris luijsterburg

May 16, 2024, 6:37:42 AM
to iRODS-Chat
Checksumming and S3 can indeed be an issue. For us it sometimes fails because the file disappears from the cache while it is being written to cold storage, thus cutting the checksumming short. At our site we now control replication to S3 with a custom rule, and never use checksumming during replication; we verify afterwards instead.
I also had to tinker with the S3 settings, but this is how we run it now:
S3_RETRY_COUNT=2;
S3_WAIT_TIME_SECONDS=3;
S3_PROTO=HTTPS;
ARCHIVE_NAMING_POLICY=consistent;
HOST_MODE=cacheless_detached;
S3_CACHE_DIR=/cache/s3;
S3_MPU_CHUNK=500
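Coming back to the checksum point, the idea looks roughly like this with plain icommands (we actually drive it from a rule; the resource and object names below are placeholders):

```shell
# Replicate to the S3 resource without verifying a checksum during the transfer...
irepl -R s3Resc /myZone/home/user/largefile.tar
# ...and only afterwards compute and register the checksum on the new replica.
ichksum /myZone/home/user/largefile.tar
```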

Also, you can find various threads where tcp_keepalive settings are discussed; these can be a cause of connection drops, especially for larger files ( https://docs.irods.org/4.3.0/system_overview/troubleshooting/#firewalls-dropping-long-idle-connections-during-parallel-transfer ). You have to find out which component has the lowest timeout and base your keepalive settings on that one.
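For example, on Linux the kernel keepalive behaviour can be tuned like this (values are illustrative only; base them on the lowest timeout in your path):

```shell
# Start probing after 5 minutes idle, probe every 60 s, give up after 5 failed probes.
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.ipv4.tcp_keepalive_intvl=60
sysctl -w net.ipv4.tcp_keepalive_probes=5
```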

Best regards,
Joris

Carsten Grzemba

May 16, 2024, 11:59:58 AM
to iRODS-Chat
I can confirm that reducing the number of threads for iput or irsync to 4 (-N 4) works.

Many Thanks!

James, Justin Kyle

May 16, 2024, 2:14:28 PM
to iRODS-Chat
When you all say copy, are you referring to putting the file, copying within iRODS itself, or replication? I just want to make sure I am testing the correct thing.

I tried an iput and icp of a 10 GiB file with checksums into an S3 resource and both worked.


From: irod...@googlegroups.com <irod...@googlegroups.com> on behalf of Francisco Morales <nido...@gmail.com>
Sent: Thursday, May 16, 2024 5:26 AM
To: iRODS-Chat <irod...@googlegroups.com>
Subject: [iROD-Chat:21928] Re: Unable to send large files to S3 resource in cacheless mode
 

Carsten Grzemba

May 16, 2024, 4:11:42 PM
to iRODS-Chat
I tested with iput and irsync. With files of 14 GB and larger I see this problem.

James, Justin Kyle

May 29, 2024, 5:11:19 AM
to iRODS-Chat
Just to add to this.  As I understand it from discussing this with you, it failed with 16 threads and passed with 4.  I'm assuming that is via the -N option with iput.  Is that correct?

I will test this scenario with larger thread counts to see if I can reproduce it.

If possible, can the errors in the iRODS log be provided?


Carsten Grzemba

May 30, 2024, 3:16:00 PM
to irod...@googlegroups.com
Yes, the number of threads is assigned with -N. Here it also worked with 8 threads but failed with 16.

irods@culda-j:/opt/danrw/ContentBroker/log$ time /usr/bin/irsync.bin -N 8 -KVR lza i:/culdaj/aip/thulbuser/1-2024051525148/1-2024051525148.pack_1.tar i:/culdaef/federated/culdaj/aip/TEST/1-2024051525148.pack_1.tar
   1-2024051525148.pack_1.ta   15331.950 MB | 1387.011 sec | 8 thr | 11.054 MB/s

real 23m7.587s
user 0m0.025s
sys 0m0.018s
irods@culda-j:/opt/danrw/ContentBroker/log$ time /usr/bin/irsync.bin -N16 -KVR lza i:/culdaj/aip/thulbuser/1-2024051525148/1-2024051525148.pack_1.tar i:/culdaef/federated/culdaj/aip/TEST/1-2024051525148.pack_1.tar
remote addresses: 141.35.106.32 ERROR: rsyncDataToDataUtil - Failed to copy the object "/culdaj/aip/thulbuser/1-2024051525148/1-2024051525148.pack_1.tar" to "/culdaef/federated/culdaj/aip/TEST/1-2024051525148.pack_1.tar" S3_PUT_ERROR
remote addresses: 141.35.106.32 ERROR: rsyncUtil: rsync error for /culdaef/federated/culdaj/aip/TEST/1-2024051525148.pack_1.tar status = -702000 S3_PUT_ERROR

real 9m43.267s
user 0m0.034s
sys 0m0.008s

James, Justin Kyle

Jun 5, 2024, 11:01:42 AM
to iRODS-Chat
I have fixed the number of threads issue.  This was a case where the files were small enough that iRODS was not honoring the -N setting.

I have a pull request open for both 4.2.12 and 4.3.2.  These can be found in https://github.com/irods/irods_resource_plugin_s3/pulls.

If you want you can check this out, build it, and let me know if it resolves your problems.


James, Justin Kyle

Jun 5, 2024, 11:05:13 AM
to iRODS-Chat
I also tried this with a 14 GiB file,  N > 1, and checksum enabled and it worked.  If you can, try this out with the code in the pull requests at https://github.com/irods/irods_resource_plugin_s3/pulls and let me know if you are still having issues.




Alan King

Jun 10, 2024, 5:13:39 PM
to irod...@googlegroups.com
Perhaps to be more specific: with a 4.3.2 server, try building the changes in this PR: https://github.com/irods/irods_resource_plugin_s3/pull/2200. For a 4.2.12 server, use this PR instead: https://github.com/irods/irods_resource_plugin_s3/pull/2199.
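Checking out a PR branch for a local build generally looks like this (a sketch; see the plugin repository's README for the exact build prerequisites):

```shell
git clone https://github.com/irods/irods_resource_plugin_s3
cd irods_resource_plugin_s3
# PR 2200 targets 4.3.2; use pull/2199/head for 4.2.12.
git fetch origin pull/2200/head:pr-2200
git checkout pr-2200
mkdir build && cd build
cmake .. && make package
```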



--
Alan King
Senior Software Developer | iRODS Consortium