Several errors when uploading a large file


colin r

Jan 28, 2025, 9:11:39 AM
to iRODS-Chat
Hi,

I'm facing an issue when uploading a large file (134 GB here) to an iRODS resource with replication.
The upload to the primary server seems to be OK (100% done), but then an error occurs: the file is not replicated and is left in an intermediate state in the catalog.

I'm using iput (version 4.3.3, client and server) with the -PVTf options.

Here is the error message from the iput command:

```
0/1 -  0.00% of files done   0.000/133900.875 MB -  0.00% of file sizes done

Processing 14368.tar - 133900.875 MB   2025-01-22.09:20:46

From server: NumThreads=4, addr:*****************, port:24025, cookie=1473576768

14368.tar - 40.000/133900.875 MB -  0.03% done   2025-01-22.09:20:48

14368.tar - 240.000/133900.875 MB -  0.18% done   2025-01-22.09:20:50

14368.tar - 360.000/133900.875 MB -  0.27% done   2025-01-22.09:20:52

[...]

14368.tar - 126760.000/133900.875 MB - 94.67% done   2025-01-22.09:46:32

14368.tar - 126835.219/133900.875 MB - 94.72% done   2025-01-22.09:46:34

14368.tar - 133900.875/133900.875 MB - 100.00% done   2025-01-22.09:49:56

 ERROR: [-] /irods_source/lib/core/src/procApiRequest.cpp:162:int sendApiRequest(rcComm_t *, int, const void *, const bytesBuf_t *) :  status [SYS_HEADER_WRITE_LEN_ERR]  errno [] -- message [failed to call 'write body']

[-] /irods_source/lib/core/src/sockComm.cpp:1337:irods::error sendRodsMsg(irods::network_object_ptr, const char *, const bytesBuf_t *, const bytesBuf_t *, const bytesBuf_t *, int, irodsProt_t) :  status [SYS_HEADER_WRITE_LEN_ERR]  errno [] -- message [failed to call 'write body']

[-] /irods_source/plugins/network/src/ssl.cpp:945:irods::error ssl_send_rods_msg(irods::plugin_context &, const char *, const bytesBuf_t *, const bytesBuf_t *, const bytesBuf_t *, int, irodsProt_t) :  status [SYS_HEADER_WRITE_LEN_ERR]  errno [] -- message [Write message header failed.]

[-] /irods_source/lib/core/src/sockComm.cpp:587:irods::error writeMsgHeader(irods::network_object_ptr, const msgHeader_t *) :  status [SYS_HEADER_WRITE_LEN_ERR]  errno [] -- message [Wrote -1 expected -1962934272.]

[-] /irods_source/plugins/network/src/ssl.cpp:889:irods::error ssl_write_msg_header(irods::plugin_context &, const bytesBuf_t *) :  status [SYS_HEADER_WRITE_LEN_ERR]  errno [] -- message [Wrote -1 expected -1962934272.]



 ERROR: putUtil: put error for /FranceGrillesZone/home/4p/qualification/3196/14368.tar, status = -1 status = -1 Unknown iRODS error, Operation not permitted

 ERROR: [-] /irods_source/lib/core/src/rcConnect.cpp:284:int rcDisconnect(rcComm_t *) :  status [SSL_SHUTDOWN_ERROR]  errno [] -- message [failed to call 'client stop']

[-] /irods_source/lib/core/src/sockComm.cpp:129:irods::error sockClientStop(irods::network_object_ptr, rodsEnv *) :  status [SSL_SHUTDOWN_ERROR]  errno [] -- message [failed to call 'client stop']

[-] /irods_source/plugins/network/src/ssl.cpp:593:irods::error ssl_client_stop(irods::plugin_context &, rodsEnv *) :  status [SSL_SHUTDOWN_ERROR]  errno [] -- message [error shutting down the SSL connection | error:140E0197:SSL routines:SSL_shutdown:shutdown while in init]
```

Here is the result of the ils -L command:

```
  4p                0 phenome;phenome-random;dc-pole2;dc-idf-irods1b            0 2025-01-22.10:20 ? 14368.tar
        generic    /mnt/lv0b/home/4p/qualification/3196/14368.tar
```

Do you have any idea how to resolve this issue?

Thank you.
Best regards,

COLIN Renaud (PHENOME INRAE France)

jc...@sanger.ac.uk

Jan 28, 2025, 9:15:42 AM
to iRODS-Chat
Hey Colin,


Best of Luck!

John

colin r

Jan 28, 2025, 9:49:34 AM
to iRODS-Chat
Hi John, thanks for your answer,

Just to be sure I understand correctly: on the client, should I have something like this? (Or should it go in the server-side environment?)

```
"irods_tcp_keepalive_probes": 9,
"irods_tcp_keepalive_time_in_seconds": 7200,
"irods_tcp_keepalive_intvl_in_seconds": 75
```

Thanks

Alan King

Jan 28, 2025, 11:05:04 AM
to irod...@googlegroups.com
Hi Colin,

Yes, those are the values to use in the client environment. To be clear, the values you have listed there are the default values, so you should use different values if you're having trouble. If the problem is happening on replication, you will need to set those values in the client environment for the server's service account rodsadmin (i.e. /var/lib/irods/.irods/irods_environment.json). This will affect keepalive behavior in server-to-server communications. As shown in the docs linked by John, the default behavior will keep the connection alive for about 2 hours and 11 minutes, so if it takes longer than that to complete the transfer, you will want to increase the value for irods_tcp_keepalive_time_in_seconds.
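
For anyone wondering where the "about 2 hours and 11 minutes" comes from, here is a small sketch. It assumes the usual TCP keepalive reading: the connection sits idle for the keepalive time, then unanswered probes are sent at the interval until the probe count is exhausted. The helper names and the 14400-second value are hypothetical, for illustration only; the three setting names are the ones discussed in this thread.

```python
import json

# Default values quoted earlier in this thread.
DEFAULT_TIME_S = 7200
DEFAULT_INTVL_S = 75
DEFAULT_PROBES = 9


def keepalive_window_seconds(time_s, intvl_s, probes):
    """Idle time before the first probe plus the time spent on unanswered probes."""
    return time_s + intvl_s * probes


def with_keepalive(env, time_s, intvl_s, probes):
    """Return a copy of an irods_environment.json dict with the keepalive values set."""
    updated = dict(env)
    updated["irods_tcp_keepalive_time_in_seconds"] = time_s
    updated["irods_tcp_keepalive_intvl_in_seconds"] = intvl_s
    updated["irods_tcp_keepalive_probes"] = probes
    return updated


default_window = keepalive_window_seconds(DEFAULT_TIME_S, DEFAULT_INTVL_S, DEFAULT_PROBES)
minutes, seconds = divmod(default_window, 60)
print(f"default window: {minutes // 60} h {minutes % 60} min {seconds} s")

# Hypothetical larger idle time; pick a value that fits your own transfer duration.
example_env = with_keepalive({}, 14400, DEFAULT_INTVL_S, DEFAULT_PROBES)
print(json.dumps(example_env, indent=4))
```

The printed JSON fragment is only the part to merge into the existing environment file; the other keys already in that file should be left as they are.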

> The upload to the primary server seems to be OK (100% done) but then,  an error occurs and the file is not replicated, and is marked as an intermediate state in the catalog.

It looks like the object is stuck in the intermediate state. No matter what else happens, the object should be restored to an at-rest status (good or stale). In this case, it did not, so this is a bug. Please report this bug by creating an issue on GitHub: https://github.com/irods/irods/issues

Given the ils output, it looks like the replication did not even get to creating a new replica in the catalog. I'm guessing that something went wrong while finalizing the data object before beginning the replication.

If you have access to the server's logs (i.e. /var/log/irods/irods.log, or syslog), please share redacted / relevant output. This should give us more insight into what may have happened on the server to augment what we are seeing on the client side.

As for fixing the immediate problem, you can use iadmin modrepl to change the status of the replica so that it's not in the intermediate state. See Troubleshooting docs for detailed instructions: https://docs.irods.org/4.3.3/system_overview/troubleshooting/#data-object-stuck-in-locked-or-intermediate-status
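
As a rough sketch of what that looks like for this object, the snippet below only builds and prints the command line; it does not run anything. The logical path and replica number 0 are taken from the iput error and ils -L output earlier in the thread. The argument layout and the status values (0 for stale, 1 for good) are written from memory of the 4.3 docs and should be treated as assumptions, so please check them against the troubleshooting page before running the command as a rodsadmin. The helper name is hypothetical.

```python
import shlex

# Replica status values as I understand them for iRODS 4.3 (assumption; verify in the docs).
STALE = 0
GOOD = 1


def modrepl_status_command(logical_path, replica_number, status):
    """Build the argv for an 'iadmin modrepl' call that sets DATA_REPL_STATUS."""
    return [
        "iadmin", "modrepl",
        "logical_path", logical_path,
        "replica_number", str(replica_number),
        "DATA_REPL_STATUS", str(status),
    ]


# Path and replica number come from the output posted earlier in this thread.
cmd = modrepl_status_command(
    "/FranceGrillesZone/home/4p/qualification/3196/14368.tar", 0, STALE
)
print(shlex.join(cmd))
```

Marking the replica stale rather than good is the cautious choice here, since the catalog still shows a size of 0 and the data on disk has not been verified.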

Alan

--
Alan King
Senior Software Developer | iRODS Consortium