irods error

kxk...@gmail.com

Oct 27, 2020, 3:05:45 PM

to iRODS-Chat

Hello,

Our Galaxy server uses iRODS for storage in our test environment. The Galaxy and iRODS servers are on different hosts. Occasionally, I get this NetworkException in the Galaxy server log (I have not been able to reproduce this problem, BTW):

timeout: timed out
  File "irods/connection.py", line 91, in recv
    msg = iRODSMessage.recv(self.socket)
  File "irods/message/__init__.py", line 74, in recv
    rsp_header_size = _recv_message_in_len(sock, 4)
  File "irods/message/__init__.py", line 27, in _recv_message_in_len
    buf = sock.recv(size_left, socket.MSG_WAITALL)

NetworkException: Could not receive server response: timed out
  File "galaxy/objectstore/irods.py", line 314, in _data_object_exists
    self.session.data_objects.get(data_object_path)
  File "irods/manager/data_object_manager.py", line 45, in get
    parent = self.sess.collections.get(irods_dirname(path))
  File "irods/manager/collection_manager.py", line 17, in get
    result = query.one()
  File "irods/query.py", line 220, in one
    results = self.execute()
  File "irods/query.py", line 174, in execute
    result_message = conn.recv()
  File "irods/connection.py", line 95, in recv
    raise NetworkException("Could not receive server response: " + str(e))
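The client side of this trace can be reproduced in miniature. The sketch below is a simplified model, not the python-irodsclient code: `recv_exact`, `recv_header_size` and the local `NetworkException` class are hypothetical stand-ins for `_recv_message_in_len`, `iRODSMessage.recv` and the library's exception, and the 4-byte big-endian header-length prefix is assumed from the iRODS wire format. A socket whose peer never writes raises `socket.timeout` ("timed out"), which is then re-raised with the wording seen in the Galaxy log.

```python
import socket


class NetworkException(Exception):
    """Hypothetical stand-in for the client library's NetworkException."""


def recv_exact(sock, size):
    # Keep reading until `size` bytes have arrived (length-prefixed reader).
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf), socket.MSG_WAITALL)
        if not chunk:
            raise NetworkException("Connection closed by peer")
        buf += chunk
    return buf


def recv_header_size(sock):
    # Assumed: the first 4 bytes of a response carry the header length.
    try:
        return int.from_bytes(recv_exact(sock, 4), "big")
    except socket.timeout as e:
        # A silent peer surfaces as socket.timeout("timed out"), re-raised
        # with the same wording that appears in the Galaxy log.
        raise NetworkException("Could not receive server response: " + str(e)) from e


if __name__ == "__main__":
    client, server = socket.socketpair()
    client.settimeout(0.2)            # short client-side timeout for the demo
    try:
        recv_header_size(client)      # `server` never writes anything
    except NetworkException as e:
        print(e)                      # Could not receive server response: timed out
    finally:
        client.close()
        server.close()
```

In this model the client only knows that nothing arrived within its timeout; it cannot tell whether the server is slow or the connection is already dead.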

Looking at irods log, I see the following:

Oct 26 11:41:30 pid:7677 remote addresses: 129.114.58.190 ERROR: [-] /irods/server/core/src/rsApiHandler.cpp:540:int readAndProcClientMsg(rsComm_t *, int) : status [SYS_SOCK_READ_ERR] errno [Connection timed out] -- message [failed to call 'read header']
    [-] /irods/lib/core/src/sockComm.cpp:201:irods::error readMsgHeader(irods::network_object_ptr, msgHeader_t *, struct timeval *) : status [SYS_SOCK_READ_ERR] errno [Connection timed out] -- message [failed to call 'read header']
    [-] /irods/plugins/network/tcp/libtcp.cpp:190:irods::error tcp_read_msg_header(irods::plugin_context &, void *, struct timeval *) : status [SYS_SOCK_READ_ERR] errno [Connection timed out] -- message [error reading from socket after [0] bytes read]
    [-] /irods/plugins/network/tcp/libtcp.cpp:71:irods::error tcp_socket_read(int, void *, int, int &, struct timeval *) : status [SYS_SOCK_READ_ERR] errno [Connection timed out] -- message [error reading from socket after [0] bytes read]

Oct 26 11:41:30 pid:7677 ERROR: Agent [7677] exiting with status = -116110
Oct 26 11:41:30 pid:24472 ERROR: Agent process [7677] exited with status [114]

I ran ierror on -116110 and it's SYS_SOCK_READ_ERR. Any idea why this happens on the server side? I see the Agent ERROR before this exception is thrown, so I'm not sure whether it is the cause of this exception or a result of it.
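The two numbers in the server log appear to be related. Assuming the usual iRODS convention that a system errno is subtracted from a base error code (SYS_SOCK_READ_ERR being -116000), -116110 splits into -116000 and errno 110, which is ETIMEDOUT ("Connection timed out") on Linux. Because a parent process only sees the low 8 bits of an exit value, an agent exiting with -116110 would be reported as status 114. The helper names below are illustrative only:

```python
import errno
import os


def split_irods_error(code):
    """Split an iRODS error code into (base code, embedded errno).

    Assumes the convention that the errno is folded into the last three
    digits of a negative base code, e.g. -116110 == -116000 - 110.
    """
    magnitude = abs(code)
    base = -(magnitude // 1000 * 1000)
    sys_errno = magnitude % 1000
    return base, sys_errno


def exit_status_seen_by_parent(code):
    # POSIX: the parent only receives the low 8 bits of the exit value.
    return code & 0xFF


base, sys_errno = split_irods_error(-116110)
print(base, sys_errno)                       # -116000 110
print(errno.errorcode.get(sys_errno))        # ETIMEDOUT on Linux
print(os.strerror(sys_errno))                # Connection timed out on Linux
print(exit_status_seen_by_parent(-116110))   # 114
```

So the "exiting with status = -116110" and "exited with status [114]" lines most likely describe the same event, seen from the agent and from its parent.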

Any help is greatly appreciated.
-Kaivan

Terrell Russell

Oct 27, 2020, 3:19:31 PM

to irod...@googlegroups.com

Kaivan,

This is with python-irodsclient v0.8.4, which includes your stale connection PR?

https://github.com/irods/python-irodsclient/commit/bfd10d6a813a0a3c00f6da5c21bcb177ad7e242a

Can you add some logging to determine how long connections have been open on the client/Galaxy side? It appears the iRODS server thought nobody was still on the other end.

Terrell

--
--
The Integrated Rule-Oriented Data System (iRODS) - https://irods.org

iROD-Chat: http://groups.google.com/group/iROD-Chat
---
You received this message because you are subscribed to the Google Groups "iRODS-Chat" group.
To unsubscribe from this group and stop receiving emails from it, send an email to irod-chat+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/irod-chat/2c067fc5-1f00-4f32-a7d2-b3ccd9bb0344n%40googlegroups.com.

xk302a

Oct 27, 2020, 3:49:35 PM

to irod...@googlegroups.com

Yes, this has the stale connection logic, and I still get the error message :(
I'll log connection duration. This requires me to add a create_time to the Connection class. I'll do this in my fork (if needed, I can create a PR to merge it into master later).
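A minimal sketch of the bookkeeping this needs, assuming a hypothetical `TrackedConnection` wrapper rather than the real `Connection` class: it stamps `create_time` once when the connection is opened, tracks the last use, and logs both the age and the idle time each time the connection is used, so a later server-side timeout can be matched against how long the socket had been open. The clock is injectable so the behavior can be checked without waiting.

```python
import logging
import time

log = logging.getLogger("connection.age")


class TrackedConnection:
    """Hypothetical wrapper recording when a connection was opened.

    `raw` stands in for whatever connection object the pool hands out.
    """

    def __init__(self, raw, clock=time.monotonic):
        self.raw = raw
        self._clock = clock
        self.create_time = clock()           # set once, when opened
        self.last_used_time = self.create_time

    def age(self):
        return self._clock() - self.create_time

    def idle_for(self):
        return self._clock() - self.last_used_time

    def mark_used(self):
        # Log before updating, so the idle time reflects the gap since last use.
        log.debug("connection %s: age=%.1fs idle=%.1fs",
                  id(self.raw), self.age(), self.idle_for())
        self.last_used_time = self._clock()


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    now = [0.0]
    conn = TrackedConnection(raw=object(), clock=lambda: now[0])
    now[0] = 30.0
    conn.mark_used()                         # logs age=30.0s idle=30.0s
    now[0] = 45.0
    print(conn.age(), conn.idle_for())       # 45.0 15.0
```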
Thanks Terrell,
-Kaivan

dmoore.renci

Oct 29, 2020, 6:05:09 PM

to iRODS-Chat

Kaivan,

I know the existing anti-staleness connection logic marks the return to the pool as the last use time. Depending on the use pattern (are sessions ever held on to for a significant amount of time before cleanup?) a fixed interval threshold for assuming staleness may need to be larger to accommodate the increased error margin. Does a lower threshold cause this error to happen more often?
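To make the error margin concrete, here is a hypothetical idle-based pool (not the library's implementation) that stamps "last use" when a connection is returned, as described above. A connection that was checked out and held silently for a while looks fresher to the pool than the socket really is:

```python
import itertools
import time


class IdlePool:
    """Hypothetical pool that drops a connection once it has sat in the pool
    longer than `idle_threshold` seconds. "Last use" is stamped when the
    connection is returned, not when it last sent or received data."""

    def __init__(self, factory, idle_threshold, clock=time.monotonic):
        self._factory = factory              # callable that opens a new connection
        self._threshold = idle_threshold
        self._clock = clock
        self._idle = []                      # list of (connection, returned_at)

    def acquire(self):
        while self._idle:
            conn, returned_at = self._idle.pop()
            if self._clock() - returned_at <= self._threshold:
                return conn                  # looks fresh enough: reuse it
            close = getattr(conn, "close", None)
            if close is not None:
                close()                      # considered stale: discard it
        return self._factory()

    def release(self, conn):
        self._idle.append((conn, self._clock()))


if __name__ == "__main__":
    now = [0.0]
    ids = itertools.count(1)
    pool = IdlePool(lambda: f"conn{next(ids)}", idle_threshold=60,
                    clock=lambda: now[0])
    c = pool.acquire()        # conn1; assume its last traffic is at t=0
    now[0] = 50.0
    pool.release(c)           # held silently for 50 s, stamped as used at t=50
    now[0] = 100.0
    print(pool.acquire())     # conn1: 50 s "idle" to the pool, 100 s quiet on the wire
```

The gap between the pool's view and the socket's real quiet time is bounded by how long sessions are held, which is why a fixed threshold may need headroom.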

Dan

kxk...@gmail.com

Oct 29, 2020, 8:37:21 PM

to iRODS-Chat

Thanks Dan,

I found one problem on my end. I was not setting the connection_refresh_time config parameter, so it defaulted to the no-refresh behavior. I fixed that and another issue (old connections had to be dropped even if they were used recently), and deployed the revised code. I will check the server logs tomorrow to see whether the error is still there. Will report back :)
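The revised policy can be sketched as an age-based pool. This is an illustration under assumptions, not the actual patch: `AgeRefreshPool`, `PooledConnection` and the `refresh_time` argument are hypothetical names, with `refresh_time` playing the role of connection_refresh_time and a missing value (None or <= 0) meaning no refresh, mirroring the default behavior mentioned above. Connections are replaced once they are older than `refresh_time`, even if they were used a moment ago:

```python
import itertools
import time
from dataclasses import dataclass


@dataclass
class PooledConnection:
    raw: object          # the underlying connection (hypothetical)
    create_time: float   # when it was opened


class AgeRefreshPool:
    """Hypothetical pool that re-creates connections older than
    `refresh_time` seconds, regardless of how recently they were used."""

    def __init__(self, factory, refresh_time=None, clock=time.monotonic):
        self._factory = factory
        self._refresh_time = refresh_time
        self._clock = clock
        self._idle = []

    def _expired(self, conn):
        if self._refresh_time is None or self._refresh_time <= 0:
            return False                     # refresh disabled: never expire
        return self._clock() - conn.create_time > self._refresh_time

    def _discard(self, conn):
        close = getattr(conn.raw, "close", None)
        if close is not None:
            close()

    def acquire(self):
        while self._idle:
            conn = self._idle.pop()
            if not self._expired(conn):
                return conn
            self._discard(conn)              # too old: drop it
        return PooledConnection(self._factory(), self._clock())

    def release(self, conn):
        # Also check on release so an old connection is never parked.
        if self._expired(conn):
            self._discard(conn)
        else:
            self._idle.append(conn)


if __name__ == "__main__":
    now = [0.0]
    ids = itertools.count(1)
    pool = AgeRefreshPool(lambda: f"conn{next(ids)}", refresh_time=60,
                          clock=lambda: now[0])
    for t in (0, 20, 40, 60, 80):
        now[0] = float(t)
        c = pool.acquire()
        print(t, c.raw)   # conn1 is reused through t=60, replaced by conn2 at t=80
        pool.release(c)
```

With an idle-based threshold of 60 seconds, the connection in this demo (used every 20 seconds) would never be recycled; the age-based check retires it anyway, which matches the change that coincided with the timeouts disappearing.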

kxk...@gmail.com

Oct 30, 2020, 12:09:26 PM

to iRODS-Chat

So, I have not seen a timeout for 24 hours, which is very encouraging. I will check the log and if there are no timeouts over the weekend, I will then declare victory.

Terrell, I will be creating a new PR, as I've changed the logic to re-create old connections rather than connections that have not been used recently.

Best,
-Kaivan

Terrell Russell

Oct 30, 2020, 12:12:26 PM

to irod...@googlegroups.com

Tentatively... excellent.

Terrell

kxk...@gmail.com

Nov 2, 2020, 11:56:56 AM

to iRODS-Chat

No errors over the weekend. I think re-creating old connections will solve this issue. Will create a PR soon. Thanks.

Terrell Russell

Nov 2, 2020, 12:13:50 PM

to irod...@googlegroups.com

Gotcha - thanks.

Terrell
