irods clients receiving disconnects


Edwin Skidmore

Apr 29, 2011, 9:33:26 AM
to irod...@googlegroups.com
Hello there,

For some reason my clients have been experiencing connection losses when
transferring files larger than a few hundred MB. Here is the output from
my systems.

From the client:

[edwin@client ~]$ iput -f -V 1Gfile.dat
NOTICE: irodsHost=irodsserver.iplantcollaborative.org
NOTICE: irodsPort=1247
NOTICE: irodsUserName=edwintest
NOTICE: irodsZone=iplant
NOTICE: created irodsHome=/iplant/home/edwintest
NOTICE: created irodsCwd=/iplant/home/edwintest
From server: NumThreads=8, addr:150.135.x.y, port:20019, cookie=1101858949

ERROR: rcPartialDataPut: toWrite 4194304, bytesWritten 645696, errno =
110 status = -27110 SYS_COPY_LEN_ERR, Connection timed out
ERROR: rcPartialDataPut: toWrite 4194304, bytesWritten 798912, errno =
110 status = -27110 SYS_COPY_LEN_ERR, Connection timed out
ERROR: rcPartialDataPut: toWrite 4194304, bytesWritten 388512, errno =
110 status = -27110 SYS_COPY_LEN_ERR, Connection timed out
ERROR: rcPartialDataPut: toWrite 4194304, bytesWritten 388512, errno =
110 status = -27110 SYS_COPY_LEN_ERR, Connection timed out
ERROR: rcPartialDataPut: toWrite 4194304, bytesWritten 1077984, errno =
110 status = -27110 SYS_COPY_LEN_ERR, Connection timed out
ERROR: putUtil: put error for /iplant/home/edwintest/1Gfile.dat, status
= -27110 status = -27110 SYS_COPY_LEN_ERR, Connection timed out
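
The status values in the output above can be read as two parts. The following is a minimal sketch, assuming that the low three digits of a negative iRODS status carry the system errno and the rest is the base error code; that reading is consistent with "errno = 110 status = -27110" above, but the helper name split_irods_status is hypothetical and not part of iRODS.

```python
import os
from typing import Tuple


def split_irods_status(status: int) -> Tuple[int, int]:
    """Split a negative status into (base error code, system errno).

    Assumption: the last three digits of the magnitude hold the errno.
    """
    magnitude = abs(status)
    sys_errno = magnitude % 1000
    base_code = -(magnitude - sys_errno)
    return base_code, sys_errno


if __name__ == "__main__":
    base, sys_errno = split_irods_status(-27110)
    print(base, sys_errno)  # -27000 110
    # The text for an errno is platform dependent, so it is only printed here.
    print(os.strerror(sys_errno))
```

Read this way, -27110 is SYS_COPY_LEN_ERR with errno 110 attached, which matches the "Connection timed out" text that iput prints after it.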

From the iCAT server (150.135.x.z):

Apr 28 22:40:15 pid:2729 NOTICE: Agent process 4769 started for
puser=edwintest and cuser=edwintest from 10.0.0.137
Apr 28 22:40:15 pid:4769 NOTICE: rsAuthCheck user edwintest#iplant
Apr 28 22:40:15 pid:4769 NOTICE: rsAuthResponse set proxy authFlag to 3,
client authFlag to 3, user:edwintest#iplant proxy:edwintest client:edwintest
Apr 28 22:40:15 pid:2729 NOTICE: Agent process 4771 started for
puser=rodsadmin and cuser=edwintest from 150.135.x.y
Apr 28 22:40:15 pid:4771 NOTICE: rsAuthCheck user rodsadmin#iplant
Apr 28 22:40:15 pid:4771 NOTICE: rsAuthResponse set proxy authFlag to 5,
client authFlag to 3, user:rodsadmin#iplant proxy:rodsadmin client:edwintest
Apr 28 22:40:15 pid:4771 NOTICE: rsAuthCheck user rodsadmin#iplant
Apr 28 22:40:15 pid:4771 NOTICE: readAndProcClientMsg: received
disconnect msg from client
Apr 28 22:40:15 pid:4771 NOTICE: Agent exiting with status = 0

From the non-iCAT enabled server (150.135.x.y):

Apr 28 22:37:20 pid:13366 NOTICE: Agent process 13466 started for
puser=rodsadmin and cuser=edwintest from 150.135.x.z
Apr 28 22:37:21 pid:13466 NOTICE: Warning, cannot authenticate remote
server, serverResponse field is empty
Apr 28 22:37:21 pid:13466 NOTICE: rsAuthResponse set proxy authFlag to
5, client authFlag to 3, user:rodsadmin#iplant proxy:rodsadmin
client:edwintest

I noticed that there is no corresponding "Agent exiting" message from the
non-iCAT enabled server. Does this imply that the agent is crashing?
Has anyone encountered similar problems? I have also restarted both
iRODS servers to see if that would help, but it did not. Here are the
specs for my iRODS setup:

iRODS v2.4.1 (both clients and servers)
iCAT server: redhat 5.6
non-iCAT server: Solaris 10 10/09

Interestingly enough, the filesystem on the server shows that the file has
the same number of bytes as the original, but md5sum reports different
hashes. Also, if I run ils, iRODS reports that the file is there
with the same number of bytes, but if I attempt an iget, I get a
-317000 USER_INPUT_PATH_ERR (path does not exist) error. It appears
that iRODS is left in an inconsistent state.
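
The size and hash comparison described above can be scripted. This is a minimal sketch that uses only the Python standard library and touches no iRODS API; the function name file_fingerprint and the paths passed on the command line are hypothetical.

```python
import hashlib
import os
import sys


def file_fingerprint(path, chunk_size=4 * 1024 * 1024):
    """Return (size in bytes, MD5 hex digest) of a file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return os.path.getsize(path), md5.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python fingerprint.py original.dat copy.dat
    first = file_fingerprint(sys.argv[1])
    second = file_fingerprint(sys.argv[2])
    print("size match:", first[0] == second[0])
    print("md5 match: ", first[1] == second[1])
```

A matching size with a differing MD5 is the situation reported here: the file has the right length on disk, but its contents are not the bytes that were sent.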

Any help would be greatly appreciated.

Thank you,
Edwin

Jean-Yves Nief

Apr 29, 2011, 11:08:39 AM
to irod...@googlegroups.com, Edwin Skidmore
Edwin,

I have seen some "connection timed out" issues on a flaky
network. But it looks like you are doing your testing on your local
site, so that should not be a problem. Anyway, you can try the '-T' option.
As for the USER_INPUT_PATH_ERR error, I do not have much of an idea. For some
reason, the file path you are writing to does not seem reachable: an fs
mount problem, or an ACL setting on the filesystem tree?
cheers,
JY

mw...@diceresearch.org

Apr 29, 2011, 12:31:44 PM
to irod...@googlegroups.com
Hello Edwin,

The timeout came from the network and not from iRODS. There is not much that can be done unless
you can convince your network admin to change the network timeout.
We have just added a "large file restart" option to iput which may help. It will be in the next release.

Mike

--
"iRODS: the Integrated Rule-Oriented Data-management System; A community driven, open source, data grid software solution" https://www.irods.org

iROD-Chat: http://groups.google.com/group/iROD-Chat

Edwin Skidmore

Apr 29, 2011, 12:39:11 PM
to irod...@googlegroups.com, Jean-Yves Nief
I tried the -T option and I can see the connection resets; however, even
with a smaller test file >32M, the transfer fails:

[edwin@client ~]$ ls -laF 32Mb-file.dat
-rw-rw-r-- 1 edwin edwin 34171904 Apr 29 09:20 32Mb-file.dat
[edwin@client ~]$ iput -V -T -f 32Mb-file.dat


NOTICE: irodsHost=irodsserver.iplantcollaborative.org
NOTICE: irodsPort=1247
NOTICE: irodsUserName=edwintest
NOTICE: irodsZone=iplant
NOTICE: created irodsHome=/iplant/home/edwintest
NOTICE: created irodsCwd=/iplant/home/edwintest

From server: NumThreads=2, addr:150.135.x.y, port:20163, cookie=1064427653
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
The client/server socket connection has been renewed
ERROR: rcPartialDataPut: toWrite 4194304, bytesWritten 1658016, errno =
110 status = -27110 SYS_COPY_LEN_ERR, Connection timed out
ERROR: putUtil: put error for /iplant/home/edwintest/32Mb-file.dat,
status = -27110 status = -27110 SYS_COPY_LEN_ERR, Connection timed out

As for the filesystem, I can safely iput a file smaller than 32M. Below
is an example of a transfer of a 3M file:

[edwin@client bin]$ ls -laF Xnest
-rwxr-xr-x 1 root root 3758848 Nov 17 16:26 Xnest*
[edwin@client bin]$ iput -V -T -f Xnest


NOTICE: irodsHost=irodsserver.iplantcollaborative.org
NOTICE: irodsPort=1247
NOTICE: irodsUserName=edwintest
NOTICE: irodsZone=iplant
NOTICE: created irodsHome=/iplant/home/edwintest
NOTICE: created irodsCwd=/iplant/home/edwintest

Xnest 3.585 MB | 0.909 sec | 0 thr | 3.943 MB/s

So, given my understanding of iRODS, this transfer actually goes through
my iCAT server to the non-IES server to perform the operation (in this
case, everything goes to the same non-IES server). So, I would assume this
rules out any ACL permission or mount issues. Is that a correct
assumption?

What is also disconcerting is that I can scp the same ~1G file without
interruption from the same client to the resource/non-IES server within
2 minutes (7.6M/s). AFAIK, scp is also relatively sensitive to connection
drops... unless iRODS is more sensitive (?).

Actually, my client is in the same domain but in a different building on
campus.

Thanks,
Edwin


--
Edwin Skidmore
HRH Manager of Systems and Integrated Services
iPlant Collaborative

Edwin Skidmore

Apr 29, 2011, 2:04:46 PM
to irod...@googlegroups.com, mw...@diceresearch.org
Hi Mike,

How large of a network timeout would you suggest?

Also, if it were a network timeout, would there be a corresponding "Agent exiting" message in the log (probably with a different status)?

Thank you,
Edwin

mw...@diceresearch.org

Apr 29, 2011, 2:49:30 PM
to irod...@googlegroups.com
Hello Edwin,

The timeout should last longer than the time you think it will take to finish your largest transfer.
I don't see the point of using a short timeout in a network.

I'll look into the irodsAgent exit status.
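
The rule of thumb above (a timeout longer than the longest expected transfer) can be turned into a rough number. This is only a sketch: the function name min_timeout_seconds and the safety factor of 2 are hypothetical, the file size and rate are the approximate figures quoted earlier in the thread, and treating the "MB" printed by iput as 1024 * 1024 bytes is an assumption.

```python
def min_timeout_seconds(file_size_bytes, throughput_bytes_per_sec, safety_factor=2.0):
    """Estimate a timeout that outlasts one transfer.

    safety_factor is a hypothetical margin for slower-than-usual transfers.
    """
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return safety_factor * file_size_bytes / throughput_bytes_per_sec


if __name__ == "__main__":
    MIB = 1024 * 1024
    size = 1024 * MIB   # the ~1 GB test file
    rate = 3.943 * MIB  # rate iput reported for the small Xnest transfer
    print(round(min_timeout_seconds(size, rate)), "seconds")
```

The estimate only restates the advice in numbers; it says nothing about which device on the path enforces the timeout.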