Resource server very slow


Maria Hauser

Feb 18, 2015, 12:15:58 PM
to irod...@googlegroups.com
Hi all,

I am facing the problem that a resource server is orders of magnitude slower than an iCAT server. Network latency alone cannot explain this.

I have two servers: an ICAT server located in Germany (here called de.com) and a resource server located in the US (us.com). I have two resources, the first deResc (unix file system of the German server) and the second usResc (unix file system of the US server).

I am testing the execution times of three operations: iput, irm and ils. The size of the file for iput is 2.4 MB.

I get the following results. I am logged in on the server germany.com with this combination in .irods/.irodsEnv:
irodsHost 'de.com'
irodsDefResource 'deResc'
In this case, the times for all three operations are much less than 1 second.

In contrast, when I am logged in on the remote US server us.com and try the different irodsHost/irodsDefResource combinations, all operations become much slower. What surprises me most is that the fastest combination is still irodsHost 'de.com', irodsDefResource 'deResc' (that is, both remote relative to the server I am logged in to). The full time measurements with the "time" command:

irodsHost 'de.com'
irodsDefResource 'deResc'
 
time iput testfile
real    0m4.787s

time irm testfile
real    0m1.654s

time ils
real    0m2.244s

----------------------------------------
irodsHost 'de.com'
irodsDefResource 'usResc'

time iput testfile
real    0m16.860s

time irm testfile
real    0m6.818s


time ils > /dev/null
real    0m2.297s

----------------------------------------
irodsHost 'us.com'
irodsDefResource 'deResc'

time iput testfile
real    0m14.800s

time irm testfile
real    0m15.523s

time ils > /dev/null
real    0m6.889s

----------------------------------------
irodsHost 'us.com'
irodsDefResource 'usResc'

time iput testfile
real    0m9.038s

time irm testfile
real    0m22.057s

time ils > /dev/null
real    0m6.526s

The time for irm with both irodsHost and irodsDefResource on the US server reaches 22 seconds! Ping just needs 130 ms.

Can someone please help me with that? Is something wrong with my settings on some of the servers?

Thanks,
Maria

Adil Hasan

Feb 18, 2015, 12:25:04 PM
to irod...@googlegroups.com
Hallo Maria,
I wonder if it's the fact that the catalogue is in Germany. Your ils
seems to take 6 secs from the US and 2 secs in Germany. So I guess
that some of the lookups that need to happen for iput and iget
are more heavily impacted, although perhaps not that much.
When you run your commands from the US do you have as your irodsHost
the US host or the German one? I'm not sure what the connection is
like between servers. You could try setting your irodsHost in your
.irodsEnv to the other site to see if that helps. It seems a bit odd.
hth
adil
> --
> --
> "iRODS: the Integrated Rule-Oriented Data-management System; A community driven, open source, data grid software solution" https://www.irods.org
>
> iROD-Chat: http://groups.google.com/group/iROD-Chat
>
> ---
> You received this message because you are subscribed to the Google Groups "iRODS-Chat" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to irod-chat+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Maria Hauser

Feb 19, 2015, 4:15:53 AM
to irod...@googlegroups.com
Hi Adil,

all the times in my first message happen when I run the commands from the US server. I only change irodsHost and irodsDefResource in the .irodsEnv file.

So it would be logical for me that at least iput and irm are fastest when I set irodsHost to us.com and irodsDefResource to usResc. But the exact opposite is the case. The fastest combination is irodsHost 'de.com' and irodsDefResource 'deResc'.

Apart from that, I don't think such long execution times are normal. The transfer of the test file with scp takes 2-3 seconds both us->de and de->us (plus the time for logging in).

Best,
Maria

Jean-Yves Nief

Feb 19, 2015, 4:36:30 AM
to irod...@googlegroups.com
hello Maria,

see inline comments:

Maria Hauser wrote:
> Hi all,
>
> I am facing the problem that a resource server is orders of magnitude
> slower than a ICAT server. I cannot explain that just with the network
> latency times.
>
> I have two servers: an ICAT server located in Germany (here called
> de.com) and a resource server located in the US (us.com). I have two
> resources, the first deResc (unix file system of the German server)
> and the second usResc (unix file system of the US server).
>
> I am testing the execution times of three operations: iput, irm and
> ils. The size of the file for iput is 2.4 MB.
>
> I get following results. I am logged in on the server germany.com
> with the combination in the .irods/.irodsEnv
> irodsHost 'de.com'
> irodsDefResource 'deResc'
> In this case, the times for all three operations are much less than 1
> second.
>
> In contrast, when I am logged in on the remote US server us.com
> and try the different irodsHos/irodsDefResource
> combinations, all operations become much slower. What surprises me
> most is that the fastest combination still is the combination
> irodsHost 'de.com', irodsDefResource 'deResc' (that
> is, both remote related to the server I am logged in to). The full
> time measurements with the "time" command:
>
> irodsHost 'de.com'
> irodsDefResource 'deResc'
>
> time iput testfile
> real 0m4.787s
>
> time irm testfile
> real 0m1.654s
>
> time ils
> real 0m2.244s
>
here all the operations are "local" (client, iCAT server and phys
resource in Germany), so that's the ideal case from the performance
point of view.
> ----------------------------------------
> irodsHost 'de.com'
> irodsDefResource 'usResc'
>
> time iput testfile
> real 0m16.860s
>
> time irm testfile
> real 0m6.818s
>
>
> time ils > /dev/null
> real 0m2.297s
ils operations are still made locally, which is why they are ok. irm is
also ok, as it is also a "local" operation (if you use the trash can,
which is the default setting, the file is not physically removed from
the US resource; it is just moved to the trash can collection, hence it
is only a db operation made on the iCAT side).
iput is clearly slow. At this point it is either a bad RTT (but that
alone can't explain such a result, the US is not on planet Mars either;
use ping <hostname> to measure the RTT between your place in Germany and
your server in the US), or low connectivity somewhere between deResc and
usResc, most likely very close to usResc, or, as another possibility,
the network is bad at some point and lots of packets are being lost
(cure: the TCP window size has to be changed).
>
> ----------------------------------------
> irodsHost 'us.com'
> irodsDefResource 'deResc'
>
> time iput testfile
> real 0m14.800s
>
> time irm testfile
> real 0m15.523s
>
> time ils > /dev/null
> real 0m6.889s
iput perf is consistent with the previous case. irm is very slow, which
might invalidate my assumption above that you are using the trash can.
Note that in this case, each of your icommands connects to usResc, but
as this one is only a phys resource, it checks with the iCAT in Germany
for the authentication (so another overhead due to the connection
between usResc and deResc). Note also that the file transfer will be the
following:
- german client -> usResc -> deResc
>
> ----------------------------------------
> irodsHost 'us.com'
> irodsDefResource 'usResc'
>
> time iput testfile
> real 0m9.038s
>
> time irm testfile
> real 0m22.057s
>
> time ils > /dev/null
> real 0m6.526s
for iput, better performance than the previous use case as your file
does not go back and forth across the Atlantic.
These are some hints, but there are inconsistencies between some
possible explanations that I mentioned and your results.
You should also check the load on the servers on both sides. irm
operations are overall very slow (the trash can policy has to be
checked: core.re file), and iCAT db performance as well.
cheers,
JY
>
> The time for irm with both irodsHost and irodsDefResource on the US
> server reaches 22 seconds! Ping just needs 130 ms.
>
> Can someone please help me with that? is something wrong with my
> settings on some of the servers?
>
> Thanks,
> Maria

Maria Hauser

Feb 19, 2015, 8:18:49 AM
to irod...@googlegroups.com
Hi Jean-Yves,

thanks for your comments.

First, a question - what do you mean by "client"? The computer that is the "irodsHost" in the .irodsEnv file or the client where I run the commands from? Generally, what is the role of "irodsHost" exactly? Does a connection go to the irodsHost every time when I execute a command before doing all other necessary connections (i. e. to the iCAT server etc)?

Another question: is there a way to check exactly what happens when I iput a file for example? Like what servers my client connects to, what are the answer times etc? Something like a debug mode.

Inline comments follow:


On Thursday, February 19, 2015 at 10:36:30 AM UTC+1, Jean-Yves Nief wrote:
> >
> > irodsHost 'de.com'
> > irodsDefResource 'deResc'
> >
> > time iput testfile
> > real    0m4.787s
> >
> > time irm testfile
> > real    0m1.654s
> >
> > time ils
> > real    0m2.244s
> >
> here all the operations are "local" (client, iCAT server and phys
> resource in Germany), so that's the ideal case from the performance
> point of view.

If you mean irodsHost by "client", then yes. If you mean my console by "client", then no: I am executing all my commands on the US server (see my first email). So here, I would need a connection to Germany and back each time I execute a command. iput also has to send the file from the US to Germany, since the file is on the US server and has to be stored to deResc.

Ping needs ~130 ms both de->us and us->de, without any packet loss. scp needs 2-3 seconds, both de->us and us->de (but just for copying the file, without the time for logging in). So I don't think there is a general network connection problem.

 
> > ----------------------------------------
> > irodsHost 'de.com'
> > irodsDefResource 'usResc'
> >
> > time iput testfile
> > real    0m16.860s
> >
> > time irm testfile
> > real    0m6.818s
> >
> >
> > time ils > /dev/null
> > real    0m2.297s
> ils operation are still made locally, this is why it is ok. irm is also
> ok as it is also a "local" operation (if you use the trash can which is
> the default setting, the file is not physically removed from the US
> resource, it is just moved in the trash can collection, hence it is only
> a db operation which is made on the iCAT side).

The deletion policy is left at default (acTrashPolicy rule is always empty), both in Germany and US. I think irm should take approximately the same time as in the example before? However, it is like 5 times slower. (And I have to say, the times I see here stay pretty much the same when running the same command multiple times).

> > ----------------------------------------
> > irodsHost 'us.com'
> > irodsDefResource 'deResc'
> >
> > time iput testfile
> > real    0m14.800s
> >
> > time irm testfile
> > real    0m15.523s
> >
> > time ils > /dev/null
> > real    0m6.889s
> iput perf consistent with previous case. irm very slow which might
> invalidate my assumption above that you are using the trash can. Note
> that in this case, each of your icommand connects to the usResc but as
> this one is only a phys rescource, it checks with the iCAT in
> Deutschland for the authentication (so an other overhead due to the
> connection between usResc and deResc). Note also the file transfer will
> be the following:
> - german client -> usResc -> deResc

This is why I asked what you mean by the client. Since although I am sitting in Germany, I am logged in on the US server and run all my commands from there. So the file transfer should be like US client -> deResc? How can a file pass two resources though? Is it not stored directly from my client hard disc to the resource?
 
> >
> > ----------------------------------------
> > irodsHost 'us.com'
> > irodsDefResource 'usResc'
> >
> > time iput testfile
> > real    0m9.038s
> >
> > time irm testfile
> > real    0m22.057s
> >
> > time ils > /dev/null
> > real    0m6.526s
> for iput, better performance than the previous use case as your file
> does not go back and forth across the Atlantic.

Here, my file to store is on the US server and it is stored to the usResc (the same hard disc). So I would think that it is stored directly there and is not sent anywhere. The only thing that should be sent are the other data (authentication, iCAT update etc). And that should be pretty fast. Do I understand something wrong?
 
> These are some hints, but there are inconsistencies between some
> possible explanations that I mentioned and your results.
> You should also check the load on the servers on both side. irm
> operation are overall very slow (the trash can policy has to be checked:
> core.re file), iCAT db performances as well.

But it is only the case from the US. When I do all the operations locally here in Germany, I get very fast execution times, below 1 second for each operation.

Can it be that some strange configuration on the US resource server causes my data to be sent back and forth multiple times?

Best,
Maria
 

Jean-Yves Nief

Feb 19, 2015, 10:43:52 AM
to irod...@googlegroups.com
Maria Hauser wrote:
> Hi Jean-Yves,
>
> thanks for your comments.
>
> First, a question - what do you mean by "client"?
the process which is connecting to the server: in your case, the icommands.
> The computer that is the "irodsHost" in the .irodsEnv file or the
> client where I run the commands from? Generally, what is the role of
> "irodsHost" exactly?
that's the name of the server to which every connection attempt to the
service goes.
> Does a connection go to the irodsHost every time when I execute a
> command before doing all other necessary connections (i. e. to the
> iCAT server etc)?
yes
>
> Another question: is there a way to check exactly what happens when I
> iput a file for example? Like what servers my client connects to, what
> are the answer times etc? Something like a debug mode.
none that I know of. You can use "netstat" and "lsof" on the various
nodes to track the connections to the different servers, but that means
you need to attach a debugger to the icommand you are running, because
otherwise you won't have enough time to see anything. The easiest thing
to do is to look at the log files in server/log on all your servers;
that will give you an idea of what's going on.
hope this helps,
JY

Maria Hauser

Feb 19, 2015, 10:56:41 AM
to irod...@googlegroups.com
Hi Jean-Yves,

I think you have overlooked half of my previous email, where I answered a lot of stuff inline. Can you please take a look at it? Thanks!

I've set irodsLogLevel=9 but did not see anything helpful.

I looked into the log files on the servers. The only suspicious thing I see on the US server is:
Feb 19 08:47:16 pid:23172 ERROR: [-]    iRODS/server/core/src/rsApiHandler.cpp:484:readAndProcClientMsg :  status [SYS_HEADER_READ_LEN_ERR]  errno [No such file or directory] -- message []
        [-]     iRODS/lib/core/src/sockComm.cpp:195:readMsgHeader :  status [SYS_HEADER_READ_LEN_ERR]  errno [No such file or directory] -- message [failed to call 'read header']
                [-]     libtcp.cpp:240:tcp_read_msg_header :  status [SYS_HEADER_READ_LEN_ERR]  errno [No such file or directory] -- message [read 0 expected 4]
Feb 19 08:47:16 pid:23172 NOTICE: Agent exiting with status = -4002

But it last occurred this morning. And on the German server:

Feb 19 16:47:05 pid:42823 ERROR: [-]    iRODS/server/core/src/rsApiHandler.cpp:484:readAndProcClientMsg :  status [SYS_HEADER_READ_LEN_ERR]  errno [Resource temporarily unavailable] -- message []
        [-]     iRODS/lib/core/src/sockComm.cpp:195:readMsgHeader :  status [SYS_HEADER_READ_LEN_ERR]  errno [Resource temporarily unavailable] -- message [failed to call 'read header']
                [-]     libtcp.cpp:240:tcp_read_msg_header :  status [SYS_HEADER_READ_LEN_ERR]  errno [Resource temporarily unavailable] -- message [read 0 expected 4]

Feb 19 16:47:05 pid:42823 NOTICE: Agent exiting with status = -4011

Best,
Maria

tom.la...@gmail.com

Feb 19, 2015, 12:03:24 PM
to irod...@googlegroups.com
Hi Maria

I am also looking into performance in iRODS.
First, the iCAT server is always involved in all transactions. I also think that you will get different performance if you increase the file size.
Only big files go directly to the storage server.
I am guessing that 1 GB files are faster on usResc than on deResc.
For small files, it is better to have the iCAT and the storage resources close to each other.
Regards
Tom

Maria Hauser

Feb 20, 2015, 6:48:51 AM
to irod...@googlegroups.com, tom.la...@gmail.com
Hi all,

thanks Tom for the hint, I found our main error following your advice. Some US users put some files into a "wrong" resource (in Germany) without noticing it, so iput and iget were very slow for these files. It was the main performance issue so far.

I cannot explain all the issues I listed above with this error (i.e. the different irm times). But I will now take another look, since it is obvious that I did some time measurements with wrong assumptions.

Best,
Maria

Jean-Yves Nief

Feb 20, 2015, 9:51:25 AM
to irod...@googlegroups.com
hello Maria,

Does the error message you posted appear when you do an iput?
That should not happen.
cheers,
JY

Maria Hauser

Feb 24, 2015, 5:37:16 AM
to irod...@googlegroups.com
Hi Jean-Yves,

no, this error definitely does not appear every time I do an iput. I tested several commands (ils, iput, irm) and did not get any error. However, I see that this error appears in the log several times a day. There are also other users on the system, so I don't know what or who causes this error.

Best,
Maria

Mike Conway

Feb 24, 2015, 6:44:38 AM
to irod...@googlegroups.com
That gives off an aura of a socket time-out, especially when parallel transfer threads error off or, alternatively, when the port 1247 connection times out while parallel transfers are going.

Is there a correlation with file size (> 32 MB)?

Cheers
MC

Maria Hauser

Feb 25, 2015, 4:56:23 AM
to irod...@googlegroups.com, mco...@email.unc.edu
Hi Mike,

I cannot see any correlation with iget or iput operations. I tried to reproduce this error from the command line with some standard operations but failed. The error occurs several times a day.

Some people are accessing iRODS through the Jargon API, maybe something there causes the error (but it is not obvious for them what it is apparently).

Best,
Maria

Maria Hauser

Aug 5, 2015, 10:02:19 AM
to iRODS-Chat, mco...@email.unc.edu
Hi guys,

Unfortunately, I have to dig up this thread again. We still have the problem of unbearably slow iRODS resource servers.

At the moment, we have an iCAT enabled server located in Germany, and two resource servers in the US and in Switzerland. The performance of the iCAT server in Germany is fine. Both resource servers in Switzerland and especially in the US are unbearably slow. All of the servers have the current iRODS version 4.1.3.

I did a performance benchmark and measured the time necessary to do ils, and iput of 5 different file sizes: 10 KB, 1 MB, 10 MB, 100 MB and 1 GB. In addition, I changed three parameters: the local icommands host ("local"), iRODS host ("host"), and resource location ("resource"). The resource is the local file system in every case. I measured the time using perf and doing 10 runs, the time in the results below is the average time calculated from all 10 runs.
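
For anyone who wants to repeat this kind of measurement without perf, a minimal sketch in Python follows. It is not the tooling used for the numbers in this post; `average_runtime` is a hypothetical helper, and the stand-in command would have to be replaced by an icommand on a machine with a configured iRODS client.

```python
import subprocess
import sys
import time


def average_runtime(command, runs=10):
    """Run `command` `runs` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output so terminal I/O does not skew the timing.
        subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; on an iRODS client,
    # replace it with e.g. ["ils"].
    print(average_runtime([sys.executable, "-c", "pass"], runs=3))
```

Averaging wall-clock time over several runs smooths out one-off delays, but it still includes process start-up time, so very short commands are dominated by that overhead.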

Here is the performance for setting the same location for all 3 parameters mentioned above (note that the iCAT server is always in Germany):

local / host / resource          ils   iput 10K   iput 1M   iput 10M   iput 100M   iput 1G
DE / DE iCAT server / DE         0.1   0.1        0.1       0.2        1.4         8.2
CH / CH resource server / CH     1.3   1.8        1.8       2          2.1         5.2
US / US resource server / US     6.3   9.1        9.1       9.1        9.3         10.7

Sorry for the bad formatting.

The problem here is that the basic response time, i.e. the browsing, is very slow in Switzerland and especially in the US. In the US, I have to wait 6 seconds to get a result of ils containing 5 files, and 9 seconds for iput of a 10K file (where, as it seems to me, all the time is the response time of the system). For comparison, ping from the US to Germany needs 130 ms.

If I change the "host" setting to the German iCAT server, I get the following picture:

local / host / resource          ils   iput 10K   iput 1M   iput 10M   iput 100M   iput 1G
CH / DE iCAT server / CH         0.5   1.6        3.3       22.9       2.6         5.3
US / DE iCAT server / US         1.9   7.2        14.1      78.2       10.8        12.8

The response time for ils is much better, but iput times for medium-sized files become very slow - interestingly, only before the parallel file transfer for files > 32 MB is triggered.

I have the following questions:

- Why is ils time that slow in the US and in Switzerland if I set iRODS host to the local resource server? I figured out that I cannot solve this problem by setting iRODS host to the iCAT server in Germany - in this case, I slow down the speed of iput for files < 32 MB such that the system becomes unusable.
- Why does the speed of iput depend so much on the iRODS host anyway? I always use the local resource everywhere. I would expect a certain offset for connecting to the iCAT database plus the time for storing the file locally (which is very fast). Why do I need 78 seconds to iput a 10 MB file in the US to a local disc using the German iCAT server as the iRODS host (second example above)? And why is the offset for iput so large in the US and in Switzerland in the first example above?
- Generally, what should I do to get a reasonable performance for icommands in the US and in Switzerland?

I would greatly appreciate your help in this matter.

Best,
Maria

Maria Hauser

Aug 10, 2015, 4:55:47 AM
to iRODS-Chat, mco...@email.unc.edu
Hi all,

can somebody help me with this? Our resource servers are unusable in the current state... Thanks!

Best,
Maria

tom.la...@gmail.com

Aug 10, 2015, 7:50:42 AM
to iRODS-Chat, mco...@email.unc.edu
Hi Maria

I will take a closer look to this.
But what happens with big files if you set the host to the iCAT server and use iput -N0?
What unit are your performance numbers in?

regards
tom

Terrell Russell

Aug 10, 2015, 2:32:38 PM
to irod...@googlegroups.com, mco...@email.unc.edu
Hi Maria,

These numbers can all be explained - There is no reason to think there are any inherent misconfigurations in your setup thus far.



> - Why is ils time that slow in the US and in Switzerland if I set iRODS host to the local resource server? I figured out that I cannot solve this problem by setting iRODS host to the iCAT server in Germany - in this case, I slow down the speed of iput for files < 32 MB such that the system becomes unusable.


If the iCAT server is in Germany, every invocation of an iCommand will be connecting to Germany, including 'ils', which is a database only operation.  If you are connecting to a resource server, you have an authentication handshake (three round trips with basic password authentication) with the resource server... then the resource server redirects the API call to the iCAT server (which does another authentication handshake, ~3 round trips), then the operation, then the results are relayed back to the client via the resource server.
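
As a rough illustration of the round-trip counting described above, the following Python sketch adds up the wide-area round trips for a database-only operation such as 'ils'. The 130 ms round-trip time is the ping value reported earlier in the thread; the single round trip assumed for the operation itself and the function name are assumptions for illustration, not iRODS internals.

```python
def wide_area_latency_floor(rtt_s, auth_round_trips=3, operation_round_trips=1):
    """Lower bound on latency from round trips that cross the wide-area link.

    Only round trips are counted; server-side processing, database time,
    and connection setup are ignored, so real timings will be higher.
    """
    return rtt_s * (auth_round_trips + operation_round_trips)


# Ping between the US and Germany was reported as about 130 ms.
rtt = 0.130

# Client on the US resource server: the handshake with the local server is
# cheap, but the redirected call to the iCAT in Germany pays about three
# round trips for authentication plus (assumed) one for the operation.
print(f"floor: {wide_area_latency_floor(rtt):.2f} s")
```

This is a floor, not a prediction: it accounts only for network round trips and says nothing about work done on either server.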



> - Why does the speed of iput depends so much on the iRODS host anyway? I always use the local resource everywhere. I would expect a certain offset for connecting to the iCAT database plus the time for the storing the file locally (which is very fast). Why do I need 78 seconds to iput a 10 MB file in the US to a local disc using the German iCAT server as the iRODS host (second example above)? And why is the offset for iput so large in the US and in the Switzerland in the first example above?


As per Tom's most recent response, the biggest effect you're seeing has to do with the lack of parallel transfer (single buffer put/get) going across the ocean two times.

When you are running an iCommand on the US server, connecting to the DE iCAT server, and placing the (small) file onto the US resource, the file is being sent from the US to Germany and then redirected back to the resource on the US server.  When you 'iput' a large file (>32MB) with the same configuration, the parallel transfer behavior is invoked and a point to point connection is made from the source to the destination, which, in this case, is the same machine and the file is not sent to Germany and back.


This model of single buffer transfers explains all the performance numbers you're seeing.  If you need to change the 'large file' threshold, you can do so by changing 'maximum_size_for_single_buffer_in_megabytes' on all the servers in the Zone.
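
The behavior described above can be summarized in a small Python sketch. It only encodes the decision rule as stated in this reply (single-buffer transfers are relayed through the server the client is connected to, larger files use parallel point-to-point transfer). The function name and the treatment of a file exactly at the threshold are assumptions, not iRODS source code.

```python
def transfer_route(file_size_mb, threshold_mb=32):
    """Return how a file travels, following the rule described in this reply.

    Files larger than the threshold use parallel transfer, which connects
    the source directly to the destination resource. Smaller files are sent
    as a single buffer through the server the client is connected to.
    Treating a file of exactly threshold size as single-buffer is an assumption.
    """
    if file_size_mb > threshold_mb:
        return "parallel: client -> resource (point to point)"
    return "single buffer: client -> connected host -> resource"


# File sizes from the benchmark, in megabytes (10 KB taken as 0.01 MB).
for size_mb in (0.01, 1, 10, 100, 1000):
    print(f"{size_mb:>7} MB  {transfer_route(size_mb)}")
```

With the default threshold of 32, the 10 KB, 1 MB and 10 MB files take the relayed route and the 100 MB and 1 GB files go point to point, which matches where the slow iput times appear in the second benchmark table.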



> - Generally, what should I do to get a reasonable performance for icommands in the US and in Switzerland?


Generally, model your use cases and then configure your users and the Zone to best accommodate them.  If you reduce the 'maximum_size_for_single_buffer_in_megabytes' threshold, and have your users connect directly to the iCAT in all cases, you will see the biggest improvements in overall throughput.  You can test/use parallel transfer directly by using the '-N' option for iget and iput.


Please let us know how it goes!

Terrell




Maria Hauser

Aug 11, 2015, 12:12:55 PM
to iRODS-Chat, mco...@email.unc.edu
Hi Terrell,

thank you for the answer, it was very helpful.

I am trying to tune the settings on the US server. I execute icommands on the US server, have irods host set to the DE iCAT server and use the US resource. Following your advice, I set the environment variable "irods_maximum_size_for_single_buffer_in_megabytes" to 1 in the ~/.irods/irods_environment.json of my iRODS user.
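For reference, a minimal Python sketch of that change to the client environment file. The helper name is hypothetical, and the demonstration writes to a temporary file rather than the real ~/.irods/irods_environment.json; the key name is the one quoted above.

```python
import json
import tempfile
from pathlib import Path

KEY = "irods_maximum_size_for_single_buffer_in_megabytes"


def set_single_buffer_threshold(env_path, megabytes):
    """Set the single buffer threshold in a JSON environment file."""
    env_path = Path(env_path)
    # Keep any existing settings; start from an empty object if the file is new.
    settings = json.loads(env_path.read_text()) if env_path.exists() else {}
    settings[KEY] = megabytes
    env_path.write_text(json.dumps(settings, indent=4) + "\n")
    return settings


# Demonstration against a throwaway copy, not the real environment file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "irods_environment.json"
    demo.write_text(json.dumps({"irods_host": "de.com"}))
    updated = set_single_buffer_threshold(demo, 1)
    print(updated)
```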

Below is the time performance of the ils and iput commands, similar to the benchmark I posted above. (The old times, with irods_maximum_size_for_single_buffer_in_megabytes=32, are given in brackets for reference.)


    ils     iput 10K    iput 1M    iput 10M    iput 100M    iput 1G
    1.9     7.2         14.4       10.2        10.5         12.1
    (1.9)   (7.2)       (14.1)     (78.2)      (10.8)       (12.8)

i.e. the performance for iput 10M gets much better as I would expect, and stays the same elsewhere.

Is there a use case anyway where it makes sense to set irods_transfer_buffer_size_for_parallel_transfer_in_megabytes to a higher value? In other words, what disadvantages does it bring to set it to a lower value? My thought is that it does not make sense to send the data over the ocean if it should be stored locally in any case, independent of the file size.

In general, I would expect some offset for the connection and sending data for the iCAT server and back (e.g. 1.9 seconds observed for ils) and a fast storage of the file on the local disc. If I take the times for the all-local operations in Germany as a reference as listed in the previous post (since in Germany, the time for the storing of the file to the local disc is dominating the overall runtime), and even if I assume 4 seconds instead of 1.9 as the offset for connecting to the iCAT server and sending some information about the iput operation back and forth, I would expect the following picture (assuming point to point transfer is used also for small files):


    ils     iput 10K    iput 1M    iput 10M    iput 100M    iput 1G
    1.9     4.1         4.1        4.2         5.4          12.2

i.e. I would expect a better iput performance for small and medium size files. Can I obtain a similar performance by some means?
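To make the comparison concrete, the Python sketch below subtracts the assumed 4 second offset from the expected times to recover the implied local storage time, and compares the expectation with the measured times from the first table. All numbers are copied from the two tables above; the variable names are illustrative.

```python
OFFSET_S = 4.0  # assumed cost of talking to the iCAT, as stated above

# Seconds, copied from the two tables above.
expected = {"10K": 4.1, "1M": 4.1, "10M": 4.2, "100M": 5.4, "1G": 12.2}
measured = {"10K": 7.2, "1M": 14.4, "10M": 10.2, "100M": 10.5, "1G": 12.1}

# Local storage time implied by the expectation, and the gap to the measurement.
local_store = {size: round(t - OFFSET_S, 1) for size, t in expected.items()}
gap = {size: round(measured[size] - expected[size], 1) for size in expected}

for size in expected:
    print(f"iput {size:>4}: implied local store {local_store[size]:4.1f} s, "
          f"measured minus expected {gap[size]:+5.1f} s")
```

The gap is largest for the small and medium files, which is the range still handled by the single buffer mechanism or dominated by catalog round trips.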

Best,
Maria

Adil Hasan

Aug 11, 2015, 6:18:31 PM
to irod...@googlegroups.com, mco...@email.unc.edu
Hello Maria,
Just some quick suggestions. I'm sure there are reasons that you have
the setup the way that it is, so I may be suggesting some things that
don't make sense in your case (or aren't possible).

As Terrell says, you've effectively got a strongly coupled system over
the wide-area network (one iCAT and servers in different countries),
which will be a challenge to optimise for performance.

One suggestion would be to set up separate autonomous systems in Germany,
Switzerland and the USA, each with their own iCAT and storage servers. You
can then loosely couple them by federating the systems, and you can have
people in each country store locally. They can see the data in other
zones (in a read-only manner, I'd suggest).
You could then copy data asynchronously from
one zone to another to improve performance (I'd make
these 'import zones' read-only to avoid update conflicts). So, a user
in one country wanting access to data in another could either get access
immediately at a slower rate (it wouldn't be as fast as native storage),
or wait for the data to appear in the 'import zone' and have much
higher transfer rates.

It would add a little complexity to the system by federating and having
users registered in remote zones. But, it would be a bit more autonomous.
Anyway, it's a suggestion and maybe you already thought about it and rejected
it in your case.

hth
adil

Terrell Russell

Aug 11, 2015, 8:36:19 PM
to irod...@googlegroups.com
Maria,

You may be able to get better performance with the 'bulk upload' option for iput...

See 'iput -b'



Adil's suggestion does increase the complexity, but may prove interesting if moving the data across the oceans is not a 'common' use case for you.  Every use case is different, and the deployments of iRODS are very varied because of that.

Alternately, replicating the data, either asynchronously or synchronously within a single zone, could reduce the recurrent access times.  If you have an asynchronous replication policy in place, local iputs could be fast, and then eventually be migrated or replicated to the 'proper' location.  This could be automated as much as you can specify.

Terrell





Maria Hauser

Aug 12, 2015, 9:24:35 AM
to iRODS-Chat
Hi Terrell and Adil,

thank you for your suggestions.

I don't think that it would make sense for us to have a different server setup, as Adil suggested, since we need a homogeneous-looking file system, where every user can store, delete and modify files without paying attention to where these files are actually stored. But I will discuss it with our developers.

I would still like to ask: for what reason is it not possible to have a certain offset for iput (say 4 seconds) for connecting to the iCAT database in Germany, and then just do a fast local storage of the data on the resource server?

Best,
Maria
...

Maria Hauser

Aug 17, 2015, 4:39:11 AM
to iRODS-Chat
Hi guys,

please can someone answer this question?

"I would still like to ask: for what reason is it not possible to have a certain offset for iput (say 4 seconds) for connecting to the iCAT database in Germany, and then just do a fast local storage of the data on the resource server?"

It would be great for me to understand that.

Thanks,
Maria
...

Jason Coposky

Aug 18, 2015, 9:37:16 AM
to irod...@googlegroups.com

As we have mentioned, there are two mechanisms for the transfer of data in iRODS: the 'single buffer' and the 'parallel transfer' mechanisms.  In both instances there are several trips to the catalog for determining access restrictions, registering a new replica, and then updating that replica's system metadata after the data is at rest or the transfer has failed in flight.  There is no simple, easy answer, as it all depends on the mechanism you use for transfer, which server you are connected to, and where the target resource is hosted.


The iRODS 4.x code base will remain compatible with the 3.x legacy series until our 5.0 release and as such will continue to support the current behavior.  We will address these issues in the creation of our next generation API in parallel and will expose this functionality in new client libraries and a comprehensive command line utility.


thanks,



Jason Coposky
Chief Technologist, iRODS Consortium
RENCI at the University of North Carolina at Chapel Hill
(919)445-9675
jas...@renci.org

irods.org — Take Control of Your Data


Maria Hauser

Aug 18, 2015, 11:21:12 AM
to iRODS-Chat, jas...@renci.org
Thank you for your answer Jason.

Another question - is it possible to set the variable irods_maximum_size_for_single_buffer_in_megabytes in the Jargon API?

Best,
Maria
...

Johannes W.

Aug 20, 2015, 7:06:41 AM
to iRODS-Chat, jas...@renci.org
This is also interesting for me. Which would be the equivalent variable in JargonProperty for irods_maximum_size_for_single_buffer_in_megabytes?

Best, Johannes
...

Mike Conway

Aug 20, 2015, 8:11:47 AM
to irod...@googlegroups.com
There's not really an equivalent, as the decision to do a parallel transfer is made on the server side.  Jargon uses a constant for this in ConnectionConstants, as it's not a useful 'knob' for the client:

public static final long MAX_SZ_FOR_SINGLE_BUF = (32 * 1024 * 1024);

One can, from the JargonProperties, turn off the parallel transfer behavior and tune the max threads and various buffers.

MC
--
Mike Conway Java and Integration Architect - DataNet Federation Consortium GitHub: https://github.com/DICE-UNC LinkedIn: https://www.linkedin.com/pub/mike-conway/5/78a/231