irm command is showing an error. Why?

Mary

Feb 18, 2022, 7:09:04 AM
to iRODS-Chat
Hello,

I'm wondering why the irm command is showing an error. An example:
$ nano testing9.txt
$ iput testing9.txt
$ irm testing9.txt

remote addresses: <irods_server_ip_address> ERROR: rmUtil: rm error for /<zone_name>/home/<username>/testing9.txt, status = -305113 status = -305113 USER_SOCK_CONNECT_ERR, No route to host
Remote resource may be unavailable.

Do you maybe know the reason?

Thanks.


Kind Regards,
Mary

Terrell Russell

Feb 18, 2022, 10:06:17 AM
to irod...@googlegroups.com
Hi Mary,

This seems like a networking issue from what you've shared so far.

Is this repeatable?

Can you still see testing9.txt via `ils -L testing9.txt`?

Is there anything interesting in the rodsLog?

Terrell




--
--
The Integrated Rule-Oriented Data System (iRODS) - https://irods.org
 
iROD-Chat: http://groups.google.com/group/iROD-Chat
To view this discussion on the web visit https://groups.google.com/d/msgid/irod-chat/4bd68a73-adc1-4f09-a3e6-b763c4172e28n%40googlegroups.com.

Mary

Feb 22, 2022, 9:02:49 AM
to iRODS-Chat
Hi Terrell,

  • Yes, this is repeatable.
  • No. After removing the file with $ irm testing9.txt, I can no longer see it. So the file is removed, even though I am seeing the above error.
  • In rodsLog, I see the following:

With $ iput testing9.txt, I see:
Feb 22 14:17:42 pid:5436 NOTICE: connectToRhost: connect to host <TSM_Server_FQDN> on port 1247 failed status = -305113 USER_SOCK_CONNECT_ERR, No route to host
Feb 22 14:17:42 pid:5436 remote addresses: , <resource_server_ip_address>, <irods_server_ip_address> ERROR: _rcConnect: connectToRhost error, server on <TSM_Server_FQDN>:1247 is probably down status = -305113 USER_SOCK_CONNECT_ERR, No route to host
Feb 22 14:17:42 pid:5436 WARNING: No replica access token in L1 descriptor. Ignoring replica access table. [path=/<zone_name>/home/<username>/testing9.txt, resource_hierarchy=uniReplResc;uniCompResc;archiveTsmResc]
Feb 22 14:17:45 pid:5436 NOTICE: connectToRhost: connect to host <TSM_Server_FQDN> on port 1247 failed status = -305113 USER_SOCK_CONNECT_ERR, No route to host
Feb 22 14:17:45 pid:5436 remote addresses: <resource_server_ip_address>, <irods_server_ip_address> ERROR: _rcConnect: connectToRhost error, server on <TSM_Server_FQDN>:1247 is probably down status = -305113 USER_SOCK_CONNECT_ERR, No route to host
Feb 22 14:17:45 pid:5436 remote addresses: <resource_server_ip_address>, <irods_server_ip_address> ERROR: [get_size_in_vault:159] - getSizeInVault error [error_code=[-305113], path=[/<zone_name>/home/<username>/testing9.txt], hierarchy=[uniReplResc;uniCompResc;archiveTsmResc]]
Feb 22 14:17:45 pid:5436 remote addresses: <resource_server_ip_address>, <irods_server_ip_address> ERROR: [rsDataObjClose:794] - [USER_SOCK_CONNECT_ERR: [update_replica_size_and_throw_on_failure:430] - failed to get size in vault [error_code=[-305113], path=[/<zone_name>/home/<username>/testing9.txt], hierarchy=[uniReplResc;uniCompResc;archiveTsmResc]]

]
Feb 22 14:17:45 pid:5436 remote addresses: <resource_server_ip_address>, <irods_server_ip_address> ERROR: [close_replica] - rsDataObjClose failed with [-305113]

With $ irm testing9.txt, I see:
Feb 22 14:18:38 pid:5493 NOTICE: connectToRhost: connect to host <TSM_Server_FQDN> on port 1247 failed status = -305113 USER_SOCK_CONNECT_ERR, No route to host
Feb 22 14:18:38 pid:5493 remote addresses: <resource_server_ip_address>, <irods_server_ip_address> ERROR: _rcConnect: connectToRhost error, server on <TSM_Server_FQDN>:1247 is probably down status = -305113 USER_SOCK_CONNECT_ERR, No route to host
Feb 22 14:18:41 pid:5493 NOTICE: connectToRhost: connect to host <TSM_Server_FQDN> on port 1247 failed status = -305113 USER_SOCK_CONNECT_ERR, No route to host
Feb 22 14:18:41 pid:5493 remote addresses: <resource_server_ip_address>, <irods_server_ip_address> ERROR: _rcConnect: connectToRhost error, server on <TSM_Server_FQDN>:1247 is probably down status = -305113 USER_SOCK_CONNECT_ERR, No route to host
Feb 22 14:18:41 pid:5493 remote addresses: <resource_server_ip_address>, <irods_server_ip_address> ERROR: syncDataObjPhyPathS:rsFileRename from /<zone_name>/home/<username>/testing9.txt to /irods/archiveTsmVault/trash/home/<username>/testing9.txt.928482589 failed,status=-305113
Feb 22 14:18:41 pid:5493 remote addresses: <resource_server_ip_address>, <irods_server_ip_address> ERROR: rsMvDataObjToTrash: rcDataObjRename error for /<zone_name>/trash/home/<username>/testing9.txt.928482589, status = -305113


I wonder why I'm getting these errors. The resources that I'm using are shown in the attachment.


Regards
Mary
resources.png

Alan King

Feb 22, 2022, 12:44:44 PM
to irod...@googlegroups.com
Hi,

It looks like it has trouble connecting with the server attached to the resource `archiveTsmResc`. In this case, the failure appears to be occurring in the sync to archive. So, please share the output of `ils -l /<zone_name>/home/<username>/testing9.txt` (replacing the appropriate pieces with a valid path, of course). I'm interested to see if the replica on `archiveTsmResc` is marked good or stale. If it's possible to do so, it may be worth confirming that the file exists on the storage after the `iput` invocation. It's possible that the file never gets created to begin with.

The `irm` error is likely the same cause: the server attached to `archiveTsmResc` is unreachable, so the unlink operation is failing to redirect there to unlink the physical file.
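
One way to confirm the reachability problem independently of iRODS is a plain TCP connection attempt to port 1247 from the iRODS server. The following is a minimal sketch, not an iRODS tool; the function name `can_connect` and the 3-second timeout are assumptions chosen for illustration.

```python
import socket


def can_connect(host: str, port: int = 1247, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", "Connection refused", timeouts
        # and name-resolution failures alike.
        return False
```

Calling `can_connect("<TSM_Server_FQDN>")` with the real hostname filled in should return False as long as nothing on that host accepts connections on port 1247, which would match the USER_SOCK_CONNECT_ERR entries in the log.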




--
Alan King
Senior Software Developer | iRODS Consortium

Mary

Feb 22, 2022, 5:23:02 PM
to iRODS-Chat
Hi,
  • Output
$ ils -l /<zone_name>/home/<username>/testing9.txt

The output is:
  <username>               0 uniReplResc;uniResc            6 2022-02-22.22:50 & testing9.txt
  <username>               1 uniReplResc;uniCompResc;cacheUniResc            6 2022-02-22.22:50 & testing9.txt
  <username>               2 uniReplResc;uniCompResc;archiveTsmResc            6 2022-02-22.22:50 X testing9.txt


  • Let me also share the commands that I used to create and coordinate the resources (replication and compound). This may help to find the cause of the error. Please take a look at the following commands and let me know if I made a mistake.
$ iadmin mkresc uniResc unixfilesystem '<irods_server_FQDN>':/var/lib/irods/uniVault
$ iadmin mkresc cacheUniResc unixfilesystem '<resource_server_FQDN>':/var/lib/irods/cacheUniVault
$ iadmin mkresc archiveTsmResc univmss '<TSM_Server_FQDN>':/irods/archiveTsmVault dsmarc
$ iadmin mkresc uniCompResc compound
$ iadmin addchildtoresc uniCompResc cacheUniResc cache
$ iadmin addchildtoresc uniCompResc archiveTsmResc archive
$ iadmin mkresc uniReplResc replication
$ iadmin addchildtoresc uniReplResc uniResc
$ iadmin addchildtoresc uniReplResc uniCompResc
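
For reference, the hierarchy these commands build can be written down as a small tree. The sketch below only models the parent/child relations from the `addchildtoresc` commands; the names `CHILDREN` and `leaf_hierarchies` are illustrative and not part of iRODS. It prints one semicolon-joined path per storage resource, which is the form the resource hierarchy takes in the `ils -l` output and in the log messages above.

```python
from typing import Dict, List

# Parent -> children, transcribed from the addchildtoresc commands above.
CHILDREN: Dict[str, List[str]] = {
    "uniReplResc": ["uniResc", "uniCompResc"],
    "uniCompResc": ["cacheUniResc", "archiveTsmResc"],
}


def leaf_hierarchies(root: str, children: Dict[str, List[str]]) -> List[str]:
    """Return one ';'-joined path per leaf resource below (and including) root."""
    kids = children.get(root, [])
    if not kids:
        # A resource without children is a leaf, i.e. a storage resource.
        return [root]
    paths = []
    for kid in kids:
        for sub_path in leaf_hierarchies(kid, children):
            paths.append(f"{root};{sub_path}")
    return paths


for path in leaf_hierarchies("uniReplResc", CHILDREN):
    print(path)
```

The three printed paths correspond to the three replicas listed by `ils -l`.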


Thank you.

Regards,
Mary

Terrell Russell

Feb 22, 2022, 6:19:20 PM
to irod...@googlegroups.com
I think I understand the problem (and the logs we were seeing before).

$ iadmin mkresc archiveTsmResc univmss '<TSM_Server_FQDN>':/irods/archiveTsmVault dsmarc


Where you have TSM_Server_FQDN, you should have resource_server_FQDN - just like the cache resource.  The value of that parameter is the hostname of the iRODS server where the active script/binary must be located safely in the msiExecCmd_bin directory.

The errors in the log are explained now because the TSM_Server_FQDN is not listening on the iRODS port 1247 - which is what the server tried to connect to, and failed.

The 'X' status you see via `ils -l` means that the replica iRODS tried to sync_to_archive to the archive resource was not written successfully, and got set to 'stale'.
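
To pick out stale replicas from a longer listing, the status column can be read programmatically. This is a rough sketch that assumes the seven whitespace-separated columns seen in the `ils -l` output earlier in this thread (owner, replica number, hierarchy, size, timestamp, status, name); the function name `stale_replicas` is made up for illustration, and data object names containing unusual whitespace are not handled.

```python
from typing import List, Tuple

# Sample taken from the `ils -l` output posted earlier in this thread.
SAMPLE = """\
  <username>               0 uniReplResc;uniResc            6 2022-02-22.22:50 & testing9.txt
  <username>               1 uniReplResc;uniCompResc;cacheUniResc            6 2022-02-22.22:50 & testing9.txt
  <username>               2 uniReplResc;uniCompResc;archiveTsmResc            6 2022-02-22.22:50 X testing9.txt
"""


def stale_replicas(ils_output: str) -> List[Tuple[int, str]]:
    """Return (replica number, hierarchy) for every replica marked 'X'."""
    stale = []
    for line in ils_output.splitlines():
        fields = line.split(None, 6)
        # Skip collection headers and anything that is not a replica row.
        if len(fields) != 7 or not fields[1].isdigit():
            continue
        _owner, repl_num, hierarchy, _size, _mtime, status, _name = fields
        if status == "X":
            stale.append((int(repl_num), hierarchy))
    return stale


print(stale_replicas(SAMPLE))
```

For the sample above, only replica 2 on `archiveTsmResc` is reported.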

Terrell








Mary

Feb 23, 2022, 8:50:26 AM
to iRODS-Chat
Nice to hear that you understand the problem. Thank you.

Many thanks for the explanation. However, I'm not sure I understand what I need to do to fix the error. I'm sorry about that. Do you mean that I have to use '<resource_server_FQDN>' instead of '<TSM_Server_FQDN>'? That is, must the FQDN for the cache be the same as the FQDN for the archive, as in the archiveTsmResc line below? Am I right? If so, is there anything else that I need to do to fix the error?
$ iadmin mkresc uniResc unixfilesystem '<irods_server_FQDN>':/var/lib/irods/uniVault
$ iadmin mkresc cacheUniResc unixfilesystem '<resource_server_FQDN>':/var/lib/irods/cacheUniVault
$ iadmin mkresc archiveTsmResc univmss '<resource_server_FQDN>':/irods/archiveTsmVault dsmarc
$ iadmin mkresc uniCompResc compound
$ iadmin addchildtoresc uniCompResc cacheUniResc cache
$ iadmin addchildtoresc uniCompResc archiveTsmResc archive
$ iadmin mkresc uniReplResc replication
$ iadmin addchildtoresc uniReplResc uniResc
$ iadmin addchildtoresc uniReplResc uniCompResc


Regards
Mary

Terrell Russell

Feb 23, 2022, 9:21:02 AM
to irod...@googlegroups.com
Yes, that is part of it.

The other part is that /var/lib/irods/msiExecCmd_bin/dsmarc needs to be available (and executable) on the <resource_server_FQDN> machine.

If that is not possible for some reason, perhaps we can look into configuring univMSSInterface.sh to do what is necessary.
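
The "available (and executable)" condition can be checked with a few lines of Python. This is only a sketch: the helper name `is_usable_script` is hypothetical, and the result reflects the permissions of whichever account runs it, so it is most meaningful when run as the account that runs the iRODS server.

```python
import os


def is_usable_script(path: str) -> bool:
    """True if path is a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


print(is_usable_script("/var/lib/irods/msiExecCmd_bin/dsmarc"))
```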

Terrell

 

Mary

Feb 23, 2022, 9:52:19 AM
to iRODS-Chat
On the resource server machine, if I run $ ls /var/lib/irods/msiExecCmd_bin/, the output is:
hello  irodsServerMonPerf  test_execstream.py  univMSSInterface.sh.template

Could you please tell me how to make /var/lib/irods/msiExecCmd_bin/dsmarc available (and executable) on this machine, as you said?

Thanks.


Regards,
Mary

Terrell Russell

Feb 23, 2022, 12:38:02 PM
to irod...@googlegroups.com
I haven't done this myself (don't have / haven't used TSM) - but it seems dsmarc needs to be compiled and placed in that directory.


Then, assuming things are configured correctly, it should behave as you expect.

Terrell

 

Mary

Feb 28, 2022, 6:45:50 AM
to iRODS-Chat
Hi,

Thanks.

I have tried this. Unfortunately, I'm still seeing errors. Here is what I have and what I tried:

1. "dsmarc" is now in the "msiExecCmd_bin" directory:
$ ls /var/lib/irods/msiExecCmd_bin/
dsmarc  dsmarc.c  hello  include  irodsServerMonPerf  Makefile  test_execstream.py  univMSSInterface.sh.template

2. I created a compound resource:
$ iadmin mkresc cacheResc13 unixfilesystem 'irods_server_FQDN':/var/lib/irods/Vault13
$ iadmin mkresc archiveTsmResc13 univmss 'irods_server_FQDN':/var/lib/irods/Vault13 dsmarc
$ iadmin mkresc compResc13 compound
$ iadmin addchildtoresc compResc13 cacheResc13 cache
$ iadmin addchildtoresc compResc13 archiveTsmResc13 archive
$ ilsresc -l compResc13
resource name: compResc13
id: 10632
zone: uniTestZone
type: compound
location: EMPTY_RESC_HOST
vault: EMPTY_RESC_PATH
free space:
free space time: : Never
status:
info:
comment:
create time: 01646038535: 2022-02-28.09:55:35
modify time: 01646038535: 2022-02-28.09:55:35
context:
parent:
parent context:

3. I uploaded a file to iRODS:
$ nano testi.txt
$ iput -R compResc13 testi.txt

After this iput command, I checked the "/var/lib/irods/log" directory and saw the errors below:
$ nano /var/lib/irods/log/rodsLog.2022.02.26
...
Feb 28 10:09:13 pid:29533 NOTICE: execCmd:../../var/lib/irods/msiExecCmd_bin/dsmarc argv:mkdir '/var/lib/irods/Vault13/home/alice'
Feb 28 10:09:13 pid:29530 remote addresses: <irods_server_ip_address> ERROR: _rsExecCmd: waitpid status = 29533, myExecCmdOut->status = 0, childStatus = 256
Feb 28 10:09:13 pid:29553 NOTICE: execCmd:../../var/lib/irods/msiExecCmd_bin/dsmarc argv:syncToArch '/var/lib/irods/Vault13/home/alice/testi.txt' '/var/lib/irods/Vault13/home/alice/testi.txt'
Feb 28 10:09:13 pid:29530 remote addresses: <irods_server_ip_address> ERROR: _rsExecCmd: waitpid status = 29553, myExecCmdOut->status = 0, childStatus = 256
Feb 28 10:09:13 pid:29530 remote addresses: <irods_server_ip_address> ERROR: [-] /repos/irods/server/api/src/rsFileSyncToArch.cpp:182:int _rsFileSyncToArch(rsComm_t *, fileStageSyncInp_t *, fileSyncOut_t **) :  status [UNIV_MSS_SYNCTOARCH_ERR]  errno [Transport endpoint is not connected] -- message [fileSyncToArch failed for [/var/lib/irods/Vault13/home/alice/testi.txt]]
[-] /repos/irods/server/drivers/src/fileDriver.cpp:612:irods::error fileSyncToArch(rsComm_t *, irods::first_class_object_ptr, const std::string &) :  status [UNIV_MSS_SYNCTOARCH_ERR]  errno [Transport endpoint is not connected] -- message [failed to call 'synctoarch']
[-] /repos/irods/plugins/resources/univmss/libunivmss.cpp:648:irods::error univ_mss_file_sync_to_arch(irods::plugin_context &, const char *) :  status [UNIV_MSS_SYNCTOARCH_ERR]  errno [Transport endpoint is not connected] -- message [univ_mss_file_sync_to_arch: copy of [/var/lib/irods/Vault13/home/alice/testi.txt] to [/var/lib/irods/Vault13/home/alice/testi.txt] failed.   stdout buff [(nil)]   stderr buff [0x18bc970]  status [-344000]]

Feb 28 10:09:13 pid:29530 WARNING: No replica access token in L1 descriptor. Ignoring replica access table. [path=/uniTestZone/home/alice/testi.txt, resource_hierarchy=compResc13;archiveTsmResc13]
Feb 28 10:09:13 pid:29573 NOTICE: execCmd:../../var/lib/irods/msiExecCmd_bin/dsmarc argv:stat '/var/lib/irods/Vault13/home/alice/testi.txt'
Feb 28 10:09:13 pid:29530 remote addresses: <irods_server_ip_address> ERROR: _rsExecCmd: waitpid status = 29573, myExecCmdOut->status = 0, childStatus = 256
Feb 28 10:09:13 pid:29530 remote addresses:<irods_server_ip_address> ERROR: [get_size_in_vault:159] - getSizeInVault error [error_code=[-555107], path=[/uniTestZone/home/alice/testi.txt], hierarchy=[compResc13;archiveTsmResc13]]
Feb 28 10:09:13 pid:29530 remote addresses: <irods_server_ip_address> ERROR: [rsDataObjClose:794] - [UNIV_MSS_STAT_ERR: [update_replica_size_and_throw_on_failure:430] - failed to get size in vault [error_code=[-555107], path=[/uniTestZone/home/alice/testi.txt], hierarchy=[compResc13;archiveTsmResc13]]

]
Feb 28 10:09:13 pid:29530 remote addresses: <irods_server_ip_address> ERROR: [close_replica] - rsDataObjClose failed with [-555107]

4. Moreover, I'm seeing the following error repeatedly, even when I do nothing with iRODS:
$ nano /var/lib/irods/log/rodsLog.2022.02.26
...
Feb 28 12:11:25 pid:6368 remote addresses: 131.246.121.101, 131.246.121.66 ERROR: [-]        /repos/irods/plugins/resources/compound/libcompound.cpp:365:irods::error compound_start_operation(irods::plugin_property_map &) :  status [SYS_INVALID_INPUT_PARAM]  errno [] -- message [compound resource: invalid number of children [0]]

Feb 28 12:11:55 pid:6403 remote addresses: 131.246.121.101, 131.246.121.66 ERROR: [-]        /repos/irods/plugins/resources/compound/libcompound.cpp:365:irods::error compound_start_operation(irods::plugin_property_map &) :  status [SYS_INVALID_INPUT_PARAM]  errno [] -- message [compound resource: invalid number of children [0]]

Feb 28 12:12:25 pid:6428 remote addresses: 131.246.121.101, 131.246.121.66 ERROR: [-]        /repos/irods/plugins/resources/compound/libcompound.cpp:365:irods::error compound_start_operation(irods::plugin_property_map &) :  status [SYS_INVALID_INPUT_PARAM]  errno [] -- message [compound resource: invalid number of children [0]]

Feb 28 12:12:55 pid:6462 remote addresses: 131.246.121.101, 131.246.121.66 ERROR: [-]        /repos/irods/plugins/resources/compound/libcompound.cpp:365:irods::error compound_start_operation(irods::plugin_property_map &) :  status [SYS_INVALID_INPUT_PARAM]  errno [] -- message [compound resource: invalid number of children [0]]
...



I really don't know how to fix these errors. Am I doing something wrong somewhere? Has anyone succeeded in connecting iRODS and TSM, for example by creating a compound resource (cache and archive), so that the TSM archive resource can be used without errors? If so, could you please share how you achieved this? Thanks in advance.


Kind Regards,
Mary




Terrell Russell

Feb 28, 2022, 7:31:42 AM
to irod...@googlegroups.com
Hi Mary,

Can you use the dsmarc binary directly to put/get files into your TSM manually? Remove iRODS from the experiment...
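
A small wrapper makes such a direct test repeatable and shows the exit code and any error text. This is a sketch only: `run_interface` is a hypothetical helper, and the sub-commands it is meant to be used with (`mkdir`, `syncToArch`, `stat`) are simply the argv values that appear in the rodsLog excerpt above, not a documented dsmarc interface.

```python
import subprocess
from typing import Tuple


def run_interface(script: str, *args: str, timeout: float = 60.0) -> Tuple[int, str, str]:
    """Run the interface script directly and return (exit_code, stdout, stderr)."""
    result = subprocess.run(
        [script, *args],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr
```

For example, `run_interface("/var/lib/irods/msiExecCmd_bin/dsmarc", "stat", "/var/lib/irods/Vault13/home/alice/testi.txt")` repeats the `stat` call from the log without going through iRODS; a non-zero exit code or a message on stderr points at the script or its TSM configuration.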

I think this is a dsmarc configuration issue.
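
The `childStatus = 256` entries in the log fit this reading. Assuming `childStatus` is the raw status value returned by `waitpid()` (an assumption about what the log prints) and the conventional Linux encoding, the sketch below decodes it; `decode_wait_status` is an illustrative name.

```python
def decode_wait_status(status: int) -> str:
    """Decode a raw waitpid() status using the conventional Linux bit layout."""
    signal_number = status & 0x7F
    if signal_number == 0:
        # Normal termination: the exit code is stored in the next byte.
        return f"exited with code {(status >> 8) & 0xFF}"
    if signal_number == 0x7F:
        return "stopped or continued"
    return f"killed by signal {signal_number}"


print(decode_wait_status(256))  # exited with code 1
```

Under that assumption, each dsmarc call (`mkdir`, `syncToArch`, `stat`) ran and returned exit code 1, so the script was found and executed but reported a failure itself.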

Terrell



Mary

Feb 28, 2022, 2:08:30 PM
to iRODS-Chat
Hi Terrell,

Thank you for this message.

I agree with you. The main issue is the configuration of dsmarc. I can confirm this because I have installed and configured the TSM client without any problem, and I am able to back up and restore files from the TSM server using dsmc. Because I'm new to TSM, I followed documentation from several online sources to achieve this.

Now the problem is configuring dsmarc. Are there any links or instructions that could help me? I have searched a lot, unfortunately without success. Honestly, I'm so new to this that even a simple task (for example, putting files into and getting files from my TSM) using dsmarc is difficult without guidance. I don't know what to do or where to start, even though I have seen this link: https://github.com/KTH-PDC/irods-dsmarc/blob/master/README .
Any help regarding dsmarc would be appreciated. Thanks.

Kind Regards,
Mary

Terrell Russell

Feb 28, 2022, 2:26:45 PM
to irod...@googlegroups.com
Hi Mary,

I think your best bet is to create an issue in that repository.

How did you find that code?  Why did you decide to use it?

Since it seems that code is a drop-in replacement for univMSSInterface.sh, maybe you can try to configure univMSSInterface.sh itself as an alternative?

Terrell




