compound resc, stale copy, iunreg on v4.2.10


Jean-Yves Nief

Oct 6, 2021, 11:25:20 AM
to irod...@googlegroups.com
hello,

            I have a compound resource (let's call it "Cmp") on 4.2.10
containing a unixfilesystem cache resource (let's call it "disk") and a
univmss archive resource ("tape"). auto_repl is "on" (that's a detail;
it does not change what I describe below).
Due to a configuration bug, the replication to the "tape" resource failed.
In such cases, the usual behavior I used to notice was that no replica
on "tape" was registered in iRODS.
Here is what I see:
$ ils -L foo
  foouser           0 Cmp;disk        12690 2021-10-06.12:04 & foo
        generic    /irods/.../foo
  foouser           1 Cmp;tape     12690 2021-10-06.12:04 X foo
        generic    /archive/.../foo
The copy on "tape" is marked as stale. At least that is safe: the data on
"disk" cannot be trimmed, which would be a disaster because the copy on
"tape" does not physically exist.
But it causes another problem. Even if I do an "iput -f foo", which is a
crude way of trying to recover the situation, the stale copy on
"archive" remains. Hence, cleanup with itrim or iunreg is needed.
It would be better in such a situation (the server acknowledged that the
replication did not work properly, as can be seen in the logs) that no
entry appears in the iRODS catalog.
Now, in order to clean this up, I use the new iunreg command.
It looks like this:
> iunreg -S "Cmp;tape" foo
remote addresses: xxx ERROR: trimUtil: trim error for /.../foo. status =
-78000 SYS_RESC_DOES_NOT_EXIST
This does not work. Is my syntax wrong? Or is it not possible to unregister
files within a child resource?
Also:
> iunreg -N1 foo
remote addresses: xxx ERROR: trimUtil: trim error for /.../foo. status =
-355000 CANT_UNREG_IN_VAULT_FILE
Total size trimmed = 0.000 MB. Number of files trimmed = 0.
Level 0: Specifying a minimum number of replicas to keep is deprecated.
The '-N' option is deprecated, as it is for itrim (I guess iunreg picks up
part of the same code).
An "iunreg -n1 foo" works, even though it would be more convenient to work
with the "-S" option.
I have also noticed that "iunreg" does not seem to work for "rodsuser",
only for "rodsadmin". If that is really the case, I am fine with that,
but it should be mentioned in the help text.
thanks!
JY
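
As an illustration of spotting such replicas, here is a minimal sketch that filters an "ils -L" listing for stale entries. It assumes the 4.2.10 column layout shown in the listing above (owner, replica number, resource hierarchy, size, timestamp, status, name) with "X" marking a stale replica; stale_replicas is a hypothetical helper name, not an iRODS command.

```shell
#!/bin/sh
# stale_replicas: hypothetical helper. Reads `ils -L` output on stdin and
# prints "<replica number> <resource hierarchy>" for each stale replica.
# Assumption: the status flag is the 6th whitespace-separated field and "X"
# means stale, as in the 4.2.10 listing above. Physical-path lines have
# fewer fields and are skipped by the NF test.
stale_replicas() {
    awk 'NF >= 7 && $6 == "X" { print $2, $3 }'
}

# Usage (requires a working iRODS client session):
#   ils -L foo | stale_replicas
```

For the listing above, this would print "1 Cmp;tape". The filter relies on the status column always being present, so it should be treated as a starting point rather than a robust parser.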

Alan King

Oct 6, 2021, 5:19:34 PM
to irod...@googlegroups.com
Hi,

Let me see if I can address everything satisfactorily through the different points...

Due to a configuration bug, the replication to the "tape" resource failed.
In such cases, the usual behavior I used to notice was that no replica
on "tape" was registered in iRODS.

I think that iRODS leaving a stale replica in the catalog in this case is a reasonable expectation depending on the nature of the failure. If something went wrong while creating the catalog entry or opening the physical file on disk (if applicable), the replica will be unlinked and unregistered from the catalog (again, if applicable). If the creation of the replica in the catalog is successful but there is a failure in the data movement or finalizing the replica in the catalog, I would expect it to remain in the catalog with a stale state (as you observed). Plus, removing the entry from the catalog on failure can be configured as policy for this resource if that is the desired behavior. This is the justification for this change, but we are willing to hear use cases and reconsider as appropriate!

Even if I do a "iput -f foo", which is a
crude way of trying to recover the situation, the stale copy on
"archive" remains.

It is surprising to me that the replica in tape is not updating. Is the sync to archive operation failing because of the configuration error you mentioned, or is the replication supposed to be working?

 > iunreg -S "Cmp;tape" foo
remote addresses: xxx ERROR: trimUtil: trim error for /.../foo. status =
-78000 SYS_RESC_DOES_NOT_EXIST
does not work. Is my syntax wrong ? Or is it not possible to unregister
files within a child resource ?
...
A "iunreg -n1 foo" works, even though it is more convenient to work with
the "-S" option. 

When -S or -n is used with iunreg, it goes into an itrim-like mode. In fact, as you have pointed out, itrim and iunreg are using the same API endpoint to accomplish their goals with similar options. At this point in time, -S is only allowed to point to a root resource. We have had discussions about itrim (and by extension iunreg) being more surgical. We would like to make itrim/iunreg -S require a leaf resource or a full hierarchy (as you tried to do). Unfortunately, for now, it remains a root (or standalone) resource and the replica number is the only way to target a specific replica. Your use case is another argument for the more surgical approach in the future, unless you object :)

I have noticed also that "iunreg" does not seem to work for "rodsuser",
only for "rodsadmin". If that is really the case, I am fine with that
but it should be mentioned in the help text.

This is the trickiest point. iunreg is meant to act as a counterpart to ireg. It therefore has the same policy configuration as ireg as far as modifying data in vaults: acNoChkFilePathPerm. Further, a normal user can use iunreg to unregister data which does not reside in a Vault. Here is the old help text from irm -U (which is now deprecated in favor of iunreg):

The -U option allows the unregistering of the data object or collection
without deleting the physical file. Normally, a normal user cannot
unregister a data object if the physical file is located in a resource
vault. The acNoChkFilePathPerm rule allows this check to be bypassed.

Unwritten here is that a normal user can unregister data objects whose replicas do not reside in a resource Vault. Here is the simplest example I could muster using 4.2.10, where mob is a rodsuser and an administrator has registered the data object and given them ownership of it:

$ ils -AL ../public/goo
  rods              0 demoResc           15 2021-10-06.20:54 & goo
        generic    /lol/wot
        ACL - mob#otherZone:own
$ iunreg ../public/goo # as rodsuser mob
$ ils -AL ../public/goo
remote addresses: 127.0.0.1 ERROR: lsUtil: srcPath /otherZone/home/public/goo does not exist or user lacks access permission
$ ls -l /lol/wot
-rw-r--r-- 1 root root 15 Oct  6 20:52 /lol/wot

So, to your point, we should write some more words in the documentation and help text that explain when and how to use iunreg appropriately. For some recent history about ireg, iunreg, and itrim, see these issues:

https://github.com/irods/irods/issues/4506 (this demonstrates the need for iunreg as well as the justification for rodsusers not being able to unregister data in a Vault)
https://github.com/irods/irods/issues/4510 (this demonstrates how the registration permissions for normal users have been wrong for a very long time if the documentation represents how things ought to be)

Hope that helps!

--
--
The Integrated Rule-Oriented Data System (iRODS) - https://irods.org

 iROD-Chat:  http://groups.google.com/group/iROD-Chat
---
You received this message because you are subscribed to the Google Groups "iRODS-Chat" group.
To unsubscribe from this group and stop receiving emails from it, send an email to irod-chat+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/irod-chat/4c845874-5870-e545-88fa-52f6b6038382%40cc.in2p3.fr.


--
Alan King
Software Developer | iRODS Consortium

Jean-Yves Nief

Oct 7, 2021, 10:54:35 AM
to irod...@googlegroups.com, Alan King
hello Alan,

            thanks for your detailed answers, please find my answers below:

Alan King wrote:
> Hi,
>
> Let me see if I can address everything satisfactorily through the
> different points...
>
> Due to a configuration bug, the replication to the "tape" resource
> failed.
> In such cases, the usual behavior I used to notice was that no
> replica
> on "tape" was registered in iRODS.
>
>
> I think that iRODS leaving a stale replica in the catalog in this case
> is a reasonable expectation depending on the nature of the failure. If
> something went wrong while creating the catalog entry or opening the
> physical file on disk (if applicable), the replica will be unlinked
> and unregistered from the catalog (again, if applicable). If the
> creation of the replica in the catalog is successful but there is a
> failure in the data movement or finalizing the replica in the catalog,
> I would expect it to remain in the catalog with a stale state (as you
> observed). Plus, removing the entry from the catalog on failure can be
> configured as policy for this resource if that is the desired
> behavior. This is the justification for this change, but we are
> willing to hear use cases and reconsider as appropriate!
In that case, the physical creation of the replica was not possible:
the "syncToArch" function of the univmss resource returned an
error, and so did the subsequent call to the "stat" function.
This fits the first case you are
describing, hence the replica should be unlinked and unregistered from
the catalog.
It can be reproduced easily with a cache and an archive resource of type
"unixfilesystem". In the syncToArch function of the
"univMSSInterface.sh" script (or whatever you choose), you can use a
fake "cp" command which will fail.
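
A minimal sketch of such a deliberately failing function follows. It assumes that syncToArch receives the cache path as its first argument and the archive path as its second; fake_cp is a hypothetical stand-in for the real copy command, and the rest of the interface script is omitted.

```shell
#!/bin/sh
# Test-only fragment: makes every sync from cache to archive fail.
# Assumption: syncToArch is called with the cache path as $1 and the
# archive path as $2; fake_cp is a hypothetical replacement for cp.
fake_cp() {
    echo "fake_cp: simulated failure copying $1 to $2" >&2
    return 1
}

syncToArch() {
    fake_cp "$1" "$2"
    return $?   # propagate the non-zero status to the caller
}
```

With this in place, any replication to the archive resource should fail at the data-movement step, which is the situation described above.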
>
> Even if I do a "iput -f foo", which is a
> crude way of trying to recover the situation, the stale copy on
> "archive" remains.
>
>
> It is surprising to me that the replica in tape is not updating. Is
> the sync to archive operation failing because of the configuration
> error you mentioned, or is the replication supposed to be working?
My bad! When I was reproducing the error and testing that, I did not fix
the source of the sync problem before doing an "iput -f" to clear the
situation, hence the behavior I observed above.
Now it works as expected, clearing the situation smoothly.
>
>  > iunreg -S "Cmp;tape" foo
> remote addresses: xxx ERROR: trimUtil: trim error for /.../foo.
> status =
> -78000 SYS_RESC_DOES_NOT_EXIST
> does not work. Is my syntax wrong ? Or is it not possible to
> unregister
> files within a child resource ?
>
> ...
>
> A "iunreg -n1 foo" works, even though it is more convenient to work
> with
> the "-S" option.
>
>
> When -S or -n is used with iunreg, it goes into an itrim-like mode. In
> fact, as you have pointed out, itrim and iunreg are using the API
> endpoint to accomplish their goals with similar options. At this point
> in time, -S is only allowed to point to a root resource. We have had
> discussions about itrim (and by extension iunreg) being more surgical.
> We would like to make itrim/iunreg -S /require/ a leaf resource or a
> full hierarchy (as you tried to do). Unfortunately, for now, it
> remains a root (or standalone) resource and the replica number is the
> only way to target a specific replica. Your use case is another
> argument for the more surgical approach in the future, unless you
> object :)
You bet! It is more straightforward to use "-S" with a leaf resource or a
full hierarchy. For a compound resource, it is almost as straightforward to
use the "-n" option, as you almost always know the replica number you
want to clean. But otherwise, one has to do an iquest first to pick up
the replica number corresponding to the resource you want to
trim for a given file. It is a two-step operation. A surgical "-S" would
make life much easier when trimming an entire tree recursively.
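
The two-step operation described above could be sketched as follows. This is only an illustration: replica_query and unreg_replica_on are hypothetical helper names, the GenQuery columns used (DATA_REPL_NUM, COLL_NAME, DATA_NAME, DATA_RESC_HIER) are assumed to be available on the server, and the sketch assumes exactly one replica lives in the given hierarchy.

```shell
#!/bin/sh
# Hypothetical helpers: look up a replica number by resource hierarchy,
# then unregister that replica by number.

# Build the query string. $1 = collection, $2 = data object name,
# $3 = resource hierarchy (for example "Cmp;tape").
# Note: names containing single quotes are not handled by this sketch.
replica_query() {
    printf "select DATA_REPL_NUM where COLL_NAME = '%s' and DATA_NAME = '%s' and DATA_RESC_HIER = '%s'" "$1" "$2" "$3"
}

unreg_replica_on() {
    coll=$1
    name=$2
    hier=$3
    # Step 1: ask the catalog for the replica number in that hierarchy.
    num=$(iquest "%s" "$(replica_query "$coll" "$name" "$hier")") || return 1
    # Anything other than a single non-negative integer is treated as
    # "no unique match" (no rows, several rows, or an error message).
    case $num in
        ''|*[!0-9]*)
            echo "no unique replica of $coll/$name in $hier" >&2
            return 1
            ;;
    esac
    # Step 2: unregister exactly that replica.
    iunreg -n "$num" "$coll/$name"
}

# Usage (hypothetical logical path):
#   unreg_replica_on /zone/home/foouser foo 'Cmp;tape'
```

Refusing to act unless the lookup yields a single number is a deliberate choice here, since unregistering the wrong replica is harder to undo than a failed cleanup.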
>
> I have noticed also that "iunreg" does not seem to work for
> "rodsuser",
> only for "rodsadmin". If that is really the case, I am fine with that
> but it should be mentioned in the help text.
>
>
> This is the trickiest point. iunreg is meant to act as a counterpart
> to ireg. It therefore has the same policy configuration as ireg as far
> as modifying data in vaults - acNoChkFilePathPerm. Further, a normal
> user /can /use iunreg to unregister data which does not reside in a
> Vault. Here is the old help text from irm -U (which is now deprecated
> in favor of iunreg):
>
> The -U option allows the unregistering of the data object or
> collection
> without deleting the physical file. Normally, a normal user cannot
> unregister a data object if the physical file is located in a resource
> vault. The acNoChkFilePathPerm rule allows this check to be bypassed.
>
>
> Unwritten here is that a normal user /can /unregister data objects
> whose replicas do not reside in a resource Vault. Here is the simplest
> example I could muster using 4.2.10 where mob is a rodsuser and an
> administrator has registered and given ownership to them:
>
> $ ils -AL ../public/goo
>   rods              0 demoResc           15 2021-10-06.20:54 & goo
>         generic    /lol/wot
>         ACL - mob#otherZone:own
> $ iunreg ../public/goo # as rodsuser mob
> $ ils -AL ../public/goo
> remote addresses: 127.0.0.1 ERROR: lsUtil: srcPath
> /otherZone/home/public/goo does not exist or user lacks access permission
> $ ls -l /lol/wot
> -rw-r--r-- 1 root root 15 Oct  6 20:52 /lol/wot
Good, so it works like the old 'irm -U', and it makes sense that
non-admin users can't unregister files in a vault.
Thanks,
JY
