iphymv from complex hierarchy gives SYS_LIBRARY_ERROR

J.P. Mc Farland

Apr 27, 2024, 7:47:47 AM
to iRODS-Chat
Hi,

This is a fun one!

While migrating data with version 4.2.11 from a complex to a simple resource hierarchy, iphymv fails with a SYS_LIBRARY_ERROR for some data objects. The rodsLog on the server shows a bit more detail:

ERROR: duplicate key value violates unique constraint "idx_data_main2" Key (coll_id, data_name, data_repl_num, data_version)=(...) already exists.;

Looking at the "ils -L" output for these, all of the affected data objects have two replicas in the same resource hierarchy. The data_paths of these replica pairs fall into three categories: they both have the exact same data_path, they have the same data_path except for an added numeric extension, or they have the same data_path with an added "replica" component before the "home" component. It looks like some combination of these:

/path/to/unixfilesystem/home/collection/file.ext
/path/to/unixfilesystem/home/collection/file.ext.0123456789
/path/to/unixfilesystem/replica/home/collection/file.ext
/path/to/unixfilesystem/replica/home/collection/file.ext.0123456789
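
To sort a large number of replica pairs into the categories above, something like the following might help. It is only a heuristic sketch: the function names are made up for illustration, the patterns are taken from the example paths above, and a file whose real name ends in a numeric extension would be misclassified.

```python
import re

# Trailing all-digit extension, e.g. ".0123456789" (pattern assumed from the examples).
_NUMERIC_EXT = re.compile(r"\.\d+$")


def _split(path):
    """Return (base path, numeric extension or "", has "replica" before "home")."""
    match = _NUMERIC_EXT.search(path)
    ext = match.group(0) if match else ""
    if match:
        path = path[:match.start()]
    has_replica = "/replica/home/" in path
    if has_replica:
        path = path.replace("/replica/home/", "/home/", 1)
    return path, ext, has_replica


def relate(path_a, path_b):
    """Describe how two physical paths of the same data object differ."""
    if path_a == path_b:
        return "identical"
    base_a, ext_a, rep_a = _split(path_a)
    base_b, ext_b, rep_b = _split(path_b)
    if base_a != base_b:
        return "unrelated"
    diffs = []
    if ext_a != ext_b:
        diffs.append("numeric-extension")
    if rep_a != rep_b:
        diffs.append("replica-component")
    # No recognised difference means the pair does not fit the three categories.
    return "+".join(diffs) if diffs else "unrelated"
```

Feeding it the two data_path values of each affected object gives a rough count per category before deciding how to clean up.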

The "fix" for this seems to be to "iunreg" one of the two replicas if their data_paths are identical or, if they are not, to "itrim" one of them once the two files are confirmed to be the same on disk. I have not tried either of these approaches yet, but I suspect they might also give trouble. If that is the case, I can always manually delete a row from the ICAT and delete the corresponding file on disk as a last resort.
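
A rough sketch of that decision for one pair of replicas, assuming it runs on the storage server where both physical paths are readable. The function names and the "review" outcome are hypothetical, and it only names the icommand to reach for rather than building a command line.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data objects do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def choose_cleanup(path_a, path_b):
    """Pick a cleanup action for two replicas in the same resource hierarchy."""
    if path_a == path_b:
        # Both catalog entries point at one file, so only the entry should go.
        return "iunreg"
    if sha256_of(path_a) == sha256_of(path_b):
        # Two distinct files with identical content: one replica is redundant.
        return "itrim"
    # Content differs, so a person needs to decide which version to keep.
    return "review"
```

Anything that comes back as "review" is a pair whose content really differs and should not be trimmed blindly.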

For a few of them, this approach is feasible. As the number of these instances increases, however, there comes a point where it is infeasible to do it manually. I am far, far beyond that point, unfortunately.

Might there be some "magic bullet" for these, or is my only option to brute force it?

Cheers,

--John

Alan King

Apr 29, 2024, 9:57:33 AM
to irod...@googlegroups.com
Hi John,

You have a strange sense of fun ;)

I consider it a bug that iphymv or any other operation can create more than one replica on a given storage resource, so we should file an issue for that. That said, there is no built-in tool for dealing with this because it's not supposed to happen.

Are the replicas differentiable by replica number? You should be able to, like you said, use itrim or iunreg if you can target the replicas in question by a unique replica number. Otherwise, I think your last resort approach may be the only way to deal with it.

If you are able to, can you share more details about what you did to create the situation and whether it is reliably reproducible? If so, then we can look to get a fix in a future release and prevent this from happening.

Thanks!

Alan

--
Alan King
Senior Software Developer | iRODS Consortium

Terrell Russell

Apr 29, 2024, 10:05:58 AM
to irod...@googlegroups.com
If you can fix it manually (with icommands) without touching the database, you should be able to query for the violating replicas, and push them into trim... handsfree...

Consider building a list of itrim commands with the results of an iquest...

iquest "itrim -n%s -N1 '%s/%s'" "select DATA_REPL_NUM, COLL_NAME, DATA_NAME where ..."

And then pushing the resulting shell commands into gnu parallel for speed...
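
As a sketch of how that list could be built in a script instead of directly in the iquest format string: the function below assumes the rows have already been fetched as (replica number, collection, data name, resource hierarchy) tuples, which is a hypothetical layout since the query conditions above are left open. It also assumes the lowest replica number in each hierarchy is the one to keep, which may well be wrong for a given object, so treat the output as candidates to review.

```python
import shlex
from collections import defaultdict


def itrim_commands(rows):
    """Build itrim commands for extra replicas that share a resource hierarchy.

    rows: iterable of (repl_num, coll_name, data_name, resc_hier) tuples.
    """
    groups = defaultdict(list)
    for repl_num, coll_name, data_name, resc_hier in rows:
        groups[(coll_name, data_name, resc_hier)].append(int(repl_num))

    commands = []
    for (coll_name, data_name, _hier), repl_nums in sorted(groups.items()):
        # ASSUMPTION: keep the lowest replica number and trim the others.
        for repl_num in sorted(repl_nums)[1:]:
            logical_path = f"{coll_name}/{data_name}"
            # shlex.quote protects against spaces and quotes in object names.
            commands.append(f"itrim -n{repl_num} -N1 {shlex.quote(logical_path)}")
    return commands
```

Writing the returned lines to a file gives the same kind of command list, one per line, ready to be run in parallel.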

Terrell



J.P. Mc Farland

Apr 30, 2024, 7:05:21 AM
to iRODS-Chat
Hi Alan,

Indeed, I run into some of the most unusual happenings you can see in iRODS. I've seen weird embedded white-space (ever "see" a CTRL-C in a filename?) and other troublesome character combinations, all likely due to our users making great use of WebDAV connections. This current problem may stem from that, but there is no real way to tell. It could also be due to "iput -f". These double replicas on a leaf resource each have a unique replica number, but do not always have unique paths. I believe the source of this is partially failed updates, whether via WebDAV or iput. When two text replicas differ in size or modify_ts, I can diff them and see clear version differences. If I find a trend or a highly affected user, I will be sure to investigate to hopefully get some insight into the cause. But that is low priority, as I need to get this data migrated with the least delay. Also, migrated data will no longer be affected, as it will not be replicated.

Hi Terrell,

Thanks!  I have sometimes had iCommands error or do the wrong thing in arguably unusual circumstances, but I am using a similar approach to manually solve the apostrophe-and issue I posted about earlier.  Fortunately there are only a few thousand of those.  Unfortunately, this problem affects orders of magnitude more data objects, so I need to keep manual intervention to a minimum, particularly for my sanity.

One very unusual circumstance that I am still trying to replicate is itrim trimming the wrong replica. I have only seen this on production data so far and it might even be related to this issue. For instance, while trying to trim replica 2 of replica numbers 0, 1, 2, replica 0 would be trimmed instead unless I first modified DATA_REPL_STATUS of replica 0 to something other than 1 to "lock" it, trimmed replica 2, and then set replica 0 back. This is obviously incorrect behavior, but I have not been able to reproduce it yet. If/when I do, I will surely post it here for the connoisseurs.

Cheers,


 --John

Alan King

Jun 10, 2024, 11:52:29 AM
to irod...@googlegroups.com
Thanks for the details. I've created an issue here to investigate the replicas with the same physical path: https://github.com/irods/irods/issues/7775

We've received this report from other users as well. Based on the reports, this may be related to WebDAV (Davrods?), but the server should handle this more gracefully.

J.P. Mc Farland

Jun 10, 2024, 1:43:44 PM
to iRODS-Chat
Hi Alan,

Indeed.  I believe Davrods does have something to do with it.  The occurrences are restricted to a specific window of time and may have to do with updating existing data objects from what I can tell.  At times, Davrods caused other issues such as DATA_SIZE == 0 and too few replicas when using a replication resource, both quite unusual.

I will have a look at the issue and see what I can add.

Cheers,


--John