File name collision

Bruno Santos

Apr 1, 2026, 11:14:30 AM
to iRODS-Chat
Hi,

We found a strange issue that we can only describe as a file collision.

Different data objects (with the same name but in different collections) end up with the same physical file path in iRODS when they are replicated to an S3 resource.

As background, our iRODS server runs version 4.3.4 and has the random scheme (random vault path scheme) enabled.
Our replication process is custom-made (due to issues with the replicate resource type).

A short summary of our process for moving files to tape (the tape resources are named tape_1 and tape_2):
1. The file is placed in local storage (resource hot_1) and has a checksum.
Replicate to tapes:
2. Replication to tape_1 without checksum calculation
3. Replication to tape_2 without checksum calculation
Integrity check:
4. Calculate checksum on tape_1.
5. Compare the checksum with the value from hot_1.
6. Calculate checksum on tape_2.
7. Compare the checksum with the value from hot_1.
Trim:
8. If all checksums match, the hot_1 replica is trimmed.
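The verify-then-trim logic of steps 4 to 8 can be sketched in Python as below. This is a simplified model, not the actual rule: the function names are hypothetical and tape reads are simulated with in-memory bytes. It assumes the iRODS SHA-256 checksum format seen in the ils output (sha2: followed by the base64-encoded digest).

```python
import base64
import hashlib


def irods_sha2(data: bytes) -> str:
    """Checksum in the iRODS SHA-256 format: 'sha2:' + base64(SHA-256 digest)."""
    return "sha2:" + base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def can_trim_hot_replica(hot_checksum: str, tape_contents: dict[str, bytes]) -> bool:
    """Steps 4-8: checksum every tape replica and compare with the hot_1 checksum.

    tape_contents maps a tape resource name to the bytes actually stored there
    (a hypothetical stand-in for reading the replica back from the tape cache).
    """
    for resource, data in tape_contents.items():
        if irods_sha2(data) != hot_checksum:
            print(f"checksum mismatch on {resource}; keeping hot_1 replica")
            return False
    return True


# Simulated collision on tape_2: the replica points at a file that
# actually holds another object's bytes.
ours = b"content of object A"
theirs = b"content of object B"
hot = irods_sha2(ours)
print(can_trim_hot_replica(hot, {"tape_1": ours, "tape_2": ours}))    # True
print(can_trim_hot_replica(hot, {"tape_1": ours, "tape_2": theirs}))  # False
```

Checking every tape replica before trimming is what turns a silent collision into a failed validation instead of data loss.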

We do the upload and the checksum calculation in separate steps because, as it is a tape system, we can only read from its cache, and files are deleted from the cache after a few minutes (I think it is 5 minutes). That is not enough time to calculate the checksum of a big file. Joris started a discussion about this in the group in the past.


The code used for replication is:
```
ReplicateToTape {
  # Replicate *path from *resource_source to *resource_destination. The intent is to
  # replicate without checksum calculation; checksums are verified in a separate step.
  writeLine("serverLog", "msiDataObjReplWrapper: Replicate *path from *resource_source to *resource_destination");
  msiDataObjRepl(*path, 'destRescName=*resource_destination++++rescName=*resource_source++++irodsAdmin=++++verifyChksum=0', *out_param);
  writeLine("serverLog", "msiDataObjReplWrapper: Replicate *path from *resource_source to *resource_destination done: *out_param");
}

INPUT *resource_destination="tape", *path="/a/data/obj/path", *resource_source="hot_1"
OUTPUT ruleExecOut
```

The rule execution produced the following log entries (server timestamps):
```
2026-04-01T07:42:26.030Z msiDataObjReplWrapper: Replicate /ZONE/COLLECTIONS/DATA/ydvJK1/data_object_name.txt from hot_1 to tape_2
2026-04-01T07:42:26.357Z msiDataObjReplWrapper: Replicate /ZONE/COLLECTIONS/DATA/ydvJK1/data_object_name.txt from hot_1 to tape_2 done: 0

2026-04-01T07:42:26.030Z msiDataObjReplWrapper: Replicate /ZONE/COLLECTIONS/DATA/cF5g5T/data_object_name.txt from hot_1 to tape_2
2026-04-01T07:42:26.365Z msiDataObjReplWrapper: Replicate /ZONE/COLLECTIONS/DATA/cF5g5T/data_object_name.txt from hot_1 to tape_2 done: 0
```


The tape resource details:

```
$ ilsresc tape_2
tape_2:passthru
└── s3-tape-02:s3

$ ilsresc -l s3-tape-02
(...)
context: S3_DEFAULT_HOSTNAME=s3.object-archive.nl;S3_AUTH_FILE=/var/data/s3-tape-02.s3.keypair;S3_REGIONNAME=nlprd-02;S3_RETRY_COUNT=2;S3_WAIT_TIME_SECONDS=3;S3_PROTO=HTTPS;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached;S3_CACHE_DIR=/s3/cache;S3_MPU_CHUNK=125
(...)
```
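The context string is a ';'-separated list of KEY=VALUE settings. A small Python helper (hypothetical, not part of iRODS) to pull out the two settings most relevant to how this resource names and stores files, ARCHIVE_NAMING_POLICY and HOST_MODE:

```python
def parse_context(context: str) -> dict[str, str]:
    """Split a resource context string of KEY=VALUE pairs separated by ';'."""
    settings = {}
    for entry in context.split(";"):
        if not entry:
            continue  # tolerate empty entries, e.g. a trailing ';'
        key, _, value = entry.partition("=")  # split on the first '=' only
        settings[key.strip()] = value.strip()
    return settings


CONTEXT = (
    "S3_DEFAULT_HOSTNAME=s3.object-archive.nl;S3_AUTH_FILE=/var/data/s3-tape-02.s3.keypair;"
    "S3_REGIONNAME=nlprd-02;S3_RETRY_COUNT=2;S3_WAIT_TIME_SECONDS=3;S3_PROTO=HTTPS;"
    "ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached;"
    "S3_CACHE_DIR=/s3/cache;S3_MPU_CHUNK=125"
)

settings = parse_context(CONTEXT)
print(settings["ARCHIVE_NAMING_POLICY"])  # consistent
print(settings["HOST_MODE"])              # cacheless_attached
```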


And the ils -L output for the two data objects:

```
$ ils -L /ZONE/COLLECTIONS/DATA/ydvJK1/data_object_name.txt
  irods             1 tape_1;s3-tape-01           86 2026-04-01.09:42 & data_object_name.txt
    sha2:YlBCA+WhdUmm81mfGmI0GGCatFaQyWLKm6mOUJa9AB0=    generic    /s3-tape-01/irods/5/0/data_object_name.txt.1775029345
  irods             2 tape_2;s3-tape-02           86 2026-04-01.09:42 & data_object_name.txt
    sha2:YlBCA+WhdUmm81mfGmI0GGCatFaQyWLKm6mOUJa9AB0=    generic    /s3-tape-02/irods/1/13/data_object_name.txt.1775029346
$

$ ils -L /ZONE/COLLECTIONS/DATA/cF5g5T/data_object_name.txt
  irods             0 hot_1;local_filestore             86 2026-04-01.09:24 & data_object_name.txt
    sha2:ls3PiRia8R9k75L3ZJWFMrnWXVv4o8S2IyAvBR9RJ+4=    generic    /irods/data/a278a33b-9081-40a3-bfc6-03e412f952e9/data_object_name.txt
  irods             1 tape_1;s3-tape-01           86 2026-04-01.09:42 & data_object_name.txt
    sha2:ls3PiRia8R9k75L3ZJWFMrnWXVv4o8S2IyAvBR9RJ+4=    generic    /s3-tape-01/irods/12/12/data_object_name.txt.1775029345
  irods             2 tape_2;s3-tape-02           86 2026-04-01.09:42 & data_object_name.txt
    sha2:YlBCA+WhdUmm81mfGmI0GGCatFaQyWLKm6mOUJa9AB0=    generic    /s3-tape-02/irods/1/13/data_object_name.txt.1775029346
$
```
(The server log timestamps are 2 hours behind the times shown by ils.)
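The trailing number on each physical path appears to be the Unix creation time in seconds: 1775029346 decodes to 2026-04-01 07:42:26 UTC, the same second in which both replications to tape_2 ran. A minimal Python sketch of why that collides, assuming the layout <vault>/<user>/<dir1>/<dir2>/<name>.<seconds> inferred from the paths above (the helper is hypothetical, not taken from the iRODS source):

```python
from datetime import datetime, timezone


def random_scheme_path(vault: str, user: str, dir1: int, dir2: int,
                       name: str, created: int) -> str:
    """Compose a physical path in the layout observed in the ils -L output.

    Hypothetical helper. Note that the collection (ydvJK1 vs cF5g5T) is not
    an input, so it cannot make two such paths differ.
    """
    return f"{vault}/{user}/{dir1}/{dir2}/{name}.{created}"


created = 1775029346  # suffix on both tape_2 paths
print(datetime.fromtimestamp(created, tz=timezone.utc).isoformat())
# -> 2026-04-01T07:42:26+00:00

# Same name, same second, same random directories -> same physical path,
# even though the two data objects live in different collections.
a = random_scheme_path("/s3-tape-02", "irods", 1, 13, "data_object_name.txt", created)
b = random_scheme_path("/s3-tape-02", "irods", 1, 13, "data_object_name.txt", created)
print(a)       # /s3-tape-02/irods/1/13/data_object_name.txt.1775029346
print(a == b)  # True
```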

Our checksum validation failed because on tape_2 both data objects point to the same physical file, so their tape_2 replicas have the same content.

This is the first time we have seen this, although the process has already run thousands of times.

Any ideas?


Thanks in advance,
Bruno


joris luijsterburg

Apr 3, 2026, 3:31:14 AM
to iRODS-Chat
We had another occurrence just now. Some statistical background: we have a process that generates data_object_name.txt files. Since March 20 we have had 240 of these files, with 5 collisions on the exact timestamp (the chance of a timestamp collision is higher because tasks are periodically put on the delay queue, where they are picked up at the same time). Of these 5 collisions, two also collided on the random scheme directory structure, and they were all uploaded by the same user. I did the math, and looking at all uploaded files, they seem fairly evenly distributed across all 256 possible random scheme directories, so nothing is off there.
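For scale, here is the arithmetic behind these numbers in Python, assuming each object's directory is chosen uniformly and independently from the 256 options, and treating each of the 5 timestamp collisions as one pair of uploads (both are assumptions, not measurements):

```python
from math import comb

DIRS = 256  # possible random scheme directories, as stated above

# Assumption: each object's directory is drawn uniformly and independently.
p_same_dir = 1 / DIRS
print(f"P(two same-second uploads share a directory) = {p_same_dir:.4%}")

# Assumption: the 5 timestamp collisions are 5 independent pairs.
pairs = 5
expected = pairs * p_same_dir
print(f"expected directory collisions among {pairs} pairs = {expected:.4f}")

# Binomial probability of at least 2 directory collisions among those pairs.
p_at_least_2 = 1 - sum(
    comb(pairs, k) * p_same_dir**k * (1 - p_same_dir) ** (pairs - k)
    for k in range(2)
)
print(f"P(at least 2 of {pairs}) = {p_at_least_2:.2e}")
```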

We previously did collision testing with the random scheme and found that when there is a collision, an extra number is added, e.g. data_object_name.txt.1775029346 becomes data_object_name.txt.1775029346.1775029999. Somehow iRODS did not catch the collision this time, though. Could this be related to the S3 plugin or the storage backend?


best regards,

Joris

James, Justin Kyle

Apr 7, 2026, 2:31:53 PM
to iRODS-Chat
This seems to me to be an issue with the random vault path use and not something specific to S3.

Is the move penalty the reason you have chosen the random vault path scheme?

One possibility is to use ARCHIVE_NAMING_POLICY=decoupled in the S3 configuration and not use the random vault path scheme. That would give you a vault path that is not tied to the logical path, so any move would just be a database update of the logical path.

That said, to use that you would want to be on iRODS 4.3.5 due to this issue - https://github.com/irods/irods_resource_plugin_s3/issues/2146.
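A toy Python model of the difference being described, assuming (as a simplification) that with consistent naming and no random scheme the physical key mirrors the logical path. The class and the /opaque/ key format are hypothetical and do not reflect the S3 plugin's real naming:

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy catalog mapping logical paths to physical (S3) keys.

    Hypothetical and simplified: it only illustrates why a move is a pure
    catalog update when the physical key does not depend on the logical path.
    """
    policy: str                                     # "consistent" or "decoupled"
    entries: dict[str, str] = field(default_factory=dict)
    storage_renames: int = 0                        # physical objects rewritten

    def physical_key(self, logical: str, object_id: int) -> str:
        if self.policy == "consistent":
            return logical                          # key mirrors the logical path
        return f"/opaque/{object_id}"               # placeholder, not the real format

    def move(self, src: str, dst: str, object_id: int) -> None:
        old_key = self.entries.pop(src)
        new_key = self.physical_key(dst, object_id)
        if new_key != old_key:
            self.storage_renames += 1               # S3 has no rename: copy + delete
        self.entries[dst] = new_key


for policy in ("consistent", "decoupled"):
    cat = Catalog(policy)
    src, dst = "/ZONE/a/file.txt", "/ZONE/b/file.txt"
    cat.entries[src] = cat.physical_key(src, object_id=42)
    cat.move(src, dst, object_id=42)
    print(policy, cat.storage_renames)  # consistent 1, then decoupled 0
```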

As for the collision when using the random vault path scheme, I'm not sure, so someone else may have to weigh in.




James, Justin Kyle

Apr 7, 2026, 3:40:39 PM
to iRODS-Chat
We've determined that the chance of a collision is too high, and I have opened the following issue about it.




joris luijsterburg

Apr 8, 2026, 3:53:58 AM
to iRODS-Chat
Thanks for the answer!

We have noticed that when people have data on tape, they still want to move it around to reorganise their folder structure. We needed the random scheme to facilitate that. Also, removing data without the -f flag is not possible without the random scheme. I have to dig a bit in my memory as to why we chose the random scheme over decoupled, but we could experiment with decoupled to see whether it works better for us.

best regards,

Joris

joris luijsterburg

Apr 8, 2026, 4:18:36 AM
to iRODS-Chat
I am still curious about the second part of my question, though. Looking back, we still had the files we used for testing the random scheme in our test environment. We sent a single file name to iRODS at a very high rate (files per second) to force collisions. When a collision did happen, it looked like iRODS has a countermeasure for it. The files /zonename/home/myuser/aa/aa/a192/b.txt and /zonename/home/myuser/aa/aa/a266/b.txt were both uploaded in the same second, and both were assigned the 11/15 directory by the random scheme. However, iRODS somehow detected this and put one of the files in a different physical location (a replica directory and an extra number after the timestamp). We assumed this was designed to handle collisions, but were we correct?

This example was not on an S3 resource but on a unixfilesystem resource.

```
COLL_NAME = /zonename/home/myuser/aa/aa/a192
DATA_NAME = b.txt
DATA_PATH = /nfspath/irods/vault/replica/myuser/11/15/b.txt.1734531773.2535652680
------------------------------------------------------------
COLL_NAME = /zonename/home/myuser/aa/aa/a266
DATA_NAME = b.txt
DATA_PATH = /nfspath/irods/vault/myuser/11/15/b.txt.1734531773
------------------------------------------------------------
```
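Comparing the two DATA_PATH values, the second path is the first one with a replica/ directory inserted after the vault and an extra integer appended. A small Python check of that relationship; the helper is hypothetical and derived only from these two paths, so the actual iRODS logic may differ:

```python
VAULT = "/nfspath/irods/vault"
first = "/nfspath/irods/vault/myuser/11/15/b.txt.1734531773"
second = "/nfspath/irods/vault/replica/myuser/11/15/b.txt.1734531773.2535652680"


def strip_replica_variant(path: str, vault: str) -> str:
    """Undo the two differences seen between the DATA_PATH values above:
    the 'replica/' directory after the vault and the trailing '.<integer>'."""
    rest = path[len(vault) + 1:]          # path relative to the vault
    if rest.startswith("replica/"):
        rest = rest[len("replica/"):]
        rest = rest.rsplit(".", 1)[0]     # drop the extra integer
    return f"{vault}/{rest}"


# Both objects originally resolved to the same physical path.
print(strip_replica_variant(second, VAULT) == first)  # True
```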

James, Justin Kyle

Apr 8, 2026, 10:34:09 AM
to iRODS-Chat
That may be true.  I'll look more into it to see.


James, Justin Kyle

Apr 8, 2026, 12:09:28 PM
to iRODS-Chat
Can you run ils -L on those two objects so I can see whether the data objects are marked stale?

We do append "replica" and another integer to the original object file path, but only if it is not stale. If the object is stale, I believe we do not update it.


joris luijsterburg

Apr 9, 2026, 3:16:18 AM
to iRODS-Chat
The original files from the first message have status 1 in all replicas. Of course, the status could have been 2 or 4 at the time the other file was being written, since we were replicating.
That is another difference between our test case and the production incident: in test we used iput, in production a replication using msiDataObjRepl.
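For reference, a Python sketch of the replica status values mentioned above, assuming the codes used since iRODS 4.2.9 (0 stale, 1 good, 2 intermediate, 3 read-locked, 4 write-locked); please check these against your server version:

```python
from enum import IntEnum


class ReplicaStatus(IntEnum):
    """Replica status codes (DATA_REPL_STATUS), as assumed for iRODS 4.2.9+."""
    STALE = 0
    GOOD = 1
    INTERMEDIATE = 2   # replica is being written
    READ_LOCKED = 3
    WRITE_LOCKED = 4


def describe(statuses: list[int]) -> list[str]:
    """Translate raw status numbers into their names."""
    return [ReplicaStatus(s).name for s in statuses]


print(describe([1, 1, 1]))  # status now: ['GOOD', 'GOOD', 'GOOD']
print(describe([2, 4]))     # possible mid-replication: ['INTERMEDIATE', 'WRITE_LOCKED']
```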