File name collision


Bruno Santos

Apr 1, 2026, 11:14:30 AM
to iRODS-Chat
Hi,

We have run into a strange issue that we can only describe as a file name collision.

Different data objects (same name, but in different collections), when replicated to an S3 resource, end up with the same physical file path in iRODS.

As background, it is relevant to know that our iRODS server has the random naming scheme enabled and runs version 4.3.4.
Our replication process is custom made (due to issues with the replication resource type).

Short summary of our process to move files to tape (the tape resources are named tape_1 and tape_2):

1. The file is placed in local storage (resource hot_1) and has a checksum.

Replicate to tapes:
2. Replicate to tape_1 without checksum calculation.
3. Replicate to tape_2 without checksum calculation.

Integrity check:
4. Calculate the checksum on tape_1.
5. Compare that checksum with the value from hot_1.
6. Calculate the checksum on tape_2.
7. Compare that checksum with the value from hot_1.

Trim:
8. If all checksums match, the hot_1 replica is trimmed.

We do the upload and the checksum in separate steps because, as this is a tape system, we can only read from its cache, and the file is deleted from the cache after a few minutes (I think 5). That is not enough time to calculate the checksum of a big file. Joris started a discussion about this in the group in the past.
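To make the workflow above concrete, here is a rough Python sketch of the verify-then-trim decision. The function and the checksum table are stand-ins for illustration only, not the actual iRODS calls (msiDataObjRepl, msiDataObjChksum, itrim):

```python
# Hypothetical sketch of the replicate -> verify -> trim workflow.
# `checksums` maps (logical_path, resource) -> checksum string, standing in
# for what would really be catalog lookups and on-demand checksum runs.

def replicate_verify_trim(checksums, path):
    """Return True (safe to trim hot_1) only if both tape replicas
    match the checksum recorded on the hot replica."""
    source = checksums[(path, "hot_1")]      # checksum recorded at ingest
    # Steps 4-7: checksum each tape replica and compare against hot_1.
    for tape in ("tape_1", "tape_2"):
        if checksums.get((path, tape)) != source:
            return False                     # mismatch: keep the hot replica
    # Step 8: all checksums match, so the hot_1 replica may be trimmed.
    return True
```

The point of the sketch is that trimming is gated on both comparisons, which is exactly why the duplicate path below made the validation fail.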


The code used for replication is:
```
ReplicateToTape {
  writeLine("serverLog", "msiDataObjReplWrapper: Replicate *path from *resource_source to *resource_destination");
  # Replicate as admin, without checksum verification
  msiDataObjRepl(*path, 'destRescName=*resource_destination++++rescName=*resource_source++++irodsAdmin=++++verifyChksum=0', *out_param);
  writeLine("serverLog", "msiDataObjReplWrapper: Replicate *path from *resource_source to *resource_destination done: *out_param");
}

INPUT *resource_destination="tape", *path="/a/data/obj/path", *resource_source="hot_1"
OUTPUT ruleExecOut
```

The rule execution writes the following log lines (server timestamps shown):
2026-04-01T07:42:26.030Z msiDataObjReplWrapper: Replicate /ZONE/COLLECTIONS/DATA/ydvJK1/data_object_name.txt from hot_1 to tape_2
2026-04-01T07:42:26.357Z msiDataObjReplWrapper: Replicate /ZONE/COLLECTIONS/DATA/ydvJK1/data_object_name.txt from hot_1 to tape_2 done: 0

2026-04-01T07:42:26.030Z msiDataObjReplWrapper: Replicate /ZONE/COLLECTIONS/DATA/cF5g5T/data_object_name.txt from hot_1 to tape_2
2026-04-01T07:42:26.365Z msiDataObjReplWrapper: Replicate /ZONE/COLLECTIONS/DATA/cF5g5T/data_object_name.txt from hot_1 to tape_2 done: 0


The tape resource details:

$ ilsresc tape_2
tape_2:passthru
└── s3-tape-02:s3

$ ilsresc -l  s3-tape-02
(...)
context: S3_DEFAULT_HOSTNAME=s3.object-archive.nl;S3_AUTH_FILE=/var/data/s3-tape-02.s3.keypair;S3_REGIONNAME=nlprd-02;S3_RETRY_COUNT=2;S3_WAIT_TIME_SECONDS=3;S3_PROTO=HTTPS;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached;S3_CACHE_DIR=/s3/cache;S3_MPU_CHUNK=125
(...)


And the ils -L output for the two data objects:

$ ils -L /ZONE/COLLECTIONS/DATA/ydvJK1/data_object_name.txt
  irods             1 tape_1;s3-tape-01           86 2026-04-01.09:42 & data_object_name.txt
    sha2:YlBCA+WhdUmm81mfGmI0GGCatFaQyWLKm6mOUJa9AB0=    generic    /s3-tape-01/irods/5/0/data_object_name.txt.1775029345
  irods             2 tape_2;s3-tape-02           86 2026-04-01.09:42 & data_object_name.txt
    sha2:YlBCA+WhdUmm81mfGmI0GGCatFaQyWLKm6mOUJa9AB0=    generic    /s3-tape-02/irods/1/13/data_object_name.txt.1775029346
$

$ ils -L /ZONE/COLLECTIONS/DATA/cF5g5T/data_object_name.txt
  irods             0 hot_1;local_filestore             86 2026-04-01.09:24 & data_object_name.txt
    sha2:ls3PiRia8R9k75L3ZJWFMrnWXVv4o8S2IyAvBR9RJ+4=    generic    /irods/data/a278a33b-9081-40a3-bfc6-03e412f952e9/data_object_name.txt
  irods             1 tape_1;s3-tape-01           86 2026-04-01.09:42 & data_object_name.txt
    sha2:ls3PiRia8R9k75L3ZJWFMrnWXVv4o8S2IyAvBR9RJ+4=    generic    /s3-tape-01/irods/12/12/data_object_name.txt.1775029345
  irods             2 tape_2;s3-tape-02           86 2026-04-01.09:42 & data_object_name.txt
    sha2:YlBCA+WhdUmm81mfGmI0GGCatFaQyWLKm6mOUJa9AB0=    generic    /s3-tape-02/irods/1/13/data_object_name.txt.1775029346
$
(The server timestamps are 2 hours off from the times shown by ils.)

Our checksum validation failed because on tape_2 those two data objects share the same physical path, so they have the same content.
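To catch this class of problem earlier, one could periodically check whether any physical path is shared by more than one logical path (in practice the pairs would come from a catalog query, e.g. iquest selecting COLL_NAME, DATA_NAME and DATA_PATH; here they are hardcoded from the ils output above as a sketch):

```python
from collections import defaultdict

def find_shared_paths(replicas):
    """Given (logical_path, physical_path) pairs, return the physical
    paths that are claimed by more than one logical path."""
    by_path = defaultdict(set)
    for logical, physical in replicas:
        by_path[physical].add(logical)
    return {p: sorted(ls) for p, ls in by_path.items() if len(ls) > 1}

# The two replicas on tape_2 from the ils output above:
replicas = [
    ("/ZONE/COLLECTIONS/DATA/ydvJK1/data_object_name.txt",
     "/s3-tape-02/irods/1/13/data_object_name.txt.1775029346"),
    ("/ZONE/COLLECTIONS/DATA/cF5g5T/data_object_name.txt",
     "/s3-tape-02/irods/1/13/data_object_name.txt.1775029346"),
]
print(find_shared_paths(replicas))  # the shared tape_2 path, with both owners
```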

This is the first time we have seen this, and the process has executed thousands of times before.

Any ideas?


Thanks in advance,
Bruno


joris luijsterburg

Apr 3, 2026, 3:31:14 AM
to iRODS-Chat
We had another occurrence just now. Some more statistical background: we have a process generating data_object_name.txt files. Since March 20 we have had 240 of these files, with 5 collisions on the exact timestamp (timestamp collisions become more likely because tasks are periodically put on the delay queue, where they are picked up at the same time). Of these 5 collisions, two also collided on the directory in the random scheme, and all of them were uploaded by the same user. I did the math: looking at all uploaded files, they seem fairly evenly distributed across the 256 possible random-scheme directories, so nothing is off there.
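As a rough sanity check on those numbers (assuming the random scheme picks one of 256 directories uniformly and independently per upload, which is my assumption, not something I verified in the code):

```python
from math import comb

p = 1 / 256   # chance a given timestamp-colliding pair also shares a directory
n = 5         # timestamp-colliding pairs observed since March 20

# P(2 or more of the 5 pairs also collide on directory), binomial model:
prob_2_or_more = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (0, 1))
print(f"{prob_2_or_more:.2e}")  # roughly 1.5e-04
```

So under a uniform-random model, seeing 2 directory collisions out of 5 is quite unlikely, which may itself be a hint that directory selection is not as random as assumed (or is correlated for uploads landing in the same second).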

We previously did collision testing with the random scheme and found that when a collision occurs, an extra number is appended: e.g. data_object_name.txt.1775029346 becomes data_object_name.txt.1775029346.1775029999. Somehow iRODS did not catch the collision this time, though. Could that be related to the S3 plugin or the storage backend?
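To illustrate why that behaviour depends on an existence check, here is a hypothetical sketch of the append-a-suffix scheme (this is not the actual plugin code, just the mechanism as we observed it in testing):

```python
def unique_name(existing, base, epoch):
    """Hypothetical collision handling: name is '<base>.<epoch>'; while that
    name already exists in the backend, append another number."""
    candidate = f"{base}.{epoch}"
    while candidate in existing:      # the existence check that seems skipped
        epoch += 1                    # illustrative: any fresh value works
        candidate = f"{candidate}.{epoch}"
    existing.add(candidate)
    return candidate

existing = set()
a = unique_name(existing, "data_object_name.txt", 1775029346)
b = unique_name(existing, "data_object_name.txt", 1775029346)
# a: data_object_name.txt.1775029346
# b: data_object_name.txt.1775029346.1775029347
```

If the existence check is skipped, or if the backend answers "not found" before the first upload becomes visible (S3 listings are not immediately consistent on every backend), both calls would return the same name, matching the overwrite we saw on tape_2.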


best regards,

Joris