Hardlink plugin bug causes undesired removal of data


Jan de Graaf

Apr 16, 2026, 6:44:27 AM
to iRODS-Chat
Hi,

We rely on the iRODS hardlink plugin to generate links from our main facility data to individual project folders. This lets us shield the original data and keep it safe within the facilities, without having to copy (and thereby duplicate) large amounts of data into the individual project folders, where project members have only read access to the links.
This works pretty well. But we encountered an issue where data was removed from the project folder and the original data on disk for the facility was removed as well (on disk, not the catalogue entry!).

I took some time to reproduce what exactly happens, and I've pinpointed the issue to the "irm -r" command. A normal irm on files doesn't expose the issue; the plugin works as expected. But with recursive folder removal, a bug causes the original data on disk to be removed. Only on disk, though: the original catalogue entries are kept.
Replicas of the data on other disks are also safe from this bug.
The problem, of course, is that from an iRODS point of view the file still appears to be there, while the actual data has been removed. Luckily we replicate data to a read-only archive via storage tiering, so no data has really been lost. But without this, data would have been lost.

These steps expose the problem:

irods@p-irods-002:/tmp$ ils -L
/nki/home/massspec/project_951_to_1000/project_999_bleijerveld_jan:
  rods              0 res-05-store01;res-05-pt01;res-05-01            4 2026-04-16.11:26 & data.txt
        generic    /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              1 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04a;res-05-04a            4 2026-04-16.11:26 & data.txt
        generic    /mnt_nfs_azure_archive/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              2 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04b;res-05-04b            4 2026-04-16.11:26 & data.txt
        generic    /mnt_nfs_azure_archive_n/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              0 res-05-store01;res-05-pt01;res-05-01           30 2026-04-16.12:17 & irods-ingestion.json
        generic    /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
  rods              1 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04a;res-05-04a           30 2026-04-16.12:18 & irods-ingestion.json
        generic    /mnt_nfs_azure_archive/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
  rods              2 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04b;res-05-04b           30 2026-04-16.12:18 & irods-ingestion.json
        generic    /mnt_nfs_azure_archive_n/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
irods@p-irods-002:/tmp$ ls -altr /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/
total 128
drwxrwx--- 2 systemd-coredump irods 2048 Apr 16 10:11 ..
-rwxrwx--- 1 systemd-coredump irods    4 Apr 16 11:27 data.txt
drwxrwx--- 2 systemd-coredump irods 2048 Apr 16 12:18 .
-rwxrwx--- 1 systemd-coredump irods   30 Apr 16 12:18 irods-ingestion.json
irods@p-irods-002:/tmp$ ils -L /nki/home/projects/repr25-0162/project_951_to_1000/project_999_bleijerveld_jan
/nki/home/projects/repr25-0162/project_951_to_1000/project_999_bleijerveld_jan:
  rods              0 res-05-store01;res-05-pt01;res-05-01            4 2026-04-16.11:26 & data.txt
        generic    /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              0 res-05-store01;res-05-pt01;res-05-01           30 2026-04-16.12:17 & irods-ingestion.json
        generic    /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
irods@p-irods-002:/tmp$ imeta ls -d /nki/home/projects/repr25-0162/project_951_to_1000/project_999_bleijerveld_jan/data.txt | grep hardl
attribute: irods::hardlink
irods@p-irods-002:/tmp$ imeta ls -d /nki/home/projects/repr25-0162/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json | grep hardl
attribute: irods::hardlink
irods@p-irods-002:/tmp$
irods@p-irods-002:/tmp$ icd /nki/home/projects/repr25-0162/project_951_to_1000/project_999_bleijerveld_jan
irods@p-irods-002:/tmp$ irm -f data.txt
irods@p-irods-002:/tmp$ ils -l
/nki/home/projects/repr25-0162/project_951_to_1000/project_999_bleijerveld_jan:
  rods              0 res-05-store01;res-05-pt01;res-05-01           30 2026-04-16.12:17 & irods-ingestion.json
irods@p-irods-002:/tmp$ ils -L /nki/home/massspec/project_951_to_1000/project_999_bleijerveld_jan
/nki/home/massspec/project_951_to_1000/project_999_bleijerveld_jan:
  rods              0 res-05-store01;res-05-pt01;res-05-01            4 2026-04-16.11:26 & data.txt
        generic    /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              1 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04a;res-05-04a            4 2026-04-16.11:26 & data.txt
        generic    /mnt_nfs_azure_archive/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              2 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04b;res-05-04b            4 2026-04-16.11:26 & data.txt
        generic    /mnt_nfs_azure_archive_n/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              0 res-05-store01;res-05-pt01;res-05-01           30 2026-04-16.12:17 & irods-ingestion.json
        generic    /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
  rods              1 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04a;res-05-04a           30 2026-04-16.12:18 & irods-ingestion.json
        generic    /mnt_nfs_azure_archive/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
  rods              2 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04b;res-05-04b           30 2026-04-16.12:18 & irods-ingestion.json
        generic    /mnt_nfs_azure_archive_n/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
irods@p-irods-002:/tmp$ ls -altr /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/
total 128
drwxrwx--- 2 systemd-coredump irods 2048 Apr 16 10:11 ..
-rwxrwx--- 1 systemd-coredump irods    4 Apr 16 11:27 data.txt
drwxrwx--- 2 systemd-coredump irods 2048 Apr 16 12:18 .
-rwxrwx--- 1 systemd-coredump irods   30 Apr 16 12:18 irods-ingestion.json
irods@p-irods-002:/tmp$
irods@p-irods-002:/tmp$ icd ..
irods@p-irods-002:/tmp$ ils
/nki/home/projects/repr25-0162/project_951_to_1000:
  C- /nki/home/projects/repr25-0162/project_951_to_1000/project_999_bleijerveld_jan
irods@p-irods-002:/tmp$ irm -fr project_999_bleijerveld_jan
irods@p-irods-002:/tmp$ ils
/nki/home/projects/repr25-0162/project_951_to_1000:
irods@p-irods-002:/tmp$ ils -L /nki/home/massspec/project_951_to_1000/project_999_bleijerveld_jan
/nki/home/massspec/project_951_to_1000/project_999_bleijerveld_jan:
  rods              0 res-05-store01;res-05-pt01;res-05-01            4 2026-04-16.11:26 & data.txt
        generic    /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              1 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04a;res-05-04a            4 2026-04-16.11:26 & data.txt
        generic    /mnt_nfs_azure_archive/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              2 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04b;res-05-04b            4 2026-04-16.11:26 & data.txt
        generic    /mnt_nfs_azure_archive_n/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/data.txt
  rods              0 res-05-store01;res-05-pt01;res-05-01           30 2026-04-16.12:17 & irods-ingestion.json
        generic    /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
  rods              1 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04a;res-05-04a           30 2026-04-16.12:18 & irods-ingestion.json
        generic    /mnt_nfs_azure_archive/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
  rods              2 res-05-store04;res-05-pt04;res-05-repl04;res-05-repl04-pt04b;res-05-04b           30 2026-04-16.12:18 & irods-ingestion.json
        generic    /mnt_nfs_azure_archive_n/res-05-04/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/irods-ingestion.json
irods@p-irods-002:/tmp$ ls -altr /irods/res-05-01/home/massspec/project_951_to_1000/project_999_bleijerveld_jan/
total 96
drwxrwx--- 2 systemd-coredump irods 2048 Apr 16 10:11 ..
-rwxrwx--- 1 systemd-coredump irods    4 Apr 16 11:27 data.txt
drwxrwx--- 2 systemd-coredump irods 2048 Apr 16  2026 .
irods@p-irods-002:/tmp$

See the last ils and the last ls on disk. The file irods-ingestion.json is still listed by ils, but in the actual ls on disk the file is gone!
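One way to spot this kind of mismatch between catalogue and disk is to compare the physical paths reported by `ils -L` with what is actually present on the storage server. The following is a minimal Python sketch, not part of iRODS or the plugin. The names `Replica`, `parse_ils_long` and `missing_on_disk` are made up for illustration, and the parser assumes the `ils -L` layout exactly as it appears in the transcript above (an object line with a one-character status column, followed by a line that ends in the physical path). The existence check only makes sense on a host that mounts the vault paths.

```python
import os
import re
from typing import Callable, List, NamedTuple


class Replica(NamedTuple):
    """One replica as listed by `ils -L` (hypothetical helper type)."""
    name: str
    number: int
    hierarchy: str
    physical_path: str


# Object line as seen in the transcript:
#   owner  replica-number  resource-hierarchy  size  mtime  status  name
OBJECT_LINE = re.compile(
    r"^\s+(?P<owner>\S+)\s+(?P<number>\d+)\s+(?P<hier>\S+)\s+(?P<size>\d+)\s+"
    r"(?P<mtime>\d{4}-\d{2}-\d{2}\.\d{2}:\d{2})\s+(?P<status>\S)\s+(?P<name>.+)$"
)


def parse_ils_long(text: str) -> List[Replica]:
    """Pair every object line with the physical path on the line after it."""
    replicas: List[Replica] = []
    pending = None  # most recent object line still waiting for its path line
    for line in text.splitlines():
        match = OBJECT_LINE.match(line)
        if match:
            pending = match
            continue
        if pending is not None:
            # The path line looks like "        generic    /vault/...".
            slash = line.find(" /")
            if slash != -1:
                replicas.append(Replica(
                    name=pending.group("name").strip(),
                    number=int(pending.group("number")),
                    hierarchy=pending.group("hier"),
                    physical_path=line[slash + 1:].rstrip(),
                ))
            pending = None
    return replicas


def missing_on_disk(
    replicas: List[Replica],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[Replica]:
    """Return the replicas whose physical file is absent on this host."""
    return [r for r in replicas if not exists(r.physical_path)]
```

To use it, capture the output of `ils -L` for the facility collection, pass the text to `parse_ils_long`, and hand the result to `missing_on_disk`. In the situation shown above, replica 0 of irods-ingestion.json would be reported, while the archive replicas would only be checked correctly on a host where those mounts are visible.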

We now know what happens, so we urge people not to use the irm -r command on project folders, but we would like to have this issue fixed if possible.
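Until there is a fix, a pre-check along these lines could help enforce that rule. This is only a sketch under assumptions: it takes the text output of `imeta ls -d` per data object (collected however you prefer) and looks for the `irods::hardlink` attribute that the transcript above shows on linked objects. The helper names `attributes_in` and `hardlinked_objects` are hypothetical.

```python
from typing import Dict, List

# Attribute name as shown by `imeta ls -d ... | grep hardl` in the transcript.
HARDLINK_ATTRIBUTE = "irods::hardlink"


def attributes_in(imeta_output: str) -> List[str]:
    """Collect the attribute names from the text printed by `imeta ls -d`."""
    names: List[str] = []
    for line in imeta_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "attribute":
            names.append(value.strip())
    return names


def hardlinked_objects(imeta_by_path: Dict[str, str]) -> List[str]:
    """Return the logical paths whose metadata carries the hardlink attribute."""
    return sorted(
        path
        for path, output in imeta_by_path.items()
        if HARDLINK_ATTRIBUTE in attributes_in(output)
    )
```

If `hardlinked_objects` returns anything for a collection, the safer route based on the test above is to remove those data objects one by one with `irm -f`, which left the original file on disk intact, instead of running `irm -fr` on the collection.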

Best,

Jan de Graaf
Netherlands Cancer Institute

Kory Draughn

Apr 16, 2026, 9:52:55 AM
to irod...@googlegroups.com
Hi Jan,

Happy to hear storage tiering and backups helped in preventing data loss.

We haven't released a new version of the hard links rule engine plugin since May 2020. I take it you're compiling the plugin from source?

There are no plans to fix and release a new version due to various implementation-related challenges. For example, it required full re-implementations of various APIs, which isn't ideal from a maintenance perspective. That's a clear sign that the plugin-based approach isn't ideal.

If we decide to add support for hard/soft links, making it a core server feature is how we'd approach it today.

Perhaps there's another way to achieve what you want without that plugin. We're happy to assist in brainstorming solutions.

Thanks,

Kory Draughn
Chief Technologist
iRODS Consortium



Jan de Graaf

Apr 17, 2026, 7:54:45 AM
to iRODS-Chat

Hi Kory,

Since this is also an edge case, some context on the current situation: links are made read-only through access rights upon creation, and only admins have the rights to delete them. The rm -r command is something that should always be used with care. We do not expect to encounter this bug frequently. We now understand how and when it occurs, and we have proven fallback techniques (tiering/archiving) available.

That said, the Hard-Link plugin is an important part of our setup and architecture. The way we work with facilities and multi-omics projects results in large amounts of data needing to be made accessible. With the Hard-Link plugin, we can significantly reduce data volume by providing a read-only link to the original data within the projects that require it.

Using projects allows us to centralize all (multi-)omics data in one place for a given project, while keeping the facilities in charge of the original raw data. Without any form of linking, data duplication would be the only way forward — which is undesirable from a data management, cost, and lineage perspective. Therefore, some form of link (hard or soft) is necessary, hence our request to explore how we can make this work. Essentially, these are just entries in the database, though as you pointed out, other operations need to be aware of them.

I took the liberty of running this by an AI for an initial look: https://claude.ai/share/2ebeca4c-b491-453b-a4f6-9a535fee51e1 — could you tell us whether this would work as a starting point for a quick patch? If so, we can develop it further, patch the code, compile a new plugin on our end, and then look at working together on a more sustainable solution.

Best, Jan




