Is it possible to have control over the iRODS resource physical data path?

joris luijsterburg

Apr 11, 2025, 11:44:04 AM
to iRODS-Chat

Hey all,

Background info:
We have an S3 resource that has limitations on the characters we can use in filenames (for example, spaces are not allowed).
At the moment, we have a PEP rule that blocks those characters on upload. It works, but we would like to remove it to give users better functionality.
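The blocking rule boils down to a per-character check against the allowed set. A minimal Python sketch of that check, assuming the allowed set is the AWS "safe characters" for object keys (ASCII letters, digits and `! - _ . * ' ( )`); the actual S3 backend may be stricter or looser, and the names `ALLOWED`, `disallowed_chars` and `is_allowed` are hypothetical, not part of iRODS:

```python
import string

# ASSUMPTION: allowed characters modelled on the AWS "safe characters" for
# object keys. Replace this set with whatever your S3 backend really accepts.
ALLOWED = frozenset(string.ascii_letters + string.digits + "!-_.*'()")


def disallowed_chars(filename: str) -> list[str]:
    """Return the characters in filename that the S3 resource would reject,
    in order of first appearance, without duplicates."""
    seen: list[str] = []
    for ch in filename:
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen


def is_allowed(filename: str) -> bool:
    """True if every character of filename is in the allowed set."""
    return not disallowed_chars(filename)
```

For example, `disallowed_chars("my report.csv")` returns `[" "]`, which is the kind of list a blocking PEP could put into its error message.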

 

The basic workflow could be:
1. The user does an iput into resource_WITHOUT_charset_limitations
2. A rule replicates the file to resource_WITH_charset_limitations

 

iRODS has the concepts of a logical path and a physical path. Is it possible to influence the physical path in a PEP?
Something like msiSetRandomScheme, but applied not only to collections but also to filenames.
Of course, this would also mean that we need to handle collisions ourselves, and probably some other things we are not thinking of.
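Collisions appear as soon as characters are replaced: `my report.csv` and `my?report.csv` both become `my_report.csv`. A minimal Python sketch of one way to handle that, assuming disallowed characters are replaced by `_` and that the names already present in the target collection are passed in as a set; the allowed set and the function names are hypothetical:

```python
import string

# ASSUMPTION: same hypothetical allowed set as the S3 backend's rules.
ALLOWED = frozenset(string.ascii_letters + string.digits + "!-_.*'()")


def sanitize(filename: str, replacement: str = "_") -> str:
    """Replace every character outside ALLOWED with `replacement`."""
    return "".join(ch if ch in ALLOWED else replacement for ch in filename)


def unique_safe_name(filename: str, existing: set[str]) -> str:
    """Sanitize filename and, if the result collides with a name in
    `existing` (e.g. the other names in the target collection), append a
    numeric suffix before the extension until the name is unique."""
    candidate = sanitize(filename)
    if candidate not in existing:
        return candidate
    stem, dot, ext = candidate.rpartition(".")
    if not dot or not stem:
        # No extension, or a leading-dot name such as ".env": suffix the end.
        stem, ext = candidate, ""
    n = 1
    while True:
        suffixed = f"{stem}_{n}" + (f".{ext}" if ext else "")
        if suffixed not in existing:
            return suffixed
        n += 1
```

With `{"my_report.csv"}` already present, `unique_safe_name("my?report.csv", ...)` yields `my_report_1.csv`. The check-then-rename is not atomic, so two concurrent uploads could still race for the same name.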

 

Other possible solutions:
- Keep the current iRODS rule that blocks the characters, and have an ingestion script handle and replace them.
- Or allow all characters in resource_WITHOUT_charset_limitations, and in the replication rule to resource_WITH_charset_limitations rename the logical filenames, probably also storing the original names in metadata.
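The second option needs a reversible mapping: the safe logical path plus an AVU holding the original name. A minimal Python sketch, assuming a hypothetical attribute name `original_filename` and treating the object's AVUs as a plain dict (real iRODS metadata can hold several values per attribute, so this is a simplification):

```python
import posixpath
import string

# ASSUMPTION: hypothetical allowed set and AVU attribute name.
ALLOWED = frozenset(string.ascii_letters + string.digits + "!-_.*'()")
ORIGINAL_NAME_ATTR = "original_filename"


def plan_rename(logical_path: str) -> tuple[str, tuple[str, str]]:
    """Return (safe_logical_path, (attribute, value)) for a data object whose
    name may contain characters the S3 resource rejects."""
    collection, name = posixpath.split(logical_path)
    safe = "".join(ch if ch in ALLOWED else "_" for ch in name)
    return posixpath.join(collection, safe), (ORIGINAL_NAME_ATTR, name)


def restore_path(safe_logical_path: str, avus: dict[str, str]) -> str:
    """Rebuild the original logical path from the stored AVU, falling back
    to the safe path if no original name was recorded."""
    collection, safe = posixpath.split(safe_logical_path)
    return posixpath.join(collection, avus.get(ORIGINAL_NAME_ATTR, safe))
```

For a hypothetical `/tempZone/home/joris/my report.csv`, `plan_rename` gives `/tempZone/home/joris/my_report.csv` and the AVU `("original_filename", "my report.csv")`.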


Best regards,


Joris

 

 


Terrell Russell

Apr 11, 2025, 12:08:22 PM
to irod...@googlegroups.com
Hi Joris,

Interesting use case...

We have thought through some of these issues with the automated ingest capability...

You are definitely correct about having to think about collisions.

Regarding your main question about a PEP being able to manipulate the physical path... it seems only the logical_path can be manipulated in the API pre-PEP, which would do what you want, but at the cost of also changing the logical path seen by everyone.

To change the physical path ONLY in the S3 plugin's replicas... you could use the context string parameter
`ARCHIVE_NAMING_POLICY=decoupled`.

This should only use 0-9 as the physical filenames in S3. Please try that out and share with us if it is a solution.
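A resource's context string is a semicolon-separated list of `KEY=VALUE` pairs, and as far as I know `iadmin modresc <resource> context "<string>"` replaces the whole string, so adding `ARCHIVE_NAMING_POLICY=decoupled` means re-sending every existing pair. A minimal Python sketch of merging the key into an existing string; the function name and the example hostname/keypair values are hypothetical:

```python
def set_context_key(context: str, key: str, value: str) -> str:
    """Return a copy of a resource context string (semicolon-separated
    KEY=VALUE pairs) with `key` set to `value`, keeping all other pairs and
    their order. The pair is appended if the key is not present yet."""
    pairs = [p for p in context.split(";") if p]  # drop empty segments
    out: list[str] = []
    found = False
    for pair in pairs:
        k = pair.partition("=")[0]  # split on the first "=" only
        if k == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(pair)
    if not found:
        out.append(f"{key}={value}")
    return ";".join(out)
```

The result can then be passed to `iadmin modresc`; check the S3 plugin documentation for whether the new naming policy applies only to replicas created after the change.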

Terrell







--
--
The Integrated Rule-Oriented Data System (iRODS) - https://irods.org
 
iROD-Chat: http://groups.google.com/group/iROD-Chat
---
You received this message because you are subscribed to the Google Groups "iRODS-Chat" group.
To unsubscribe from this group and stop receiving emails from it, send an email to irod-chat+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/irod-chat/079e6744-690d-4153-a6dc-0ddb10811c02n%40googlegroups.com.

joris luijsterburg

May 26, 2025, 7:52:10 AM
to iRODS-Chat
A small update:

You can indeed use `ARCHIVE_NAMING_POLICY=decoupled`, or msiSetRandomScheme, to say something about the physical names. This tackles the collection names, but not the filenames.

The thing is, while you cannot directly influence the physical filename, you can influence the logical filename! We now set msiSetRandomScheme, and when we move a file to our S3 resource with character limitations, we first do an imv to a filename with only allowed characters, keeping the original filename in metadata. After replication we do another imv back to the original filename. Now the file is on our resource with only allowed characters in the physical path, and special characters in the logical path.

The user can still use the file as is (unless they try to fetch it while the file is being replicated).
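The imv / replicate / imv sequence can be sketched as one function. This is a minimal Python sketch, not the rule used here: the iRODS operations are injected as callables so the ordering can be tested without a server. In a real setup they might wrap the imeta, imv and irepl icommands, or python-irodsclient calls such as `obj.metadata.add(attr, value)`, `session.data_objects.move(src, dest)` and `session.data_objects.replicate(path, resource=...)` (an assumption; check your client version). The function name and the `original_filename` attribute are hypothetical:

```python
from typing import Callable

# ASSUMPTION: hypothetical AVU attribute that keeps the user's original name.
ORIGINAL_NAME_ATTR = "original_filename"


def replicate_via_safe_name(
    logical_path: str,
    target_resource: str,
    safe_name: Callable[[str], str],
    move: Callable[[str, str], None],
    add_meta: Callable[[str, str, str], None],
    replicate: Callable[[str, str], None],
) -> str:
    """Replicate a data object to a resource with a restricted character set
    by temporarily renaming it: record the original name as metadata, move to
    a safe logical name, replicate (so the new replica's physical path uses
    the safe name), then move back. Returns the original logical path."""
    collection, _, name = logical_path.rpartition("/")
    safe_path = f"{collection}/{safe_name(name)}"
    if safe_path == logical_path:
        # The name is already allowed: just replicate.
        replicate(logical_path, target_resource)
        return logical_path
    add_meta(logical_path, ORIGINAL_NAME_ATTR, name)
    move(logical_path, safe_path)
    try:
        replicate(safe_path, target_resource)
    finally:
        # Always restore the user-visible name, even if replication failed.
        move(safe_path, logical_path)
    return logical_path
```

Between the two moves the object is only reachable under the safe name, which matches the caveat about fetching a file while it is being replicated. The sketch also assumes `safe_path` does not already exist in the collection.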

Example with emoji:

  irods             1 tape_1            0 2025-05-22.16:36 & file🙂
    sha2:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=    generic    /tape_1/irods/14/12/file_.689119.1747924580
  irods             2 tape_2            0 2025-05-22.16:36 & file🙂
    sha2:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=    generic    /tape_2/irods/6/0/file_.689119.1747924580
