automated ingest authentication


Mark Schenk

Sep 1, 2025, 5:26:29 AM
to iRODS-Chat
Hello all,

I am trying to set up the automated ingest framework from the readme at https://github.com/irods/irods_capability_automated_ingest.

I am getting really stuck at authentication. 

My server only allows PAM authentication. My client machine does not have the iRODS command line tools available, and that is a requirement: the setup is meant to be reproducible on researchers' machines, and the only dependency allowed is Python.

How do I set up a workflow where a researcher simply takes a working irods_environment.json plus a PAM password and builds a working setup for the automated ingest tool from there? Any pointers would be appreciated!

Kind regards,
Mark

Alan King

Sep 2, 2025, 9:03:42 AM
to irod...@googlegroups.com
Hi Mark,

First, I wanted to mention that if the plan is to have individual users managing the ingest service, you might want to consider using a simpler solution such as ManGO Ingest (https://github.com/kuleuven/mango-ingest). It is designed as a command line tool for use by individuals for both one-off and scheduled syncs to iRODS. I believe that project can be used directly with iRODS and is not specific to the ManGO portal.

Secondly, in light of my first point, I wanted to recommend considering setting up a job submission service for users. This might be a heavier lift (and a longer conversation), but could be built in a way that avoids the need for users to use PAM with iRODS directly. I'll just leave it at that for now. :)

Now to answer the question about using PAM...

Both ManGO Ingest and iRODS Automated Ingest Capability are built on the Python iRODS Client (PRC), so I'd recommend making sure that you can authenticate with iRODS using PAM in a basic PRC application before moving on to other tools. Unfortunately, iinit is the de facto standard for getting client environments bootstrapped at the moment, and PRC does not currently have an out-of-the-box script equivalent to iinit (that effort is being tracked here: https://github.com/irods/python-irodsclient/issues/689). However, PRC does provide some tools for helping with this process.

Generating an .irodsA file is possible using a couple of free functions, as described in the PRC README (https://github.com/irods/python-irodsclient?tab=readme-ov-file#creating-a-pam-or-native-authentication-file). This does require writing your own script, but you will probably want to make something fitted for your prospective users anyway. Please note that iRODS sessions authenticated using PAM expire and will require re-authenticating occasionally. See https://docs.irods.org/5.0.1/system_overview/troubleshooting/#users-are-forced-to-re-authenticate-after-a-few-minutes for how to configure iRODS to your needs for that case.

The client environment file - irods_environment.json - is also normally created by iinit, but users can construct one themselves relatively easily. The minimal client environment is a JSON file that only requires a few things and will look like this for a PAM user:
{
    "irods_authentication_scheme": "pam_password",
    "irods_host": "irods-server.hostname",
    "irods_port": 1247,
    "irods_user_name": "alice",
    "irods_zone": "tempZone"
}

So, a script that prompts for the values needed to generate an irods_environment.json file and then calls the irods.client_init.write_pam_irodsA_file function with the PAM password should be enough to bootstrap users using just PRC.
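
A minimal sketch of such a bootstrap script follows. It assumes PRC is installed and that write_pam_irodsA_file picks up the client environment from the default location (~/.irods/irods_environment.json). The helper names build_environment, write_environment and bootstrap are made up for illustration; only irods.client_init.write_pam_irodsA_file comes from PRC. A server that requires TLS for PAM will probably need additional SSL and negotiation keys in the environment file, which are not shown here.

```python
import getpass
import json
from pathlib import Path

# Location PRC is assumed to read the client environment from by default.
DEFAULT_ENV_FILE = Path.home() / ".irods" / "irods_environment.json"


def build_environment(host, user, zone, port=1247):
    """Return the minimal client environment for a PAM user."""
    return {
        "irods_authentication_scheme": "pam_password",
        "irods_host": host,
        "irods_port": int(port),
        "irods_user_name": user,
        "irods_zone": zone,
    }


def write_environment(env, path=DEFAULT_ENV_FILE):
    """Write the environment as JSON, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(env, indent=4) + "\n")
    return path


def bootstrap():
    """Prompt for connection details, then create the environment and .irodsA files."""
    env = build_environment(
        host=input("iRODS host: "),
        user=input("iRODS user name: "),
        zone=input("iRODS zone: "),
        port=input("iRODS port [1247]: ") or 1247,
    )
    write_environment(env)
    # Imported here so the helpers above can be used without PRC installed.
    import irods.client_init

    irods.client_init.write_pam_irodsA_file(getpass.getpass("PAM password: "))
```

Calling bootstrap() once per user (and again whenever the PAM session expires) would leave both files in place for any PRC-based tool to pick up.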

Hope that helps. Please reach out with questions - I know that's a lot of words.

Alan

--
Alan King
Senior Software Developer | iRODS Consortium

Mark Schenk

Sep 2, 2025, 9:59:02 AM
to iRODS-Chat
Hi Alan,

Thank you for your elaborate answer! It is much appreciated.

A little bit of background on what I am trying to accomplish: I'd like to have a guide or recipe for users to set up an automated ingest from a laboratory or measurement setup. So these are not users who will be pushing data manually, but they will set up a long-running process that automatically picks up measurements as they come in, probably adds some metadata to them, and then moves them into iRODS. I expect these users to be able to write the scripting for adding metadata and maybe renaming. My idea was that iRODS Automated Ingest was a good basis for this, but if that is not the case then I am happy to explore alternatives (such as mango-ingest).

I do have some experience using the PRC, and mostly authentication looks more or less like this:

import os

from irods.session import iRODSSession

irods_host = os.getenv('IRODS_HOST')
irods_user = os.getenv('IRODS_USER')
# get_from_vault stands in for whatever secret store is used (not shown here).
irods_password = get_from_vault('irods_password')
irods_zone = os.getenv('IRODS_ZONE')

ssl_settings = {
    'client_server_negotiation': 'request_server_negotiation',
    'client_server_policy': 'CS_NEG_REQUIRE',
    'encryption_algorithm': 'AES-256-CBC',
    'encryption_key_size': 32,
    'encryption_num_hash_rounds': 16,
    'encryption_salt_size': 8,
    'ssl_verify_server': 'hostname',
}

with iRODSSession(
    host=irods_host,
    port=1247,
    user=irods_user,
    password=irods_password,
    authentication_scheme="pam_password",
    zone=irods_zone,
    **ssl_settings,
) as session:
    c = session.collections.get(f"/{irods_zone}/home")
    print(c)

Everything then seems to work virtually the same as without PAM (except for the authentication_scheme) and there are no external dependencies. I guess I was expecting the ingest tool to be able to work in a similar way!

I am interested in your remark about a job submission service though (if that is still relevant in light of my explanation here).

Best regards,
Mark

Kory Draughn

Sep 2, 2025, 12:26:08 PM
to irod...@googlegroups.com
Hi Mark,

I think there are a number of things that need to be considered first. For example (not exhaustive):
  • Am I understanding this to mean the guide would be aimed at lab managers?
  • Are all users allowed to submit their own jobs? Who is in charge of the ingestion process?
  • How are permissions handled? Are they propagated to iRODS?
With that said, I recommend writing a tiny PRC script which solves the issue for one user. Once you have that working, you can think about improvements and scaling. If that's not sufficient, only then would you want to look into larger systems.

Hope that helps.

Kory Draughn
Chief Technologist
iRODS Consortium


Kory Draughn

Sep 2, 2025, 12:33:11 PM
to irod...@googlegroups.com
Sorry - that first bullet was supposed to be "Is the guide aimed at lab managers?"

Kory Draughn
Chief Technologist
iRODS Consortium

John Constable

Sep 3, 2025, 7:11:42 AM
to iRODS-Chat
Hi Mark

You might have different solutions available depending on the infrastructure and requirements you have.

Do you have an HPC cluster?
Is the data being sent directly from the machines, or do the machines write to a file storage system somewhere?
Does the solution need to be automated, or are you looking for a way for the lab managers to be able to upload a directory tree on request? Do all files in the tree need to be uploaded, or do you need to omit some?

I have worked with a number of different groups doing this sort of thing, and some of the ways I have seen them try to address the issue are:

1. The instrument writes to a central file share. The file share is regularly polled by Automated Ingest and any files under a certain area are uploaded. There are a number of post-upload issues with this around ownership, metadata tagging and so forth. Nothing unsolvable, but it gets complex quickly!
2. The user has a tool like ManGO or CyberDuck and manually uploads the files when the run is complete. This solves the ownership issue, but is very manual and doesn't help with tagging.
3. An automation script runs on the instrument, polling for files. When it finds the file(s) that designate a complete run, it uploads everything as a specific user, with as much metadata as it can determine (see Alan's comments on NetCDF and the like) or is provided with. This worked well, but it's quite a lot of work as it has to be custom written and then managed and maintained.
4. As above, but the script is run by the lab manager on completion of a run. This avoids partial run uploads, but runs the risk of the lab manager forgetting, or needing documentation and training.
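
To make option 3 a bit more concrete, here is a rough sketch of the "detect a complete run, then upload with metadata" step using PRC. The marker file name RUN_COMPLETE, the flat one-directory-per-run layout, and the function names are all assumptions for illustration; a real instrument will signal completion in its own way. The PRC calls used are session.collections.create, session.data_objects.put, session.data_objects.get and the metadata.add method on the returned object.

```python
from pathlib import Path

# Hypothetical marker file the instrument writes when a run is finished.
MARKER_NAME = "RUN_COMPLETE"


def find_complete_runs(root, marker_name=MARKER_NAME):
    """Return run directories directly under root that contain the marker file."""
    return sorted(
        d for d in Path(root).iterdir()
        if d.is_dir() and (d / marker_name).is_file()
    )


def upload_run(session, run_dir, target_collection, metadata):
    """Upload the files in run_dir (except the marker) and tag each with metadata.

    session is an open PRC iRODSSession; metadata is a dict of string
    attribute names to string values. Subdirectories are not handled here.
    """
    session.collections.create(target_collection)
    uploaded = []
    for path in sorted(Path(run_dir).iterdir()):
        if not path.is_file() or path.name == MARKER_NAME:
            continue
        logical_path = f"{target_collection}/{path.name}"
        session.data_objects.put(str(path), logical_path)
        obj = session.data_objects.get(logical_path)
        for key, value in metadata.items():
            obj.metadata.add(key, value)
        uploaded.append(logical_path)
    return uploaded
```

A polling loop would call find_complete_runs on a timer and hand each result to upload_run, remembering which runs were already uploaded. That bookkeeping, plus retries and partial failures, is where most of the maintenance effort mentioned above tends to go.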

If you're based in Europe, I have some availability Thursday morning if you wanted to talk through these?

cheers

John

--
Want to stay abreast of developments in iRODS but can’t read every bug report?
Sign up to https://theresource.metadata.school/ for a monthly update on the iRODS community.

Mark Schenk

Sep 3, 2025, 9:01:59 AM
to iRODS-Chat
Hi Kory and John,

What I am trying to achieve is mostly similar to the 3rd option in John's post above: have an automated script that determines when it's time to upload a batch of files, providing as much metadata as can be determined. I'd like that script to be the responsibility of the lab (data) manager as much as possible.

I am perfectly fine with writing my own script to do this, but I want to avoid re-inventing wheels or making rookie mistakes, which is why I am looking into existing tools. Your comments are very valuable for me to find my way, so thank you for that!

Kory Draughn

Sep 5, 2025, 9:05:45 AM
to irod...@googlegroups.com
Mark,

Your goal and option 3 from John's response align with the automated ingest tool.

As Alan mentioned, the first task is to prove that you can authenticate using PAM with the PRC (no ingest tool yet). The README of the ingest tool points to the Client Settings File for PAM authentication.
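
As a rough sketch of that first check, the snippet below opens a PRC session from the client environment file and lists the user's home collection. It assumes an irods_environment.json and a matching .irodsA file already exist and that the installed PRC version reads the PAM credentials from them; resolve_env_file and check_pam_login are illustrative names, not part of PRC.

```python
import os
from pathlib import Path


def resolve_env_file():
    """Use IRODS_ENVIRONMENT_FILE if set, else the conventional default path."""
    default = Path.home() / ".irods" / "irods_environment.json"
    return os.environ.get("IRODS_ENVIRONMENT_FILE", str(default))


def check_pam_login():
    """Open a session from the environment file and list the home collection."""
    # Imported here so resolve_env_file() can be used without PRC installed.
    from irods.session import iRODSSession

    with iRODSSession(irods_env_file=resolve_env_file()) as session:
        home = f"/{session.zone}/home/{session.username}"
        collection = session.collections.get(home)
        return [obj.name for obj in collection.data_objects]
```

If check_pam_login() returns without raising, the same environment and authentication files should be usable by the ingest tool.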

Thanks,

Kory Draughn
Chief Technologist
iRODS Consortium
