[slurm-users] Creating /run/user/$UID - for Podman runtime

John Snowdon via slurm-users

Sep 5, 2025, 3:58:22 AM
to slurm...@lists.schedmd.com
We are in the middle of implementing an extensive range of container support on our new HPC platform and have decided to offer our users a wide suite of technologies to better support their workloads:

  • Apptainer
  • Podman (rootless)
  • Docker (rootless)

We've already got a solution for automated entries in /etc/subuid and /etc/subgid on the head nodes (available here under GPL: https://github.com/megatron-uk/pam_subid), which is where we intend users to build their container images. Building and running containers using Apptainer and Podman in those environments works really well. We're happy that it should take care of 95% of our users' needs (Docker is the last few percent) and not involve giving them any special permissions.

If I ssh directly to a compute node, then Podman also works there to run an existing image (podman container run ...).

What I'm struggling with now is running Podman under Slurm itself on our compute nodes.

It appears as though Podman (in rootless mode) wants to put the majority of its runtime / state information under /run/user/$UID. This is fine on the head nodes, where interactive logins hit PAM modules which instantiate the /run/user/$UID directories, but not under sbatch/srun, which don't create them by default.

I've not been able to find a single, magical setting which will move all of the Podman state information out from /run/user to another location - there are 3 or 4 settings involved, and even then I still find various bits of Podman want to create stuff under there.

Rather than hacking away at getting Podman changed to move all settings and state information elsewhere, it seems like the cleanest solution would just be to put the regular /run/user/$UID directory in place at the point Slurm starts the job instead.

What's the best way to get Slurm to create this and clean up afterwards? Should this be in a prolog/epilog wrapper (e.g. directly calling loginctl), or is it cleaner to get Slurm to trigger the usual PAM session machinery in some manner?

John Snowdon
Senior Research Infrastructure Engineer (HPC)

Research Software Engineering
Catalyst Building, Room 2.01
Newcastle University
3 Science Square
Newcastle Helix
Newcastle upon Tyne
NE4 5TG

Michael DiDomenico via slurm-users

Sep 5, 2025, 9:22:26 AM
to John Snowdon, slurm...@lists.schedmd.com
for what it's worth, we found the simplest solution was just to run a
prolog/epilog to create the directories and clean them up. it's only
a couple lines of bash.
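
A minimal sketch of what such a prolog/epilog pair might look like, assuming the scripts run as root on the compute node and that Slurm exports SLURM_JOB_UID to them. This is not the script Michael refers to: RUNTIME_BASE and both function names are hypothetical, added so the logic can be tried against a scratch directory first.

```shell
#!/bin/bash
# Sketch of a Slurm prolog/epilog pair that creates and removes a per-user
# runtime directory. RUNTIME_BASE and the function names are hypothetical.

RUNTIME_BASE="${RUNTIME_BASE:-/run/user}"

# Prolog side: create <base>/<uid>, owned by the job user, mode 0700.
create_runtime_dir() {
    local uid="$1"
    local dir="${RUNTIME_BASE}/${uid}"
    mkdir -p "$dir" || return 1
    chown "$uid" "$dir" || return 1
    chmod 700 "$dir"
}

# Epilog side: remove the directory again.
# Caution: if the same user can have several jobs on one node, removing the
# directory while another job is still running will break that job.
remove_runtime_dir() {
    local uid="$1"
    # Refuse anything that is not a plain numeric UID before calling rm -rf.
    case "$uid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    rm -rf -- "${RUNTIME_BASE:?}/${uid}"
}
```

In a prolog this would be called as create_runtime_dir "$SLURM_JOB_UID", and in the matching epilog as remove_runtime_dir "$SLURM_JOB_UID".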
--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Paul Edmon via slurm-users

Sep 5, 2025, 9:30:57 AM
to slurm...@lists.schedmd.com
We recently set up the same thing (Rocky 8). We edited
/etc/containers/storage.conf and pointed the following variables to /tmp:

storage.conf:runroot = "/tmp/containers-user-$UID/storage"
storage.conf:graphroot = "/tmp/containers-user-$UID/storage"
storage.conf:rootless_storage_path = "/tmp/containers-user-$UID/storage"

We also have a prune script which cleans up /tmp periodically keeping it
clean.
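
A prune script of this kind could be sketched roughly as follows. It is not the script Paul describes: it assumes the /tmp/containers-user-$UID naming from the settings above and a simple age cutoff, and PRUNE_BASE, PRUNE_MAX_AGE_DAYS and prune_container_dirs are hypothetical names.

```shell
#!/bin/bash
# Sketch of a periodic prune job for per-user container storage under /tmp.
# PRUNE_BASE, PRUNE_MAX_AGE_DAYS and prune_container_dirs are hypothetical.

PRUNE_BASE="${PRUNE_BASE:-/tmp}"
PRUNE_MAX_AGE_DAYS="${PRUNE_MAX_AGE_DAYS:-7}"

prune_container_dirs() {
    # Match only top-level containers-user-* directories whose own
    # modification time is older than the cutoff, then remove them.
    # The age test looks at the top-level directory only, so a long-running
    # job that has not touched it recently could still be caught.
    find "$PRUNE_BASE" -mindepth 1 -maxdepth 1 -type d \
        -name 'containers-user-*' \
        -mtime "+${PRUNE_MAX_AGE_DAYS}" \
        -exec rm -rf -- {} +
}
```

Rootless container storage contains files owned by the user's subordinate UIDs, so a job like this would normally have to run as root (for example from cron or a systemd timer) to remove everything.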

I like your solution for subuid. We put together a Puppet module that
does much the same thing: https://github.com/fasrc/puppet-subuid

-Paul Edmon-

Paul Edmon via slurm-users

Sep 5, 2025, 9:32:44 AM
to slurm...@lists.schedmd.com
For reference we used this puppet module for managing podman:
https://forge.puppet.com/modules/southalc/podman/readme

-Paul Edmon-


John Snowdon via slurm-users

Sep 5, 2025, 9:50:42 AM
to slurm...@lists.schedmd.com
Hi Michael,

We're on RHEL 9 - it's a newly commissioned system without any (real) users yet, so we have the relative freedom to make fairly substantial changes without impacting any production work for the moment.

I've tried various combinations of storage.conf settings (we already set runroot to a similar /tmp location, and graphroot points to an NFS-mounted user home so that the user image library persists across all nodes), but Podman always threw an error about creating an 'events' folder under /run/user/$UID, and I couldn't figure out which setting controlled this. Setting the Podman events backend to 'none' stopped the mkdir error, but also seemed to prevent Podman from running.

It sounds like the prolog/epilog solution is going to be the easiest route to resolve this.

I hadn't thought about a clean up script for those /tmp entries... but yeah, that's clearly going to need to be put in place as well.

Thanks for the ideas!

John 

Roger Moye via slurm-users

Sep 5, 2025, 11:49:48 AM
to John Snowdon, slurm...@lists.schedmd.com

John, we ran into the same issues that you did. One thing we discovered was that Podman relies heavily on the $TMPDIR variable if it is set. It seemed that in spite of our changes to storage.conf, Podman still tried to use $TMPDIR for some of its state information. Since $TMPDIR on our cluster was pointed at an NFS mount, that created all sorts of issues.

We implemented solutions similar to those discussed in this thread. However, $TMPDIR kept interfering with them, so jobs that are going to run Podman had to be configured to unset $TMPDIR. This allowed the rest of our Podman config to work as intended. It was easy to fix, since our compute jobs are created using an automated build process.
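
One way to do the unsetting, sketched here as a small wrapper rather than taken from Roger's build process: run the command through env -u so that only the child process loses TMPDIR. The function name run_without_tmpdir is hypothetical.

```shell
#!/bin/bash
# Sketch: run a command with TMPDIR removed from its environment only.
# run_without_tmpdir is a hypothetical helper name.

run_without_tmpdir() {
    # env -u drops the variable for the child process; the caller's own
    # TMPDIR is left untouched.
    env -u TMPDIR "$@"
}
```

A job script could then call run_without_tmpdir podman run ... for the container steps, or simply put unset TMPDIR near the top if nothing else in the job needs the variable.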

 

Roger Moye
HPC Architect
713.898.0021 Mobile

QUANTLAB Financial, LLC
3 Greenway Plaza
Suite 200
Houston, Texas 77046
www.quantlab.com

Christopher Samuel via slurm-users

Sep 6, 2025, 12:16:27 PM
to slurm...@lists.schedmd.com
On 9/5/25 12:55 am, John Snowdon via slurm-users wrote:

> What I'm struggling with now is running Podman under Slurm itself on our
> compute nodes.

We found that we had to make /run/user/$UID private per job via a script
run from the job_container/tmpfs plugin in order to stop jobs from the
same user on a node using podman (via podman-hpc) trashing each other.

The details (including the script and config) are in our public support
ticket where I was flailing around looking for how to do this with
CloneNSScript here: https://support.schedmd.com/show_bug.cgi?id=23228

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

John Snowdon via slurm-users

Sep 8, 2025, 5:41:59 AM
to slurm...@lists.schedmd.com
Thanks to everyone for your feedback.

We've now implemented two simple prolog/epilog scripts which call the systemd 'loginctl' tool and this is creating / cleaning up the /run/user/$UID directory tree nicely.
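
The post does not say which loginctl subcommand the scripts call. One common way to have systemd set up and tear down /run/user/$UID outside a login session is lingering, so the sketch below assumes loginctl enable-linger and disable-linger; that choice, the LOGINCTL override and the function names are all assumptions, not John's actual scripts.

```shell
#!/bin/bash
# Sketch of prolog/epilog helpers built on systemd lingering. This is an
# assumption about the approach, not the scripts described in the post.
# LOGINCTL can be overridden (for example with "echo") for a dry run.

LOGINCTL="${LOGINCTL:-loginctl}"

# Prolog side: enabling lingering should start the user's systemd instance,
# which in turn creates /run/user/<uid>.
start_user_runtime() {
    local user="$1"
    [ -n "$user" ] || return 1
    "$LOGINCTL" enable-linger "$user"
}

# Epilog side: disabling lingering lets systemd stop the user manager and
# remove /run/user/<uid> once the user has no sessions left.
# Caution: with several jobs from one user on a node, the first epilog to
# run would tear this down for the others.
stop_user_runtime() {
    local user="$1"
    [ -n "$user" ] || return 1
    "$LOGINCTL" disable-linger "$user"
}
```

A prolog would call start_user_runtime "$SLURM_JOB_USER" and the epilog stop_user_runtime "$SLURM_JOB_USER", both running as root.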

Our Podman setup also places runroot in a per-user directory on local scratch, and graphroot in an NFS-shared user home directory, so it is accessible across all of our compute nodes.

This now seems to work really nicely. 

We've got subuid/subgid entries auto-generated on our login nodes to allow users to create/manage images there, but we made the design decision to not allow this on compute nodes, so we're currently running without that support. 

I suspect this won't be an issue for 99.98% of use cases (our policy is not to support network services run by this method, so for most users this should be more than satisfactory). Our users don't have container support on the old platform that this new system is replacing, so it's a net gain in functionality for them.

John
