[slurm-users] srun weirdness


Dj Merrill via slurm-users

May 14, 2024, 2:40:29 PM
to slurm-users
I'm running into a strange issue and I'm hoping another set of brains
looking at this might help.  I would appreciate any feedback.

I have two Slurm Clusters.  The first cluster is running Slurm 21.08.8
on Rocky Linux 8.9 machines.  The second cluster is running Slurm
23.11.6 on Rocky Linux 9.4 machines.

This works perfectly fine on the first cluster:

$ srun --mem=32G --pty /bin/bash

srun: job 93911 queued and waiting for resources
srun: job 93911 has been allocated resources

and on the resulting shell on the compute node:

$ /mnt/local/ollama/ollama help

and the ollama help message appears as expected.

However, on the second cluster:

$ srun --mem=32G --pty /bin/bash
srun: job 3 queued and waiting for resources
srun: job 3 has been allocated resources

and on the resulting shell on the compute node:

$ /mnt/local/ollama/ollama help
fatal error: failed to reserve page summary memory
runtime stack:
runtime.throw({0x1240c66?, 0x154fa39a1008?})
    runtime/panic.go:1023 +0x5c fp=0x7ffe6be32648 sp=0x7ffe6be32618 pc=0x4605dc
runtime.(*pageAlloc).sysInit(0x127b47e8, 0xf8?)
    runtime/mpagealloc_64bit.go:81 +0x11c fp=0x7ffe6be326b8 sp=0x7ffe6be32648 pc=0x456b7c
runtime.(*pageAlloc).init(0x127b47e8, 0x127b47e0, 0x128d88f8, 0x0)
    runtime/mpagealloc.go:320 +0x85 fp=0x7ffe6be326e8 sp=0x7ffe6be326b8 pc=0x454565
runtime.(*mheap).init(0x127b47e0)
    runtime/mheap.go:769 +0x165 fp=0x7ffe6be32720 sp=0x7ffe6be326e8 pc=0x451885
runtime.mallocinit()
    runtime/malloc.go:454 +0xd7 fp=0x7ffe6be32758 sp=0x7ffe6be32720 pc=0x434f97
runtime.schedinit()
    runtime/proc.go:785 +0xb7 fp=0x7ffe6be327d0 sp=0x7ffe6be32758 pc=0x464397
runtime.rt0_go()
    runtime/asm_amd64.s:349 +0x11c fp=0x7ffe6be327d8 sp=0x7ffe6be327d0 pc=0x49421c


If I ssh directly to the same node on that second cluster (skipping
Slurm entirely), and run the same "/mnt/local/ollama/ollama help"
command, it works perfectly fine.


My first thought was that it might be related to cgroups.  I switched
the second cluster from cgroups v2 to v1 and tried again; no
difference.  I also tried disabling cgroups on the second cluster by
removing all cgroups references from the slurm.conf file, but that made
no difference either.


My guess is something changed with regards to srun between these two
Slurm versions, but I'm not sure what.

Any thoughts on what might be happening and/or a way to get this to work
on the second cluster?  Essentially I need a way to request an
interactive shell through Slurm that is associated with the requested
resources.  Should we be using something other than srun for this?


Thank you,

-Dj

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Feng Zhang via slurm-users

May 14, 2024, 3:27:18 PM
to Dj Merrill, slurm-users
Looks more like a runtime environment issue.

Check the binaries:

ldd /mnt/local/ollama/ollama

on both clusters; comparing the output may give some hints.

Best,

Feng

Dj Merrill via slurm-users

May 14, 2024, 3:43:13 PM
to slurm...@lists.schedmd.com
Hi Feng,
Thank you for replying.

It is the same binary on the same machine that fails.

If I ssh to a compute node on the second cluster, it works fine.

It fails when running in an interactive shell obtained with srun on that
same compute node.

I agree that it seems like a runtime environment difference between the
SSH shell and the srun-obtained shell.

This is the ldd output from within the srun-obtained shell (where the
binary gives the error when run):

[deej@moose66 ~]$ ldd /mnt/local/ollama/ollama
    linux-vdso.so.1 (0x00007ffde81ee000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x0000154f732cc000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000154f732c7000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x0000154f73000000)
    librt.so.1 => /lib64/librt.so.1 (0x0000154f732c2000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000154f732bb000)
    libm.so.6 => /lib64/libm.so.6 (0x0000154f72f25000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000154f732a0000)
    libc.so.6 => /lib64/libc.so.6 (0x0000154f72c00000)
    /lib64/ld-linux-x86-64.so.2 (0x0000154f732f8000)

This is the ldd output on the exact same node from within an SSH shell,
where the binary runs fine:

[deej@moose66 ~]$ ldd /mnt/local/ollama/ollama
    linux-vdso.so.1 (0x00007fffa66ff000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x000014a9d82da000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x000014a9d82d5000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x000014a9d8000000)
    librt.so.1 => /lib64/librt.so.1 (0x000014a9d82d0000)
    libdl.so.2 => /lib64/libdl.so.2 (0x000014a9d82c9000)
    libm.so.6 => /lib64/libm.so.6 (0x000014a9d7f25000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000014a9d82ae000)
    libc.so.6 => /lib64/libc.so.6 (0x000014a9d7c00000)
    /lib64/ld-linux-x86-64.so.2 (0x000014a9d8306000)


-Dj

Feng Zhang via slurm-users

May 14, 2024, 3:58:56 PM
to Dj Merrill, slurm...@lists.schedmd.com
Not sure, very strange, though the two linux-vdso.so.1 addresses do look different:

[deej@moose66 ~]$ ldd /mnt/local/ollama/ollama
linux-vdso.so.1 (0x00007ffde81ee000)


[deej@moose66 ~]$ ldd /mnt/local/ollama/ollama
linux-vdso.so.1 (0x00007fffa66ff000)

Best,

Feng

Feng Zhang via slurm-users

May 14, 2024, 4:09:47 PM
to Dj Merrill, slurm...@lists.schedmd.com
Do you have any container settings configured?

Hermann Schwärzler via slurm-users

May 15, 2024, 4:46:42 AM
to slurm...@lists.schedmd.com
Hi Dj,

could be a memory-limits related problem. What is the output of

ulimit -l -m -v -s

in both interactive job-shells?

You are using cgroups-v1 now, right?
In that case what is the respective content of

/sys/fs/cgroup/memory/slurm_*/uid_$(id -u)/job_*/memory.limit_in_bytes

in both shells?
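
A quick way to compare both in one go (just a sketch, untested; replace
moose66 with your node) would be:

$ srun -w moose66 --mem=32G bash -c 'ulimit -l -m -v -s'
$ ssh moose66 'ulimit -l -m -v -s'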

Regards,
Hermann

greent10--- via slurm-users

May 15, 2024, 7:23:01 AM
to Hermann Schwärzler, slurm...@lists.schedmd.com
Hi,

When we first migrated to Slurm from PBS, one of the strangest issues we hit was that ulimit settings are inherited from the submission host, which could explain the difference between ssh'ing into the machine (where the default ulimit is applied) and running a job via srun.

You could use:

srun --propagate=NONE --mem=32G --pty bash
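
(If you still want certain limits to follow the job, --propagate also
accepts a comma-separated list of limits to propagate instead of NONE,
e.g. something like the following, untested:

srun --propagate=MEMLOCK,STACK --mem=32G --pty bash)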

I still find Slurm inheriting ulimit and environment variables from the submission host an odd default behaviour.

Tom

--
Thomas Green Senior Programmer
ARCCA, Redwood Building, King Edward VII Avenue, Cardiff, CF10 3NB
Tel: +44 (0)29 208 79269 Fax: +44 (0)29 208 70734
Email: gree...@cardiff.ac.uk Web: http://www.cardiff.ac.uk/arcca

Thomas Green Uwch Raglennydd
ARCCA, Adeilad Redwood, King Edward VII Avenue, Caerdydd, CF10 3NB
Ffôn: +44 (0)29 208 79269 Ffacs: +44 (0)29 208 70734
E-bost: gree...@caerdydd.ac.uk Gwefan: http://www.caerdydd.ac.uk/arcca


Dj Merrill via slurm-users

May 15, 2024, 9:45:34 AM
to slurm...@lists.schedmd.com
Thank you Hermann and Tom!  That was it.

The new cluster has a virtual memory limit on the login host, and the
old cluster did not.

It doesn't look like there is any way to set a default to override the
srun behaviour of passing those resource limits to the shell, so I may
consider removing those limits on the login host so folks don't have to
manually specify this every time.

I really appreciate the help!

-Dj

Laura Hild via slurm-users

May 15, 2024, 2:19:38 PM
to Dj Merrill, slurm...@lists.schedmd.com
PropagateResourceLimitsExcept won't do it?


Dj Merrill via slurm-users

May 15, 2024, 2:27:52 PM
to slurm...@lists.schedmd.com
I completely missed that, thank you! 

-Dj


Laura Hild via slurm-users wrote:
PropagateResourceLimitsExcept won't do it?
Sarlo, Jeffrey S wrote:
You might look at the PropagateResourceLimits and PropagateResourceLimitsExcept settings in slurm.conf
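
For the archives, the slurm.conf line we would presumably need is
something like this (just a sketch, not yet tested here):

PropagateResourceLimitsExcept=AS   # keep propagating everything except the address-space (virtual memory) limit

or, to turn off propagation of resource limits entirely:

PropagateResourceLimits=NONE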

Patryk Bełzak via slurm-users

May 17, 2024, 5:05:37 AM
to Dj Merrill, slurm...@lists.schedmd.com
Hi,

I wonder where these problems come from; perhaps I am missing something, but we have never had such issues with limits since we set them on the worker nodes in /etc/security/limits.d/99-cluster.conf:

```
* soft memlock 4086160 #Allow more Memory Locks for MPI
* hard memlock 4086160 #Allow more Memory Locks for MPI
* soft nofile 1048576 #Increase the Number of File Descriptors
* hard nofile 1048576 #Increase the Number of File Descriptors
* soft stack unlimited #Set soft to hard limit
* soft core 4194304 #Allow Core Files
```

This sets up all the limits we want without any problems, and there is no need to pass extra arguments to Slurm commands or modify the config file.
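
A quick way to check what a job actually ends up with (sketch):

$ srun bash -c 'ulimit -a'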

Regards,
Patryk.


greent10--- via slurm-users

May 17, 2024, 5:32:38 AM
to Patryk Bełzak, Dj Merrill, slurm...@lists.schedmd.com

Hi,

The problem arises when the login nodes (or submission hosts) have different ulimits from the compute nodes; for example, the submission hosts might be VMs rather than physical servers. By default Slurm passes the ulimits from the submission host to the job on the compute node, which can result in different settings being applied. If the login nodes have the same ulimit settings, you may not see a difference.

We happened to see a difference because we moved to a virtualised login node infrastructure which has slightly different settings applied.

Does that make sense?

I also missed that setting in slurm.conf, so it is good to know it is possible to change the default behaviour.

Tom



Patryk Bełzak via slurm-users

May 17, 2024, 6:19:01 AM
to gree...@cardiff.ac.uk, Dj Merrill, slurm...@lists.schedmd.com
We do have different limits on the submit host, and I believe that until we put the `limits.d/99-cluster.conf` file in place the limits were passed to jobs, but I can't tell for sure; it was a long time ago.
Still, modifying `limits.d` on the cluster nodes may be a different approach and solution to the aforementioned issue.

I wonder if anyone has an opinion on which way is better and why: whether to modify slurm.conf or the node system limits.

Patryk.
