[slurm-dev] Front-end mode


Sergio Iserte Agut

Oct 22, 2013, 12:52:52 PM
to slurm-dev
Hello everybody,

I've been trying the front-end mode in order to simulate more resources than I really have. 
I've configured my SLURM 2.6.2 with these lines:

NodeName=dummy[1-1200] NodeHostName=node0 NodeAddr=10.0.0.1
PartitionName=debug Nodes=dummy[1-1200] Default=YES MaxTime=INFINITE State=UP

Note that node0 is the node where slurmctld and slurmd are running.

When I try to execute:
sudo srun -Nx hostname
Where x is greater than 128 I get:
srun: error: Unable to create job step: Task count specification invalid
srun: Force Terminated job 688
When x is less than or equal to 128, the execution succeeds.

Why can't I use more than 128 nodes?

Regards.

--
Sergio Iserte Agut, research assistant,
High Performance Computing & Architecture
University Jaume I (Castellón, Spain)
 

Moe Jette

Oct 22, 2013, 1:17:27 PM
to slurm-dev

See MaxTasksPerNode in the slurm.conf man page.
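
For anyone following along: the 128 ceiling reported above is consistent with a default MaxTasksPerNode of 128, since in front-end mode every dummy node's task is launched on the same real host. That reading is an inference from the symptoms, not something stated in the thread. A minimal shell sketch for checking what a given slurm.conf sets, assuming a hypothetical helper name (max_tasks_per_node) and a site-specific path to the config file:

```shell
# Print the MaxTasksPerNode value set in a slurm.conf-style file,
# or "unset" if the parameter does not appear (the built-in default then applies).
# Note: this match is case-sensitive and ignores commented-out lines.
# Usage: max_tasks_per_node /path/to/slurm.conf   (path is site-specific)
max_tasks_per_node() {
    value=$(sed -n 's/^[[:space:]]*MaxTasksPerNode[[:space:]]*=[[:space:]]*\([0-9]*\).*/\1/p' "$1" | tail -n 1)
    echo "${value:-unset}"
}
```

If the parameter is unset or too low, a line such as MaxTasksPerNode=1200 in slurm.conf (the value here is chosen only to match the 1200 dummy nodes), followed by restarting the daemons, would be the natural thing to try.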


Sergio Iserte Agut

Oct 23, 2013, 3:10:02 AM
to slurm-dev
Thank you! It was very helpful!

I've continued testing and found another limitation:

$ sudo srun -N584 printenv | egrep 'SLURM_NODELIST'
slurmd[node0]: exec_wait_info_create: pipe: Too many open files
slurmd[node0]: child fork: Too many open files
srun: error: task 0 launch failed: Slurmd could not execve job

Maybe it's a machine limitation, but I'm not sure. Any idea?
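
"Too many open files" is the message for a per-process file descriptor limit, so the limit that matters is most likely the one applied to the slurmd process on node0, which in this setup launches all 584 tasks itself. A minimal sketch for inspecting that limit, assuming Linux with /proc mounted; the helper name open_file_limits is hypothetical:

```shell
# Print the "Max open files" soft and hard limits of a running process.
# Reads /proc/PID/limits, so this works on Linux only.
# Usage: open_file_limits PID
open_file_limits() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "/proc/$1/limits"
}

# Example: limits of the current shell. For the daemon, pass slurmd's PID instead,
# e.g. open_file_limits "$(pidof slurmd)" (assumes a single slurmd process).
open_file_limits "$$"
```

If the soft limit turns out to be low (1024 is a common default), raising it for slurmd, for example with ulimit -n in the shell or init script that starts the daemon, would be the first thing to try. Whether that alone is enough here is not confirmed in the thread.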

Regards!

--
Sergio Iserte Agut, research assistant,
High Performance Computing & Architecture
University Jaume I (Castellón, Spain)
 

Moe Jette

Oct 23, 2013, 11:44:58 AM
to slurm-dev