Hello,
> Danny Auble to slurm-dev:
> ...
> In the case of and other current plugin, the slurm_sched_schedule isn't
> used for anything, it is a no-op. If a new plugin wanted to use it, it
> should be used to make any changes to an internal job table in the
> plugin, the function call is used to notify of a change in it's own job
> table state.
> ....
I think that keeping, e.g., a timetable of jobs in a scheduler plugin is not really a workable approach.
Please correct me if I am wrong. Comment2 at the bottom contains the important part.
slurm version:
2.0.2
-------------------------------------------------------------------------------
Preliminaries:
(gdb) i br
Num  Type        Disp Enb Address     What
1    breakpoint  keep y   0x08097296  in slurm_sched_schedule at sched_plugin.c:280
2    breakpoint  keep y   0x08097237  in slurm_sched_freealloc at sched_plugin.c:303
3    breakpoint  keep y   0x08097267  in slurm_sched_newalloc at sched_plugin.c:291
4    breakpoint  keep y   0x080971ed  in slurm_sched_initial_priority at sched_plugin.c:317
5    breakpoint  keep y   0x080971b6  in slurm_sched_job_is_pending at sched_plugin.c:331
6    breakpoint  keep y   0x08097186  in slurm_sched_partition_change at sched_plugin.c:343
9    breakpoint  keep y   0x080972c6  in slurm_sched_reconfig at sched_plugin.c:268
10   breakpoint  keep y   0x080970dd  in slurm_sched_requeue at sched_plugin.c:378
(gdb) run slurmctld -D
...
-------------------------------------------------------------------------------
Action1:
$ srun --begin=now+10000 hostname &
[1] 23673
Result1:
Breakpoint 4, slurm_sched_initial_priority (last_prio=4294901760, job_ptr=0x93413f8) at sched_plugin.c:317
317 {
(gdb) c
Breakpoint 1, slurm_sched_schedule () at sched_plugin.c:280
280 if ( slurm_sched_init() < 0 )
(gdb) c
...
$ squeue
JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
    5    debug1 hostname    clime PD   0:00      1 (JobHeld)
Comment1:
None
-------------------------------------------------------------------------------
Action2:
$ scancel 5
Result2:
srun: Job has been cancelled
No breakpoint hit in gdb.
Comment2:
When a pending job is cancelled, the scheduler plugin is not informed of it.
Suppose a passive scheduler plugin (i.e., one that does not spawn any
threads) incrementally builds its internal data, a timetable of jobs, and
keeps it between invocations. Then, whenever it is invoked, it has to check
that each internal job is still consistent with the corresponding
controller job. For efficiency, the plugin's job structure includes
a pointer to the related controller job, so each check is just a
dereference. Everything works fine for a while, but then, one sunny day:
segfault. A controller job was cancelled and purged between two plugin
invocations, so the stored job pointer had become dangling in the meantime.
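The failure mode can be illustrated outside SLURM with a small mock. Everything below is hypothetical: `struct mock_job`, `mock_add()`, `mock_purge()` and `mock_find_job()` stand in for the controller's job list and are not SLURM APIs. The sketch shows the safe alternative to a cached pointer: the plugin remembers only the job ID and re-resolves it on each invocation, so a purged job yields NULL instead of a dangling pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a controller job record (NOT SLURM's
 * real job structure); jobs form a singly linked list. */
struct mock_job {
	uint32_t job_id;
	struct mock_job *next;
};

/* Controller side: add a job at the head of the list, return new head. */
struct mock_job *mock_add(struct mock_job *head, uint32_t job_id)
{
	struct mock_job *j = malloc(sizeof(*j));

	if (j == NULL)
		return head;  /* allocation failed: list left unchanged */
	j->job_id = job_id;
	j->next = head;
	return j;
}

/* Controller side: cancel and purge a job, freeing its record.
 * Any pointer a plugin cached to this record is now dangling. */
struct mock_job *mock_purge(struct mock_job *head, uint32_t job_id)
{
	struct mock_job **pp = &head;

	while (*pp != NULL) {
		if ((*pp)->job_id == job_id) {
			struct mock_job *dead = *pp;
			*pp = dead->next;  /* unlink, then free */
			free(dead);
			break;
		}
		pp = &(*pp)->next;
	}
	return head;
}

/* Plugin side: resolve a remembered job ID at invocation time.
 * Returns NULL if the job is gone, rather than touching freed memory. */
struct mock_job *mock_find_job(struct mock_job *head, uint32_t job_id)
{
	for (; head != NULL; head = head->next)
		if (head->job_id == job_id)
			return head;
	return NULL;
}
```

The price is a lookup per check instead of a single dereference, which is exactly the cost discussed next.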
As a result, a scheduler plugin must either check whether all its jobs are
still in the queue by looking up their IDs, or rebuild its internal
structure from scratch according to the system state at each invocation.
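The ID-lookup option need not be quadratic. A minimal sketch, assuming the plugin can obtain an array of the job IDs currently in the controller's queue (`live_ids`, hypothetical; e.g. collected by one walk over the job list per invocation): sorting that array once turns each membership test into a binary search, so pruning n plugin entries against m live jobs costs O((n + m) log m) instead of O(n*m). `struct sched_entry` and `prune_stale_entries()` are illustrative names, not SLURM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical plugin-side timetable entry. It stores the job ID
 * rather than a pointer into the controller's job list. */
struct sched_entry {
	uint32_t job_id;
	long planned_start;  /* the plugin's own timetable data */
};

static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Remove every entry whose job ID is not in live_ids.
 * live_ids is sorted in place; surviving entries keep their relative
 * order and are compacted to the front. Returns the number kept. */
size_t prune_stale_entries(struct sched_entry *entries, size_t n,
			   uint32_t *live_ids, size_t m)
{
	size_t kept = 0;

	assert(entries != NULL || n == 0);
	if (m == 0)
		return 0;  /* no live jobs: every entry is stale */
	qsort(live_ids, m, sizeof(*live_ids), cmp_u32);
	for (size_t i = 0; i < n; i++) {
		if (bsearch(&entries[i].job_id, live_ids, m,
			    sizeof(*live_ids), cmp_u32) != NULL)
			entries[kept++] = entries[i];
	}
	return kept;
}
```

This still visits every plugin entry on every invocation; it only makes each visit cheap.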
Neither option is pretty. An active scheduler (one with its own thread)
could check its pointers periodically, but that does not really help: a job
can still be purged between a check and the next use of the pointer.
To sum up, keeping internal data about the world in a scheduler means we
must either check the consistency of all of it all the time, which is very
time-consuming, or live with potential inconsistency (which would perhaps
be the better solution).
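Living with potential inconsistency can still be made crash-free by validating lazily: the plugin leaves its timetable as-is and only checks an entry at the moment it acts on it, dropping entries whose jobs have disappeared. A sketch under stated assumptions: the timetable holds job IDs in planned start order, and `is_live()` is a hypothetical stand-in (a plain array scan here, to keep the example self-contained) for looking a job ID up in the controller's job table at the time of use. None of these names are real SLURM interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a controller lookup by job ID. */
static int is_live(uint32_t job_id, const uint32_t *live, size_t m)
{
	for (size_t i = 0; i < m; i++)
		if (live[i] == job_id)
			return 1;
	return 0;
}

/* Find the first timetable entry whose job still exists. Stale entries
 * in front of it are removed; later entries are not checked yet (lazy).
 * *n is updated to the remaining count. Returns 1 and stores the job ID
 * in *out if a live job was found, 0 if the timetable is now empty. */
int next_valid_job(uint32_t *timetable, size_t *n,
		   const uint32_t *live, size_t m, uint32_t *out)
{
	size_t stale = 0;

	while (stale < *n && !is_live(timetable[stale], live, m))
		stale++;  /* job was cancelled and purged meanwhile */
	for (size_t k = stale; k < *n; k++)
		timetable[k - stale] = timetable[k];  /* shift survivors */
	*n -= stale;
	if (*n == 0)
		return 0;
	*out = timetable[0];
	return 1;
}
```

Entries behind the first live job may still be stale; they are cleaned up only when the plugin reaches them, which is the accepted inconsistency.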
-------------------------------------------------------------------------------
Michal Novotny