Hi Bhaskar,
I think I'm missing something here.
All the points below are valid, but they mainly concern shared
resources. You have private, non-shared resources, dedicated to
the app, or so I understood.
Does your app compete against itself? Do users of the app interfere with
each other's jobs? Are the nodes mutable? Do you expect resources to be
added to your dedicated subcluster/partition unannounced?
On 18/07/2024 16:39:25, Bhaskar Chakraborty wrote:
> Hi Daniel,
>
> Appreciate your response.
>
> I think you may be feeling that since we take the placement part of
> the scheduling upon ourselves, Slurm has no other role to play!
>
> That's not quite true. Below, in brief, are other important roles which
> Slurm must perform that presently come to my mind
> (this may not be exhaustive):
>
> 1> Slurm should inform us of the job scheduling priority order.
> Admin can impose policies (fairshare, user-group preference, etc.)
> which can prioritize more recent jobs over older ones, &
> we would like to see those priorities as computed by Slurm in real time.
Such policies are cluster-wide and usually immutable unless there is a
good reason to modify them. As owners of just some of the resources, the
most you can usually expect is to set priority internally (only among
your own jobs) via flags or QOS.
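To illustrate, a sketch of ordering your own jobs relative to each other,
assuming a QOS named "myapp_qos" exists for your partition (the QOS and
script names are hypothetical, and the commands are shown as dry runs via
echo since they need a live cluster):

```shell
# A larger nice value lowers a job's priority relative to your other jobs,
# without touching cluster-wide policy:
echo sbatch --qos=myapp_qos --nice=0 urgent_job.sh
echo sbatch --qos=myapp_qos --nice=100 background_job.sh
# sprio reports the per-factor priorities Slurm computed for pending jobs:
echo sprio -l
```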
>
> 2> Slurm should keep us updated on any configured resource limits.
> Limits can exist on resources such as CPU cores, number of
> running jobs, host-group CPU limits, etc.
> Our backend app needs to be updated about these from time to
> time so that unnecessary allocations are avoided right away.
Resource limits shouldn't change. The number of running jobs is a known
quantity, as the jobs are a product of the app, and the app is the only
thing that produces jobs on those nodes.
In the case of node failures, any monitoring solution will do the job as
well as Slurm. Slurm is convenient in that regard, but not essential.
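For example, a sketch of node-health polling that needs nothing
Slurm-specific on the admin side: filter `sinfo` output for unhealthy
states. The partition name "myapp" and the here-doc sample are
hypothetical; on a live cluster you would pipe in the real output of
`sinfo -h -p myapp -N -o "%N %t"`:

```shell
# Print the names of nodes whose compact state looks unhealthy.
unhealthy_nodes() {
  awk '$2 ~ /^(down|drain|drng|fail)/ {print $1}'
}
# Sample input standing in for live sinfo output:
unhealthy_nodes <<'EOF'
node01 idle
node02 down*
node03 drain
node04 alloc
EOF
```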
>
> 3> Preemptable job candidates.
> Admin can mark certain jobs from certain users as preemptable ones.
> Our app needs to be informed about that should the need arise to
> preempt running jobs.
The nodes are yours. What would preempt them aside from you?
>
> 4> Specified host resources for job start.
> A user may want their job to start on specific hosts, & Slurm
> should communicate that back to us.
> Similarly, the same applies if a user wants their job to run on a
> certain set of hosts.
The impetus for this discussion was that your app is the arbiter of
placement, and only your app is running on those nodes.
I think you haven't decided (programmatically speaking) on an exact flow
for all edge cases. If your app chooses placement based on its own
algorithm, the users SHOULD be able to give that kind of input to the app
directly, rather than play ping-pong between the app, Slurm and the user.
>
> 5> Preferential hosts for scheduling. If there is some preferential
> order of hosts or backfill scheduling enabled the same
> needs to be communicated to us.
Backfill is valid, but the rest is not, considering your app chooses
placement. Or does your app choose placements that overlap previous
placements? Does it not preserve state?
>
> 6> Regular intimation of job events, like dispatch, suspension,
> finish, re-submission, etc., so that we can take appropriate action.
All valid, but doesn't invalidate my statement. You don't need anything
special from Slurm for these.
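For instance, a sketch of event notification using stock Slurm tooling
with no admin changes: strigger can run a program when a job finishes
(--fini), and squeue can simply be polled for job state. The job id and
program path are hypothetical, and the commands are shown as dry runs
(echo) since they need a live cluster:

```shell
JOBID=12345
# Register a per-job trigger that fires on job completion:
echo strigger --set --jobid=$JOBID --fini --program=/opt/myapp/on_finish.sh
# Or poll the job's current state (PENDING, RUNNING, SUSPENDED, ...):
echo squeue -h -j $JOBID -o %T
```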
>
> Hope this clarifies our requirements & expectations.
Almost any solution that I can think of for your requirements requires
admin-level changes to Slurm - using a HealthCheckProgram, prologs,
submit plugins - all are cluster-wide modifications that would affect
other users of the cluster, not just your nodes.
That is why I suggested you might be better off using your own private
solution, since Slurm really is not designed to work with external
placement. It can be done, but it would be suboptimal.
I still believe the best option is to rewrite the app to communicate the
placement requirements (based on the algorithm and previous runs as
input) to Slurm as a simple string of sbatch flags, and just let Slurm
do its thing. That sounds simpler than forcing all the other users of
the cluster to adhere to your particular needs, and it avoids
introducing unnecessary complexity to the cluster.
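A minimal sketch of that flow, with hypothetical node and script names:
the app computes placement, then just emits it as plain sbatch flags.

```shell
NODES="node01,node03"   # decided by the app's own placement algorithm
FLAGS="--nodelist=$NODES --exclusive --ntasks=2"
# On a live cluster you would run: sbatch $FLAGS job.sh
echo "sbatch $FLAGS job.sh"
```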
>
> Regards,
> Bhaskar.
Regards,
--Dani_L.