Cyrus,
Thanks for the input. Yes, I have considered features/constraints as
part of this, and I already use them to let users request IB. They
are definitely a key part of my strategy. I will look into SPANK and
PriorityTier. One of my goals is to reduce the amount of
scripting/customization I need to do, so if using SPANK plugins requires
a lot of development on my part, that may be counterproductive for me.
> There are several ways to approach this and I imagine you really wish
> the users to be able to "just submit" with a minimum of effort and
> information on their part while your life is also manageable for changes
> or updates.
Not exactly. I wouldn't say I want them to 'just submit' with minimal
effort. I think that's a recipe for disaster: they don't specify the
right time limits or the correct resources, which causes their jobs to
stay queued or prevents backfill scheduling from working, or they use a
node with 512 GB to run a single-core job that only uses 4 GB of RAM.
What I want is for my users to think about the *resources* they need for
their job, and not what partition they submit to. Right now, they just
think about what partition they want their job to run on, and submit
their job to that partition. Often, they use the same queue for every
job, regardless of the differing resource requirements. While there is
some logic behind how my cluster is divided into partitions, I find most
users ignore it and submit to the same queue, job after job, day after
day, year after year.
I want my users to stop thinking in terms of partition names, and start
thinking in terms of what resources their job *really* needs. This will
ultimately improve cluster utilization, and reduce time spent in the
queue. Today, some users submit a job and, as soon as it goes into the
pending state, scancel it, change the partition name to a less
utilized partition, and resubmit it in the hope that it will start
running immediately.
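To make that concrete, here is a rough sketch of the kind of job script
I would like to see. It describes the single-core, 4 GB job from my
example above purely in terms of resources. The job name, the two-hour
limit, and the "ib" feature name are placeholders I made up for
illustration, not settings from my actual cluster.

```shell
#!/bin/bash
# Hypothetical resource-oriented job script (all values are illustrative).
#SBATCH --job-name=resource_example
# A realistic wall-time limit gives the backfill scheduler something to work with.
#SBATCH --time=02:00:00
# One task on one core.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Ask for the memory the job really needs, not a whole 512 GB node.
#SBATCH --mem=4G
# Only for jobs that need InfiniBand; "ib" is an assumed feature name.
# The doubled '#' keeps this directive disabled.
##SBATCH --constraint=ib
# Note: there is deliberately no --partition line.

echo "Job ${SLURM_JOB_ID:-unknown} running on $(hostname)"
```

With no --partition line, Slurm puts the job in the default partition,
so a script like this is only half of the picture. The other half is
getting the scheduler to place the job well based on those requests.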
Yes, there needs to be a lot of user training, and there's a lot I can
do to improve the environment for my users, but making the scheduler
more flexible needs to be one of the first steps in my vision to improve
things here.
Prentice