[slurm-dev] SPANK and job variables/options


Dmitry Chirikov

Jan 26, 2017, 7:04:15 AM
to slurm-dev
Hi all,

Playing with SPANK, I have run into an issue: it seems I can't get any useful information about the job a user is about to submit.

I am trying to add a hook in the allocator context, in the slurm_spank_init or slurm_spank_init_post_opt functions.

The idea was to check some option of the submitted job file (or the srun options) and decide whether the job should be accepted into the queue or not.

But it seems it is not possible to get even very basic items like S_JOB_ARGV or S_JOB_ENV.

If I read it right, this check limits the items SPANK is able to use to, literally, just the Slurm version:
https://github.com/SchedMD/slurm/blob/d7770b9b68d992f4f38417d5ef7204895954eff8/src/common/plugstack.c#L1765

The prolog facilities looked promising, but they cannot drop a job - it can only be re-queued.

So the question is: how do I do this properly? At what point are these variables available, while it is still not too late to deny the user's job submission?

Thank you in advance.

Kind regards,
Dmitry Chirikov

Sam Gallop (NBI)

Jan 30, 2017, 10:42:53 AM
to slurm-dev

Hi Dmitry,

 

I recently posted this for a similar query about SPANK plugins (see below for the example code). It dumps some of the S_JOB_ARGV and S_JOB_ENV variables you are interested in. It's very basic, and there is a lot more that can be done with SPANK. Example output looks like this …

<snip>
slurm_spank_task_init (ARGV): /bin/sleep
slurm_spank_task_init (ARGV): 30
slurm_spank_task_init (ENV): SLURM_JOB_NAME=slurm1.sh
slurm_spank_task_init (ENV): SLURMD_NODENAME=centos1
slurm_spank_task_init (ENV): SLURM_JOBID=266
slurm_spank_task_init (ENV): SLURM_JOB_PARTITION=debug
slurm_spank_task_init (ENV): SLURM_JOB_NUM_NODES=1
slurm_spank_task_init (ENV): SLURM_MEM_PER_NODE=1024
<snip>

 

# cat userenv.c

/*
 *   To compile:
 *    gcc -fPIC -shared -o userenv.so userenv.c
 */

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <libgen.h>
#include <signal.h>
#include <sys/resource.h>

#include <slurm/spank.h>
#include <slurm/slurm.h>

SPANK_PLUGIN(userenv, 1);

static int _userenv_opt_process (int val,
                                const char *optarg,
                                int remote);

struct spank_option spank_options[] =
{
        {
                "userenv",
                "",
                "helper function",
                0,
                0,
                (spank_opt_cb_f) _userenv_opt_process
        },
        SPANK_OPTIONS_TABLE_END
};

int slurm_spank_task_init(spank_t sp, int ac, char **av)
{
        FILE *f;
        char filename[256];
        char function[50];
        int i, argc;
        char **argv;
        char **env;
        uint32_t s_id;

        sprintf(function, "slurm_spank_task_init");

        if (spank_get_item(sp, S_JOB_ID, &s_id) != ESPANK_SUCCESS) {
                slurm_info("userenv: Failed to get id");
                return (0);
        }

        /* Dump the job's argv and environment to a per-job file. */
        sprintf(filename, "/tmp/%s_%d_%u", function, spank_context(), s_id);
        f = fopen(filename, "w");
        if (f == NULL)
                return (0);

        if (spank_get_item(sp, S_JOB_ARGV, &argc, &argv) == ESPANK_SUCCESS) {
                for (i = 0; i < argc; i++)
                        fprintf(f, "%s (ARGV): %s\n", function, argv[i]);
        }

        if (spank_get_item(sp, S_JOB_ENV, &env) == ESPANK_SUCCESS) {
                while (*env != NULL) {
                        fprintf(f, "%s (ENV): %s\n", function, *env);
                        ++env;
                }
        }

        fclose(f);
        return (0);
}

static int _userenv_opt_process(int val, const char *optarg, int remote)
{
        slurm_info("userenv: userenv plugin has been registered");
        return (0);
}
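
For reference, once compiled the plugin also has to be listed in plugstack.conf so that the Slurm commands and slurmstepd load it. A minimal sketch, assuming the default config location and an arbitrary install path:

# /etc/slurm/plugstack.conf
optional /usr/lib64/slurm/userenv.so

Marking it "optional" means commands still run if the plugin fails to load; "required" would make a plugin failure fatal.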

 

---

Sam Gallop

 


Dmitry Chirikov

Jan 30, 2017, 12:25:05 PM
to slurm-dev
Hi Sam,

thanks for sharing. If I get it right, you are enumerating the variables after allocation (in slurm_spank_task_init) on the compute node, but I am really curious whether it is even possible to get those vars (or some of them) before nodes are allocated:

int slurm_spank_init (spank_t sp, int ac, char **av) {
    if (spank_context () != S_CTX_ALLOCATOR) {
       return (0);
    }
...
}

I believe that at this step the srun command-line options and #SBATCH instructions should already be parsed or, at least, available.
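
For what it's worth, one way this could work, assuming plugin option callbacks do fire at submit time in the allocator context, is to register a plugin option and validate it in slurm_spank_init_post_opt. A minimal, untested sketch; the --policy option and the "allowed" value are invented for illustration:

#include <stdio.h>
#include <string.h>
#include <slurm/spank.h>

SPANK_PLUGIN(policy_check, 1);

static char policy_arg[64] = "";

/* Option callback: runs in the local/allocator context while
 * srun/sbatch/salloc parse their options. */
static int _opt_cb(int val, const char *optarg, int remote)
{
    if (optarg != NULL)
        snprintf(policy_arg, sizeof(policy_arg), "%s", optarg);
    return (0);
}

struct spank_option spank_options[] =
{
    { "policy", "[name]", "site policy to validate", 1, 0,
      (spank_opt_cb_f) _opt_cb },
    SPANK_OPTIONS_TABLE_END
};

/* Runs after option parsing.  In the allocator context a non-zero
 * return should abort sbatch/salloc before the job is queued,
 * though whether it does may depend on the Slurm version. */
int slurm_spank_init_post_opt(spank_t sp, int ac, char **av)
{
    if (spank_context() != S_CTX_ALLOCATOR)
        return (0);
    if (policy_arg[0] && strcmp(policy_arg, "allowed") != 0) {
        slurm_error("policy '%s' is not permitted", policy_arg);
        return (-1);
    }
    return (0);
}

This only covers options the plugin itself registers, not arbitrary #SBATCH directives, so it may not be enough for this case.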

Kind regards,
Dmitry


Sam Gallop (NBI)

Jan 30, 2017, 1:02:14 PM
to slurm-dev

Hi,

 

By changing the entry point to slurm_spank_init I was able to retrieve the following (see below), but only when the context was S_CTX_ALLOCATOR. When the context was != S_CTX_ALLOCATOR I got nothing.

 

slurm_spank_init (ARGV): sleep
slurm_spank_init (ARGV): 30
slurm_spank_init (ENV): SLURM_CHECKPOINT_IMAGE_DIR=/root
slurm_spank_init (ENV): SLURM_NODELIST=centos1
slurm_spank_init (ENV): SLURM_JOB_NAME=slurm1.sh
slurm_spank_init (ENV): SLURMD_NODENAME=centos1
slurm_spank_init (ENV): SLURM_TOPOLOGY_ADDR=centos1
slurm_spank_init (ENV): SLURM_PRIO_PROCESS=0
slurm_spank_init (ENV): SLURM_TOPOLOGY_ADDR_PATTERN=node
slurm_spank_init (ENV): SLURM_NNODES=1
slurm_spank_init (ENV): SLURM_JOBID=268
slurm_spank_init (ENV): SLURM_TASKS_PER_NODE=3
slurm_spank_init (ENV): SLURM_JOB_ID=268
slurm_spank_init (ENV): SLURM_CPUS_PER_TASK=1
slurm_spank_init (ENV): SLURM_JOB_USER=root
slurm_spank_init (ENV): SLURM_JOB_UID=0
slurm_spank_init (ENV): SLURM_NODEID=0
slurm_spank_init (ENV): SLURM_SUBMIT_DIR=/root
slurm_spank_init (ENV): SLURM_TASK_PID=2470
slurm_spank_init (ENV): SLURM_CPUS_ON_NODE=3
slurm_spank_init (ENV): SLURM_PROCID=0
slurm_spank_init (ENV): SLURM_JOB_NODELIST=centos1
slurm_spank_init (ENV): SLURM_LOCALID=0
slurm_spank_init (ENV): SLURM_JOB_CPUS_PER_NODE=3
slurm_spank_init (ENV): SLURM_CLUSTER_NAME=cluster
slurm_spank_init (ENV): SLURM_GTIDS=0
slurm_spank_init (ENV): SLURM_SUBMIT_HOST=centos1
slurm_spank_init (ENV): SLURM_JOB_PARTITION=debug
slurm_spank_init (ENV): SLURM_JOB_NUM_NODES=1
slurm_spank_init (ENV): SLURM_MEM_PER_NODE=1024
slurm_spank_init (ENV): SLURM_RLIMIT_CPU=18446744073709551615
slurm_spank_init (ENV): SLURM_RLIMIT_FSIZE=18446744073709551615
slurm_spank_init (ENV): SLURM_RLIMIT_DATA=18446744073709551615
slurm_spank_init (ENV): SLURM_RLIMIT_STACK=10485760
slurm_spank_init (ENV): SLURM_RLIMIT_CORE=0
slurm_spank_init (ENV): SLURM_RLIMIT_RSS=1073741824
slurm_spank_init (ENV): SLURM_RLIMIT_NPROC=15188
slurm_spank_init (ENV): SLURM_RLIMIT_NOFILE=1024
slurm_spank_init (ENV): SLURM_RLIMIT_MEMLOCK=65536
slurm_spank_init (ENV): SLURM_RLIMIT_AS=18446744073709551615
slurm_spank_init (ENV): SRUN_DEBUG=3
slurm_spank_init (ENV): SLURM_UMASK=0022
slurm_spank_init (ENV): SLURM_NTASKS=3
slurm_spank_init (ENV): SLURM_NPROCS=3
slurm_spank_init (ENV): SLURM_DISTRIBUTION=block
slurm_spank_init (ENV): SLURM_CPU_BIND_VERBOSE=quiet
slurm_spank_init (ENV): SLURM_CPU_BIND_TYPE=cores
slurm_spank_init (ENV): SLURM_CPU_BIND_LIST=
slurm_spank_init (ENV): SLURM_CPU_BIND=quiet,cores
slurm_spank_init (ENV): SLURM_STEP_ID=0
slurm_spank_init (ENV): SLURM_STEPID=0
slurm_spank_init (ENV): SLURM_SRUN_COMM_PORT=42199
slurm_spank_init (ENV): SLURM_STEP_NODELIST=centos1
slurm_spank_init (ENV): SLURM_STEP_NUM_NODES=1
slurm_spank_init (ENV): SLURM_STEP_NUM_TASKS=3
slurm_spank_init (ENV): SLURM_STEP_TASKS_PER_NODE=3
slurm_spank_init (ENV): SLURM_STEP_LAUNCHER_PORT=42199
slurm_spank_init (ENV): SLURM_SRUN_COMM_HOST=…

 

---

Sam Gallop

 

Dmitry Chirikov

Jan 30, 2017, 5:10:16 PM
to slurm-dev
Hi Sam,

I'm not sure I understand how you achieved that. My code is here:

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/resource.h>
#include <slurm/spank.h>

SPANK_PLUGIN(dc_test, 1);

int slurm_spank_init (spank_t sp, int ac, char **av) {
    int i, argc;
    char function[50];
    char **argv;
    char **env;
    if (spank_context () != S_CTX_ALLOCATOR) {
       return (0);
    }
    sprintf(function, "slurm_spank_init");
    if (spank_get_item(sp, S_JOB_ARGV, &argc, &argv) == ESPANK_SUCCESS) {
        for (i = 0; i < argc; i++) {
            slurm_info( "%s (ARGV): %s\n", function, argv[i]);
        }
    } else {
        slurm_error("Unable to retrieve S_JOB_ARGV");
    }
    return (0);
}

Then submit job as user:

$ sbatch test.sh 
sbatch: error: Unable to retrieve S_JOB_ARGV
Submitted batch job 766

So, no cigar yet.

Kind regards,
Dmitry


Sam Gallop (NBI)

Jan 31, 2017, 4:48:28 AM
to slurm-dev

Hi Dmitry,

 

I can see the confusion; my last mail was poorly worded. When I ran the plugin in the S_CTX_REMOTE context I was able to retrieve some information. I wasn't able to retrieve any data when running in the S_CTX_ALLOCATOR context.

 

Again, apologies for the confusion.

Carlos Fenoy

Jan 31, 2017, 4:53:56 AM
to slurm-dev
Hi, 

You should take a look at the job_submit plugin. That is the best place to check whether a job should be queued, and to reject it otherwise.
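
For context, a job_submit plugin runs inside slurmctld at submission time, sees the already-parsed job descriptor, and can reject the job outright. A minimal, untested C sketch follows; it has to be built inside the Slurm source tree because of the internal headers, and the "require a time limit" policy is only an illustration (Slurm also ships a job_submit/lua plugin if Lua is easier to deploy):

#include <slurm/slurm_errno.h>
#include "src/common/xstring.h"
#include "src/slurmctld/slurmctld.h"

const char plugin_name[] = "Job submit example plugin";
const char plugin_type[] = "job_submit/example";
const uint32_t plugin_version = SLURM_VERSION_NUMBER;

/* Called for every submission; returning an error code rejects the
 * job before it is queued, and err_msg is shown to the user. */
extern int job_submit(struct job_descriptor *job_desc, uint32_t submit_uid,
                      char **err_msg)
{
    /* Illustrative policy: require an explicit time limit. */
    if (job_desc->time_limit == NO_VAL) {
        *err_msg = xstrdup("please specify a time limit (--time)");
        return ESLURM_INVALID_TIME_LIMIT;
    }
    return SLURM_SUCCESS;
}

/* Required companion hook; accept all modifications here. */
extern int job_modify(struct job_descriptor *job_desc,
                      struct job_record *job_ptr, uint32_t submit_uid)
{
    return SLURM_SUCCESS;
}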

Regards,
Carlos
--
Carles Fenoy