Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
The entities managed by these Slurm daemons, shown in Figure 2, include nodes, the compute resource in Slurm; partitions, which group nodes into logical (possibly overlapping) sets; jobs, or allocations of resources assigned to a user for a specified amount of time; and job steps, which are sets of (possibly parallel) tasks within a job. The partitions can be considered job queues, each of which has an assortment of constraints such as job size limit, job time limit, users permitted to use it, etc. Priority-ordered jobs are allocated nodes within a partition until the resources (nodes, processors, memory, etc.) within that partition are exhausted. Once a job is assigned a set of nodes, the user is able to initiate parallel work in the form of job steps in any configuration within the allocation. For instance, a single job step may be started that utilizes all nodes allocated to the job, or several job steps may independently use a portion of the allocation.
salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
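For illustration, a typical salloc session might look like the following (the node count and commands are examples only; behavior depends on your cluster's configuration):

```shell
# Request an interactive allocation of two nodes; once granted,
# salloc starts a shell on the submission host.
salloc -N2

# Inside that shell, launch a job step across the allocated nodes.
srun hostname

# Release the allocation when finished.
exit
```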
sbcast is used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to make effective use of diskless compute nodes or to provide improved performance relative to a shared file system.
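A minimal sketch of this pattern, run from within an existing allocation (the file names here are hypothetical):

```shell
# Copy a program from shared storage to local disk on every
# allocated node, then run the local copy as a job step.
sbcast my_program /tmp/my_program
srun /tmp/my_program
```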
squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
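Two examples of its filtering options (the user name is a placeholder):

```shell
# Show only the jobs belonging to one user.
squeue -u alice

# Show only pending jobs, refreshing the report every 5 seconds.
squeue -t PENDING -i 5
```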
srun is used to submit a job for execution or to initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (a certain amount of memory or disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared resources within the job's node allocation.
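A hedged example of specifying such requirements on the command line (the program name is a placeholder):

```shell
# Run four tasks across two nodes, requesting 1 GB of memory per node
# and a one-hour time limit.
srun -N2 -n4 --mem=1G -t 01:00:00 ./my_app
```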
First we determine what partitions exist on the system, what nodes they include, and the general system state. This information is provided by the sinfo command. In the example below we find there are two partitions: debug and batch. The * following the name debug indicates this is the default partition for submitted jobs. We see that both partitions are in an UP state. Some configurations may include partitions for larger jobs that are DOWN except on weekends or at night. The information about each partition may be split over more than one line so that nodes in different states can be identified. In this case, the two nodes adev[1-2] are down. The * following the state down indicates the nodes are not responding. Note the use of a concise expression for node name specification, with a common prefix adev and numeric ranges or specific numbers identified. This format allows very large clusters to be easily managed. The sinfo command has many options to easily let you view the information of interest to you in whatever format you prefer. See the man page for more information.
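The output described above would resemble the following (reconstructed for illustration; exact columns, time limits, and node names will differ on your system):

```shell
sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up      30:00      2  down* adev[1-2]
debug*       up      30:00      3   idle adev[3-5]
batch        up      30:00      6   idle adev[6-11]
```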
The scontrol command can be used to report more detailed information about nodes, partitions, jobs, job steps, and configuration. It can also be used by system administrators to make configuration changes. A couple of examples are shown below. See the man page for more information.
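For example (the node and partition names here are illustrative):

```shell
# Report detailed state of one node and one partition.
scontrol show node adev1
scontrol show partition debug

# Administrators can also change state, e.g. drain a node:
# scontrol update NodeName=adev1 State=DRAIN Reason="maintenance"
```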
It is possible to create a resource allocation and launch the tasks for a job step in a single command line using the srun command. Depending upon the MPI implementation used, MPI jobs may also be launched in this manner. See the MPI section for more MPI-specific information. In this example we execute /bin/hostname on three nodes (-N3) and include task numbers in the output (-l). The default partition will be used. One task per node will be used by default. Note that the srun command has many options available to control what resources are allocated and how tasks are distributed across those resources.
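The command and its output would look something like this (node names reconstructed for illustration):

```shell
srun -N3 -l /bin/hostname
0: adev3
1: adev4
2: adev5
```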
One common mode of operation is to submit a script for later execution. In this example the script name is my.script and we explicitly use the nodes adev9 and adev10 (-w "adev[9-10]"; note the use of a node range expression). We also explicitly state that the subsequent job steps will spawn four tasks each, which will ensure that our allocation contains at least four processors (one processor per task to be launched). The output will appear in the file my.stdout ("-o my.stdout"). This script contains a time limit for the job embedded within itself. Other options can be supplied as desired by using a prefix of "#SBATCH" followed by the option at the beginning of the script (before any commands to be executed in the script). Options supplied on the command line override any options specified within the script. Note that my.script contains the command /bin/hostname, which executes on the first node in the allocation (where the script runs), plus two job steps initiated using the srun command and executed sequentially.
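A sketch of what such a script and its submission might look like (reconstructed from the description above; the second job step command is an assumption):

```shell
#!/bin/sh
#SBATCH --time=1
/bin/hostname
srun -l /bin/hostname
srun -l /bin/pwd
```

Submitted with the options described above:

```shell
sbatch -n4 -w "adev[9-10]" -o my.stdout my.script
```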
The final mode of operation is to create a resource allocation and spawn job steps within that allocation. The salloc command is used to create a resource allocation and typically start a shell within that allocation. One or more job steps would typically be executed within that allocation using the srun command to launch the tasks (depending upon the type of MPI being used, the launch mechanism may differ; see MPI details below). Finally, the shell created by salloc would be terminated using the exit command. Slurm does not automatically migrate executable or data files to the nodes allocated to a job. The files must either exist on local disk or in some global file system (e.g. NFS or Lustre). We provide the tool sbcast to transfer files to local storage on allocated nodes using Slurm's hierarchical communications. In this example we use sbcast to transfer the executable program a.out to /tmp/joe.a.out on local storage of the allocated nodes. After executing the program, we delete it from local storage.
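A sketch of this session (node count is illustrative; file names follow the description above):

```shell
# Create an allocation and a shell within it.
salloc -N4

# Broadcast the executable to local storage on every allocated node.
sbcast a.out /tmp/joe.a.out

# Run the local copy, then clean it up.
srun /tmp/joe.a.out
srun rm /tmp/joe.a.out

# Release the allocation.
exit
```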
Consider putting related work into a single Slurm job with multiple job steps, both for performance reasons and for ease of management. Each Slurm job can contain a multitude of job steps, and the overhead in Slurm for managing job steps is much lower than that of individual jobs.
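One way to sketch this pattern is a batch script running several concurrent job steps within one allocation (the task commands are placeholders):

```shell
#!/bin/sh
#SBATCH -N2
# Two job steps sharing the job's allocation, started concurrently;
# 'wait' blocks until both steps have finished.
srun -N1 -n1 ./task_a &
srun -N1 -n1 ./task_b &
wait
```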
Job arrays are an efficient mechanism for managing a collection of batch jobs with identical resource requirements. Most Slurm commands can manage job arrays either as individual elements (tasks) or as a single entity (e.g. delete an entire job array in a single command).
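For example (script name and job ID are placeholders):

```shell
# Submit a 10-element job array; each element sees its own index
# in the SLURM_ARRAY_TASK_ID environment variable.
sbatch --array=1-10 array.sh

# Cancel one element, or the entire array as a single entity.
scancel 1234_7
scancel 1234
```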
MPI use depends upon the type of MPI being used. There are three fundamentally different modes of operation used by these various MPI implementations.

1. Slurm directly launches the tasks and performs initialization of communications through the PMI2 or PMIx APIs. (Supported by most modern MPI implementations.)
2. Slurm creates a resource allocation for the job and then mpirun launches tasks using Slurm's infrastructure (older versions of Open MPI).
3. Slurm creates a resource allocation for the job and then mpirun launches tasks using some mechanism other than Slurm, such as SSH or RSH. These tasks are initiated outside of Slurm's monitoring or control. Slurm's epilog should be configured to purge these tasks when the job's allocation is relinquished. The use of pam_slurm_adopt is also strongly recommended.

Links to instructions for using several varieties of MPI with Slurm are provided below: Intel MPI, MPICH2, MVAPICH2, Open MPI.
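A hedged example of the first mode, direct launch through Slurm's PMIx support (availability depends on how both Slurm and the MPI library were built; the program name is a placeholder):

```shell
# Launch 8 MPI tasks across 2 nodes, with Slurm initializing
# communications via the PMIx API.
srun --mpi=pmix -N2 -n8 ./mpi_hello
```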
Providing instructions that answer your customers' recurring questions will reduce your service tickets and, consequently, your costs. Thanks to social networks and e-mail, consumers have never interacted as much with brand communities and after-sales services as they do today. But this also produces a high volume of customer service tickets, which represent a significant cost for your company.
But why look for complexity when you can keep things simple? Just use a 250-year-old tool (spoiler: the user manual!) to answer the most common questions and offer a list of fixes that users can carry out on their own in case of a breakdown.
When companies are asked what the purpose of their user manual is, the answer is always the same: "to protect us." Indeed, instructions for use serve to meet certain legal obligations regarding warnings and cautions concerning the use of the product.
It is therefore essential to write clear instructions for use, which inform consumers of everything they need to know about the product before using it. In addition to the warnings, the company can also describe the normal, intended use of the product. This applies to everything from Kinder toys (e.g. choking hazard warnings) to personal protective equipment (PPE).
Instructions for use are therefore a means of protecting the consumer, but also of releasing your company from liability in the event of an accident or inappropriate use. If the required notices do not appear on the product, its packaging, or its instructions, your company is entirely liable for any damages, and legal proceedings often tarnish a brand's image and destroy consumer confidence.