Running jobs with Slurm

Slurm is a workload manager for clusters. It allocates resources to users for a given duration, handles the execution of their jobs and prioritizes them by managing a queue of pending work.

What you need:

  • data transferred to your home directory if needed

  • a program that will do the work

  • a script to prepare and run the above program with its parameters

Testing your environment

Before writing a script and submitting it to Slurm, it is advisable to test the environment in which your code will run in an interactive session.

You can obtain an interactive session this way:

[demo@datamaster ~]$ srun --partition debug -n 2 --mem 2G --pty /bin/bash
It is important to let Slurm know which resources are in use.
If you use a partition other than debug, don't forget to set the time limit with -t 2:00:00 (2 hours).
If a GPU is required, add this argument: --gres="gpu:1"
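
For example, combining the options above, an interactive session on the gpu partition with a 2-hour limit and one GPU could be requested like this (the values are only illustrative):

[demo@datamaster ~]$ srun --partition gpu -n 2 --mem 2G -t 2:00:00 --gres="gpu:1" --pty /bin/bash

Exiting the shell ends the session and releases the allocated resources.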

How to submit a job

Setting up a job script

To submit a job, you must write a bash script that will prepare the environment and launch your software with its parameters.

The following script, run-tf-multigpu_cnn.bash, shows how to do this:

This example may be obsolete and needs to be checked
run-tf-multigpu_cnn.bash
#!/bin/bash
#SBATCH --job-name=multigpu_cnn (1)
#SBATCH --partition=gpu         (2)
#SBATCH -N 1                    (3)
#SBATCH -n 4                    (4)
#SBATCH --mem=5G                (5)
#SBATCH --gres="gpu:2"          (6)
#SBATCH -t 1:00:00              (7)
#SBATCH --mail-user=your.name@umons.ac.be (8)
#SBATCH --mail-type=ALL                   (9)

# Loading Anaconda module
module load anaconda3

# Loading an Anaconda environment
conda activate tensorflow-gpu-1.8

echo "DATE : $(date)"
echo "_____________________________________________"
echo " HOSTNAME             : $HOSTNAME"
echo "_____________________________________________"
echo " CUDA_DEVICE_ORDER    : $CUDA_DEVICE_ORDER"
echo "_____________________________________________"
echo " CUDA_VISIBLE_DEVICES : $CUDA_VISIBLE_DEVICES"
echo "_____________________________________________"
nvidia-smi -L
echo "_____________________________________________"

# Starting the Python program and printing the time it took to complete
time python3 $HOME/multigpu_cnn.py
1 A name for the job
2 the partition to use
3 the number of nodes (servers) to use
4 the number of CPUs (tasks)
5 the maximum memory the job will need
6 the GPU reservation; here we reserve 2 of them
7 a 1-hour time limit
8 the email address for notifications
9 the type of notification that will trigger an email
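
By default, the output of the job goes to a file named slurm-<jobid>.out in the submission directory, as shown at the end of this page. If you prefer a custom name, an --output directive could be added to the script; this is optional and not part of the original example (%j expands to the job id):

#SBATCH --output=multigpu_cnn-%j.out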

This bash script must be executable, so we set the permission with the chmod command:

$ chmod a+x run-tf-multigpu_cnn.bash

Checking available resources

You can display information about the available partitions (queues) with sinfo:

[demo@datamaster ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
days         up 2-00:00:00      6   idle hpc[1-6]
week         up 7-00:00:00      6   idle hpc[1-6]
month        up 31-00:00:0      6   idle hpc[1-6]
gpu          up 1-00:00:00      3   idle deep[1-2],simu1
lgpu         up 7-00:00:00      3   idle deep[1-2],simu1
debug        up    4:00:00      9   idle deep[1-2],hpc[1-6],simu1

Meaning:

For the "days" partition, the maximum job time limit is 2 days.
It contains 6 nodes, hpc1 to hpc6, all in the idle state, waiting for submissions.

The "gpu" and "lgpu" partitions are available for running short and long jobs on servers with GPUs for Deep Learning.

Finally, "debug" is a short-lived partition that can be used for testing.

Submitting your job

Since we have set the sbatch options in run-tf-multigpu_cnn.bash, we don't have to specify them on the command line.

[demo@datamaster ~]$ sbatch run-tf-multigpu_cnn.bash
Submitted batch job 5336

Otherwise, we could have passed the options on the command line:

[demo@datamaster ~]$ sbatch --partition=gpu -N 1 -n 4 --mem=5G --gres="gpu:2" -t 1:00:00 run-tf-multigpu_cnn.bash
Submitted batch job 5336
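
Options given on the command line take precedence over the #SBATCH directives inside the script, so you can override a single setting at submission time, for instance the time limit (illustrative value):

[demo@datamaster ~]$ sbatch -t 2:00:00 run-tf-multigpu_cnn.bash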

For more information on the available options:

[demo@datamaster ~]$ man sbatch

Verifying its state

[demo@datamaster ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              5336       gpu run-tf-m     demo  R       0:10      1 simu1

The job has been running (state R) for 10 seconds on node simu1 in the gpu partition.

Another possible state is PD, for pending.
If (Resources) is displayed in the NODELIST(REASON) column, it means that your job is waiting for resources to be freed.
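
On a busy cluster you can list only your own jobs by filtering squeue by user (here with the demo account used in the examples):

[demo@datamaster ~]$ squeue -u demo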

We can view the output of the job by printing the content of the file slurm-5336.out that is created in the current directory.

[demo@datamaster ~]$ cat slurm-5336.out
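
While the job is still running, you can also follow its output as it is written (a standard shell command, not specific to Slurm):

[demo@datamaster ~]$ tail -f slurm-5336.out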