More Information:
List Partitions
To view the current status of all partitions accessible by the user:
$ sinfo -l
To view the current status of a partition named partitionName run:
$ sinfo -l -p partitionName
Display Partition Contents
To get a list of all jobs running in the partition named partitionName run:
$ squeue -p partitionName
The same, limited to user userName:
$ squeue -p partitionName -u userName
Control Nodes
Get node state
One of the following commands can be used to query the state of the nodes, depending on the desired verbosity level:
$ sinfo -N
or
$ sinfo -o "%20N %.11T %.4c %.8z %.15C %.10O %.6m %.8d"
Commonly used options for srun, sbatch, and salloc:
| Option | Description |
|---|---|
| `-p partitionName` | Submit a job to the partition partitionName |
| `-o output.log` | Write the job's output to output.log instead of slurm-%j.out in the current directory |
| `-e error.log` | Write the job's STDERR to error.log instead of the job output file (see -o above) |
| `--mail-type=type` | Email the submitter on job state changes. Valid type values are BEGIN, END, FAIL, REQUEUE and ALL (any state change) |
| `--mail-user=email` | Address to receive email notifications of state changes (see --mail-type above) |
| `-n N, --ntasks N` | Set the number of tasks (cores) to N (default 1); the specific cores are chosen by SLURM |
| `-N N, --nodes N` | Set the number of nodes allocated to the job. On each node, --ntasks-per-node tasks are started; if --ntasks-per-node is not given, 1 task per node is started |
| `--ntasks-per-node N` | Number of tasks to start per allocated node (see -N above) |
| `--cpus-per-task N` | Needed for multithreaded (e.g. OpenMP) jobs. Tells SLURM to allocate N cores per task; typically N should equal the number of threads the program spawns, e.g. the same value as OMP_NUM_THREADS |
| `-J jobName, --job-name jobName` | Set the job name shown in the queue. The job name (truncated to the first 24 characters) is also used in emails sent to the user |
| `-w node1,node2,...` | Restrict the job to run only on the specified nodes |
| `-x node1,node2,...` | Exclude the specified nodes from the job |
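As an illustrative sketch, the options above can be combined in a batch script. The partition name, email address, and program name below are placeholders, not site defaults:

```shell
#!/bin/bash
#SBATCH -p partitionName               # placeholder partition name
#SBATCH -J myjob                       # job name shown in the queue
#SBATCH -o output.log                  # STDOUT file
#SBATCH -e error.log                   # STDERR file
#SBATCH --mail-type=END,FAIL           # email on completion or failure
#SBATCH --mail-user=user@example.com   # placeholder address
#SBATCH -N 2                           # two nodes
#SBATCH --ntasks-per-node=4            # four tasks per node
#SBATCH --cpus-per-task=2              # two cores per task

# For OpenMP jobs, match the thread count to the allocated cores per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./my_program                      # placeholder executable
```

Save the script (e.g. as job.sh) and submit it with `sbatch job.sh`.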
A few more helpful SLURM commands:
Man pages exist for all SLURM daemons, commands, and API functions. The command option --help also provides a brief summary of options. Note that the command options are all case sensitive (for example, -n and -N have different meanings).
- sacct is used to report job or job step accounting information about active or completed jobs.
- salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
- sattach is used to attach standard input, output, and error plus signal capabilities to a currently running job or job step. One can attach to and detach from jobs multiple times.
- sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
- sbcast is used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system.
- scancel is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
- scontrol is the administrative tool used to view and/or modify SLURM state. Note that many scontrol commands can only be executed as user root.
- sinfo reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
- smap reports state information for jobs, partitions, and nodes managed by SLURM, but graphically displays the information to reflect network topology.
- squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
- srun is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
- strigger is used to set, get or view event triggers. Event triggers include things such as nodes going down or jobs approaching their time limit.
- sview is a graphical user interface to get and update state information for jobs, partitions, and nodes managed by SLURM.
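As a hypothetical session showing how these commands fit together (the script name, user name, and job ID are placeholders):

```shell
$ sbatch job.sh            # submit a batch script; sbatch prints the new job ID
$ squeue -u userName       # check the state of the user's jobs
$ scontrol show job 12345  # detailed information for job 12345
$ scancel 12345            # cancel the job if it is no longer needed
```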