List Partitions

To view the current status of all partitions accessible by the user:

$ sinfo -l

To view the current status of a partition named partitionName, run:

$ sinfo -l -p partitionName

Display Partition Contents

To get a list of all jobs running in the partition named partitionName, run:

$ squeue -p partitionName

The same, limited to user userName:

$ squeue -p partitionName -u userName
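
The output columns of squeue can also be customized with the -o option; the format string below is just one possible choice (job ID, partition, name, user, state, elapsed time, node count, node list/reason):

$ squeue -p partitionName -o "%.10i %.9P %.20j %.8u %.2t %.10M %.6D %R"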

 

Control Nodes

Get node state

One of the following commands can be used to get node(s) state, depending on desired verbosity level:

# sinfo -N

or

# sinfo -o "%20N %.11T %.4c %.8z %.15C %.10O %.6m %.8d"
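For the full details of a single node (state, reason, configured and allocated resources), scontrol can be used; nodeName below is a placeholder for an actual node name:

# scontrol show node nodeName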

 

Commonly used options for srun, sbatch and salloc:

-p partitionName

Submit the job to the partition partitionName

-o output.log

Write the job's output to output.log instead of slurm-%j.out in the current directory

-e error.log

Write the job's STDERR to error.log instead of the job output file (see -o above)

--mail-type=type

Email the submitter on job state changes. Valid type values are BEGIN, END, FAIL, REQUEUE and ALL (any state change).

--mail-user=email

Email address to receive notifications of state changes (see --mail-type above)

-n N

--ntasks N

Set the number of tasks (cores) to N (default 1); SLURM chooses which cores to allocate

-N N

--nodes N

Set the number of nodes allocated to the job. On each node, --ntasks-per-node tasks will be started; if --ntasks-per-node is not given, one task per node is started.

--ntasks-per-node N

Number of tasks to start per allocated node (see -N above)

--cpus-per-task N

Needed for multithreaded (e.g. OpenMP) jobs. This tells SLURM to allocate N cores per task; typically N should equal the number of threads the program spawns, e.g. set it to the same value as OMP_NUM_THREADS.

-J jobName

--job-name jobName

Set the job name shown in the queue. The job name (limited to the first 24 characters) is also used in emails sent to the user.

-w node1,node2,...

Restrict job to run on specific nodes only

-x node1,node2,...

Exclude specific nodes from job
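
Putting several of these options together, a minimal job script might look like the following; the partition name, counts, file names and email address are placeholders:

#!/bin/bash
#SBATCH -p partitionName
#SBATCH -J myJob
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH -o output.log
#SBATCH -e error.log
#SBATCH --mail-type=END
#SBATCH --mail-user=user@example.com

srun ./myProgram

Save it as, e.g., job.sh and submit it with:

$ sbatch job.sh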

A few more helpful SLURM commands:

Man pages exist for all SLURM daemons, commands, and API functions. The command option --help also provides a brief summary of options. Note that the command options are all case insensitive.

  • sacct is used to report job or job step accounting information about active or completed jobs.
  • salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
  • sattach is used to attach standard input, output, and error plus signal capabilities to a currently running job or job step. One can attach to and detach from jobs multiple times.
  • sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
  • sbcast is used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system.
  • scancel is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
  • scontrol is the administrative tool used to view and/or modify SLURM state. Note that many scontrol commands can only be executed as user root.
  • sinfo reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
  • smap reports state information for jobs, partitions, and nodes managed by SLURM, but graphically displays the information to reflect network topology.
  • squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
  • srun is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
  • strigger is used to set, get or view event triggers. Event triggers include things such as nodes going down or jobs approaching their time limit.
  • sview is a graphical user interface to get and update state information for jobs, partitions, and nodes managed by SLURM.
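A few typical invocations of these commands (the job ID 12345 is a placeholder):

$ sacct -j 12345
$ scontrol show job 12345
$ scancel 12345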
If you need more information about SLURM, please see the official SLURM documentation: https://computing.llnl.gov/linux/slurm/documentation.html