#SBATCH --gres=gpu:V100:1
#SBATCH --account=nq46
#SBATCH --partition=m3g
When requesting a Tesla P100 GPU, you need to specify --partition=m3h:
#SBATCH --gres=gpu:P100:1
#SBATCH --account=nq46
#SBATCH --partition=m3h
When requesting a Tesla T4 or A40 GPU, you need to specify --partition=gpu. The first block below requests a T4, the second an A40:
#SBATCH --gres=gpu:T4:1
#SBATCH --account=nq46
#SBATCH --partition=gpu
#SBATCH --gres=gpu:A40:1
#SBATCH --account=nq46
#SBATCH --partition=gpu
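The GPU-type-to-partition pairing above can be captured in a small shell helper. This is only a sketch: the function name partition_for_gpu is hypothetical, and the mapping is taken solely from the examples in this section, so check it against the current partition list before relying on it.

```shell
# Hypothetical helper: print the partition this section pairs with a GPU type.
partition_for_gpu() {
    case "$1" in
        V100)   echo "m3g" ;;
        P100)   echo "m3h" ;;
        T4|A40) echo "gpu" ;;
        *)      echo "unknown GPU type: $1" >&2; return 1 ;;
    esac
}

# Example: print the two directives for a P100 request.
gpu_type="P100"
echo "#SBATCH --gres=gpu:${gpu_type}:1"
echo "#SBATCH --partition=$(partition_for_gpu "$gpu_type")"
```

An unrecognised GPU type makes the helper return a non-zero status, so a wrapper script can stop before submitting a job with a mismatched partition.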
Sample GPU Slurm scripts
If your job needs 1 node with 3 cores and 1 GPU, the Slurm submission script
should look like:
#!/bin/bash
#SBATCH --job-name=MyJob
#SBATCH --account=nq46
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --partition=m3h
If you need 6 nodes with 4 CPU cores and 2 GPUs on each node, the Slurm
submission script should look like:
#!/bin/bash
#SBATCH --job-name=MyJob
#SBATCH --account=nq46
#SBATCH --time=01:00:00
#SBATCH --ntasks=24
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:2
#SBATCH --partition=m3g
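The script above never states the node count directly: Slurm derives it from --ntasks and --ntasks-per-node, and --gres is a per-node request. A minimal sketch of that arithmetic, with variable names chosen here purely for illustration:

```shell
# Values copied from the 6-node example script above.
ntasks=24            # --ntasks
ntasks_per_node=4    # --ntasks-per-node
gpus_per_node=2      # --gres=gpu:2 (requested on each node)

# Nodes needed to place all tasks, and the GPUs that implies in total.
nodes=$(( ntasks / ntasks_per_node ))
total_gpus=$(( nodes * gpus_per_node ))

echo "nodes=${nodes} total_gpus=${total_gpus}"
# prints: nodes=6 total_gpus=12
```

Running the same calculation with your own values before submitting is a quick way to confirm how many GPUs the job will actually hold.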
Compiling your own CUDA or OpenCL codes for use on M3
M3 is configured so that CUDA (or OpenCL) applications can be compiled on the
login node, which has no GPUs installed, for execution on a compute node with
a GPU. This applies to device-independent code ONLY.
Login nodes can compile some CUDA (or OpenCL) source code
(device-independent code ONLY) but cannot run it.
Compute nodes can compile all CUDA (or OpenCL) source code and can also execute it.
We strongly suggest you compile your code on a compute node. To do that,
start an smux session to gain access to a compute node:
smux new-session --gres=gpu:1 --partition=m3h
Once your interactive session has begun, load the CUDA module:
module load cuda
To check the GPU device information, run:
nvidia-smi
deviceQuery
You should then be able to compile your GPU code. Once compilation completes
without error, you can execute it.
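As an illustration of the compile step, the sketch below writes a tiny CUDA program to a temporary directory and compiles it with nvcc if that command is on the PATH. The file name hello.cu, the kernel, and the status variable are all hypothetical; it assumes the cuda module provides nvcc, and your own build will likely need extra flags.

```shell
# Sketch only: create a minimal CUDA source file in a scratch directory.
workdir="$(mktemp -d)"
src="${workdir}/hello.cu"      # hypothetical file name

cat > "$src" <<'EOF'
#include <cstdio>

__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();          // one block of four threads
    cudaDeviceSynchronize();    // wait for the kernel output to flush
    return 0;
}
EOF

# Compile only if nvcc is available (for example after "module load cuda").
if command -v nvcc >/dev/null 2>&1; then
    nvcc -o "${workdir}/hello" "$src" && status="compiled" || status="compile-failed"
else
    status="nvcc-missing"
fi
echo "status: ${status}"
```

If the status is compiled, the resulting ${workdir}/hello binary can be run from the same interactive session on the compute node.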
Attention
If you attempt to run a compiled CUDA (or OpenCL) application on the login
node, a "no CUDA device found" error may be reported. This is because no
CUDA-enabled GPUs are installed on the login node. You must run GPU code on
a compute node.
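One way to avoid this error is to probe for a GPU before launching anything. The following is a sketch under assumptions: gpu_available is a hypothetical helper, and it treats a successful nvidia-smi run as evidence that a GPU is visible on the current node.

```shell
# Hypothetical helper: succeed only if the probe command exists and exits 0.
# The probe defaults to nvidia-smi; any other command can be passed instead.
gpu_available() {
    local probe="${1:-nvidia-smi}"
    command -v "$probe" >/dev/null 2>&1 && "$probe" >/dev/null 2>&1
}

if gpu_available; then
    echo "GPU visible: the GPU executable can be launched here"
else
    echo "No GPU visible: this looks like a login node, run inside a job instead"
fi
```

Placing a check like this at the top of a launch script turns the device error into an early, readable message.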