RCC users can run jobs either interactively or by submitting them to a resource manager, which schedules them to run on allocated resources (CPU time, memory, etc.). The RCC uses Slurm (Simple Linux Utility for Resource Management) to manage workloads on its compute clusters (Midway and MidwayR). In addition to covering best practices, this workshop gives users a clear understanding of the compute partitions available at the RCC, how to configure a Slurm job, how to use Slurm commands, how to submit a job, and how to avoid the common mistakes that leave a job waiting in the queue for a long time or cause it to fail.
Objectives
Participants will learn:
- The various Midway resources and partitions for running jobs
- Slurm commands, how to create a Slurm batch script, and how to submit batch jobs
- The RCC module system and run time environments
- How to submit serial (single-processor) jobs and parallel (multi-processor) jobs using OpenMP and MPI
- How to submit GPU jobs
- How to submit a job that runs the same code several times, with each run differing only in the initial value of some high-level parameter (Slurm job array)
- How to pack jobs and schedule independent processes inside a Slurm job allocation
- How to submit message-passing (MPI), multi-threaded (OpenMP), and hybrid jobs
- How to request a Slurm interactive session
- Best practices and how to debug your Slurm script
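
As a taste of the job array objective above, here is a minimal sketch of a batch script that runs one task per parameter value. It is illustrative only and not taken from the workshop materials: the job name, the parameter values, the time limit, and the `my_simulation` executable are all hypothetical, and a real submission on Midway would likely also need site-specific options such as a partition and an account.

```shell
#!/bin/bash
#SBATCH --job-name=param-sweep        # hypothetical job name
#SBATCH --array=0-3                   # four array tasks, indices 0..3
#SBATCH --ntasks=1                    # each array task is a single serial task
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00               # assumed time limit
#SBATCH --output=sweep_%A_%a.out      # %A = array job ID, %a = array task index

# Hypothetical parameter values, one per array task.
PARAMS=(0.1 0.5 1.0 2.0)

# Slurm sets SLURM_ARRAY_TASK_ID inside each array task. Defaulting to 0
# lets the script be tried outside Slurm as well.
TASK_ID="${SLURM_ARRAY_TASK_ID:-0}"
PARAM="${PARAMS[$TASK_ID]}"

echo "Task ${TASK_ID} running with parameter ${PARAM}"

# Hypothetical executable; replace with your own program.
# ./my_simulation --param "${PARAM}"
```

Saved under a hypothetical name such as `sweep.sbatch`, the script would be submitted with `sbatch sweep.sbatch`, and its array tasks could be monitored with `squeue -u $USER`.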
Level: Intermediate
GitHub repository: https://github.com/rcc-uchicago/slurm_workshop_midway3.git
Prerequisites: A basic understanding of a programming or scripting language, some familiarity with the Linux command line, and an active RCC account.