SLURM: Building a Reliable Control Plane with systemd

Introduction

Slurm (Simple Linux Utility for Resource Management) is an open-source workload manager and job scheduler designed for Linux-based high-performance computing environments. Its primary role is to allocate compute resources, queue and schedule jobs, and enforce fair-use policies across shared infrastructure such as CPU clusters, GPU nodes, and large-memory systems. Rather than relying on ad-hoc execution or manual coordination, Slurm provides a structured and policy-driven approach to running workloads at scale.

At its core, Slurm answers three fundamental questions for any compute cluster: who is allowed to run workloads, what resources they are permitted to consume, and when and where those workloads are executed. By centralizing these decisions, Slurm removes the need to run jobs directly on login nodes or to manually manage resource usage. This centralized control plane makes multi-user systems more predictable, safer to operate, and easier to scale as demand grows.

Slurm effectively transforms a collection of Linux machines into a cohesive compute platform governed by policies. Users can submit both batch and interactive jobs without needing to know which physical machine will execute them. Resources such as CPU cores, memory, GPUs, and wall-clock time are enforced automatically, while the scheduler optimizes placement and timing using mechanisms such as priority ordering and backfill.

Because of this flexibility, Slurm is suitable for a wide range of deployment sizes and use cases. It can be used to manage a single node shared by multiple researchers, a small on-premises lab cluster, or a large institutional HPC or AI platform supporting many teams and workloads. The underlying concepts remain consistent across all of these environments, which is one of Slurm’s greatest strengths.

This article walks through setting up Slurm from scratch, beginning with a single node that acts as both the controller and compute host. The configuration patterns introduced here scale naturally to multi-node clusters and more complex environments.

This walkthrough assumes a Linux host with Docker installed and basic familiarity with systemd and shell scripting.

Slurm Architecture

flowchart LR
    User[Users / Researchers]

    subgraph ControlPlane["Slurm Control Plane (Dockerized)"]
        OOD[Open OnDemand]
        SLURMCTLD[slurmctld<br/>Scheduler & Controller]
        MUNGE_CTRL[Munge Auth]
        CONF[slurm.conf<br/>Cluster Policy]
        STATE[Slurm State<br/>Job & Node State]
    end

    subgraph ComputeNodes["Compute Nodes (Bare Metal / VM)"]
        SLURMD[slurmd<br/>Execution Daemon]
        MUNGE_NODE[Munge Auth]
        CGROUPS[cgroups<br/>Resource Enforcement]
    end

    User -->|Web / CLI| OOD
    OOD -->|Job Submission| SLURMCTLD

    SLURMCTLD -->|Scheduling Decisions| SLURMD
    SLURMD -->|Status & Heartbeats| SLURMCTLD

    MUNGE_CTRL <-->|Auth Tokens| MUNGE_NODE

    SLURMCTLD --> CONF
    SLURMCTLD --> STATE

    SLURMD --> CGROUPS

Slurm follows a centralized control and distributed execution model that cleanly separates scheduling decisions from workload execution. Even in the smallest possible deployment, a Slurm cluster is composed of a few well-defined components that work together to manage compute resources in a predictable and scalable way.

At a high level, a Slurm cluster consists of:

  • A controller service that makes scheduling decisions
  • One or more compute nodes that execute workloads
  • A shared authentication mechanism
  • A configuration layer that defines cluster policy

In this article, all components are deployed on a single host to keep the initial setup simple. This mirrors the architecture of a larger multi-node cluster and can be expanded later without redesign. The remainder of this section briefly describes the core services shown above and their roles in the control plane.

Slurm Controller (slurmctld)

The Slurm controller daemon (slurmctld) is the brain of the cluster. It accepts job submissions, tracks available resources, decides when and where jobs should run, and enforces scheduling policies such as priorities and backfill. The controller does not execute workloads itself; instead, it coordinates execution by communicating with compute nodes.

Slurm Node Daemon (slurmd)

Each compute node runs the Slurm node daemon (slurmd). This daemon advertises available resources (CPU, memory, GPUs), launches assigned jobs, enforces resource limits, and reports status back to the controller. In a single-node setup, the same host runs both slurmctld and slurmd.
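A quick way to see what a node will advertise is `slurmd -C`, which prints the detected hardware in slurm.conf syntax. This is the same information the bootstrap script later derives with nproc and free:

```shell
# Print this host's detected hardware as a ready-to-paste NodeName line.
# Requires the slurmd binary to be installed; the daemon need not be running.
slurmd -C
# Example output (values vary by host):
# NodeName=controller CPUs=8 RealMemory=15876 ...
```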

Authentication with Munge

Authentication and trust between Slurm components are provided by Munge, a lightweight authentication service designed specifically for HPC environments. All nodes in a Slurm cluster must share the same Munge key and have consistent user and group identifiers. If Munge is not functioning correctly, Slurm will fail to operate, which is why authentication is treated as a first-class concern.

Configuration (slurm.conf)

Cluster configuration is defined primarily through /etc/slurm/slurm.conf. This file specifies the cluster name, controller host, node definitions, partitions, and scheduling policies. The same configuration model scales cleanly from a single-node deployment to clusters with hundreds or thousands of nodes.

Why systemd Inside the Controller Container?

Early experiments that attempted to run the Slurm controller as a thin container, launching slurmctld directly as the container entrypoint, proved to be fragile. Slurm is designed to operate as a long-lived system service and makes implicit assumptions about its execution environment, including the presence of stable PID files, well-defined runtime directories, proper cgroup integration, predictable service ordering, and controlled restart semantics. Attempting to work around these assumptions inside a minimalist container introduces unnecessary complexity and leads to hard-to-diagnose failure modes.

Instead, the controller is deliberately treated as a VM-like Linux system. It runs Ubuntu 24.04 with systemd as PID 1 inside a privileged container, uses the host cgroup namespace, and persists state and configuration through mounted volumes. This model aligns closely with how Slurm is intended to run on real hardware, preserves native service management semantics, and results in a control plane that behaves predictably and reliably under restart, upgrade, and failure scenarios.

Controller Deployment Model

In this setup, Docker runs on a standard Linux host, and the controller container represents the logical Slurm control plane.

docker-compose.yml

services:
  controller:
    image: controller:latest
    container_name: controller
    hostname: controller
    privileged: true
    cgroup: "host"
    restart: unless-stopped
    volumes:
      - ./slurm/etc:/etc/slurm
      - ./slurm/state:/var/spool/slurmctld
      - ./slurm/log:/var/log/slurm
      - ./munge:/etc/munge
      - ./munge-lib:/var/lib/munge
      - ./ood:/srv/ood

This container:

  • Runs systemd as PID 1
  • Manages Slurm services using native unit files
  • Persists state and configuration on the host
  • Can be restarted or rebuilt without data loss
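Assuming the docker-compose.yml above sits in the current directory and the controller image has already been built, bringing the control plane up looks like this:

```shell
# Start the controller container in the background.
docker compose up -d controller

# Wait for systemd inside the container to finish booting.
# (is-system-running can return "degraded" with a nonzero status, hence || true.)
docker exec controller systemctl is-system-running --wait || true

# Open an interactive shell for the provisioning step described later.
docker exec -it controller bash
```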

Controller Image (Ubuntu 24.04 + systemd)

Dockerfile

FROM ubuntu:24.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      systemd systemd-sysv dbus \
      ca-certificates iproute2 iputils-ping \
      nano vim curl less \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

COPY /src/controller-slurm.sh /opt/controller-slurm.sh
RUN chmod +x /opt/controller-slurm.sh

RUN systemctl set-default multi-user.target || true

STOPSIGNAL SIGRTMIN+3
CMD ["/sbin/init"]

The container boots like a normal Linux system. No Slurm packages are installed at build time. All provisioning happens inside the running container using a single, explicit controller bootstrap script.
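Building the image is a standard docker build. The tag must match the image name referenced in docker-compose.yml (controller:latest), and the bootstrap script is assumed to live at ./src/controller-slurm.sh relative to the build context:

```shell
# Build the controller image from the Dockerfile above.
# The COPY step expects the bootstrap script at ./src/controller-slurm.sh.
docker build -t controller:latest .
```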

Controller Bootstrap Script (Single Source of Truth)

The controller is fully provisioned using a single script:

Bash
#!/usr/bin/env bash
set -euo pipefail
# ============================================================
# Script purpose:
#   Provision a Slurm controller node with configless support.
#   Installs slurmctld/slurmd, sets up MUNGE, generates slurm.conf,
#   and enables JWT in slurm.conf for scontrol token.
#   (NO slurmrestd parts in this script)
# ============================================================

# =========================
# Tunables (override via env)
# =========================
CLUSTER_NAME="${CLUSTER_NAME:-mini}"         # cluster name
PARTITION_NAME="${PARTITION_NAME:-debug}"    # default partition
NODES="${NODES:-}"                           # extra nodes beyond this host (range ok)
ENABLE_CONFIGLESS="${ENABLE_CONFIGLESS:-1}"  # 1=yes
CONTROLLER_HOST="${CONTROLLER_HOST:-$(hostname -f || hostname -s)}"
GPU_ENABLE="${GPU_ENABLE:-0}"
OPEN_PORTS="${OPEN_PORTS:-1}"                # open slurm ports (6817-6819)

# =========================
# Derived values
# =========================
HOST_SHORT="$(hostname -s)"
HOST_FQDN="$(hostname -f || echo "$HOST_SHORT")"
CPUS="$(nproc)"
MEM_MB="$(free -m | awk '/Mem:/ {printf "%d", $2*0.95}')"
echo "CPUS=$CPUS MEM_MB=$MEM_MB"

# =========================
# Helpers
# =========================
log()  { echo "[*] $*"; }
warn() { echo "[!] $*"; }
die()  { echo "[x] $*" >&2; exit 1; }
need_root() { [[ $EUID -eq 0 ]] || die "Run as root or with sudo."; }

need_root

# =========================
# Packages
# =========================
log "Installing packages..."
apt-get update -y
apt-get install -y \
  slurm-wlm slurmctld slurmd slurm-client \
  munge libmunge2 libmunge-dev jq chrony libpmix2 libpmix-dev binutils

# (Optional) open firewall ports for Slurm only
if [ "${OPEN_PORTS}" = "1" ] && command -v ufw >/dev/null 2>&1; then
  log "Opening Slurm ports with ufw..."
  ufw allow 6817:6819/tcp || true
fi

# =========================
# Hostname sanity
# =========================
log "Hostname sanity..."
if ! grep -qE "[[:space:]]${HOST_SHORT}([[:space:]]|$)" /etc/hosts; then
  echo "127.0.1.1  ${HOST_SHORT} ${HOST_FQDN}" | tee -a /etc/hosts >/dev/null
fi

# =========================
# Time sync
# =========================
log "Enabling chrony..."
systemctl enable --now chrony

# =========================
# MUNGE setup
# =========================
log "MUNGE setup..."
install -o munge -g munge -m 0700 -d /etc/munge
install -o munge -g munge -m 0700 -d /var/lib/munge /var/log/munge /run/munge

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
if [ ! -f /etc/munge/munge.key ]; then
  if [ -f "$SCRIPT_DIR/create-munge-key.sh" ]; then
    sed -i 's/\r$//' "$SCRIPT_DIR/create-munge-key.sh" || true
    chmod +x "$SCRIPT_DIR/create-munge-key.sh"
    bash "$SCRIPT_DIR/create-munge-key.sh"
  else
    dd if=/dev/urandom of=/etc/munge/munge.key bs=1 count=1024 status=none
  fi
  chown munge:munge /etc/munge/munge.key
  chmod 0400 /etc/munge/munge.key
fi

systemctl enable --now munge
mkdir -p /run/munge && chown -R munge:munge /run/munge && chmod 700 /run/munge
systemctl restart munge

log "MUNGE self-test..."
if ! munge -n | unmunge >/dev/null 2>&1; then
  die "MUNGE self-test failed. Check /var/log/munge/munged.log"
fi

# =========================
# Slurm directories
# =========================
log "Creating Slurm state/log dirs..."
mkdir -p /var/spool/slurmctld /var/spool/slurmd /var/log/slurm
chown -R slurm:slurm /var/spool/slurmctld /var/spool/slurmd /var/log/slurm
chmod 755 /var/spool/slurmctld /var/spool/slurmd
: > /var/log/slurmctld.log && chown slurm:slurm /var/log/slurmctld.log
: > /var/log/slurmd.log   && chown slurm:slurm /var/log/slurmd.log

# =========================
# Generate slurm.conf
# =========================
# gather nodes from CLI or $NODES
NODES_CLI="$(printf '%s' "${*:-}" | tr ' ' ',' | sed 's/^,\+//;s/,\+$//;s/,,\+/,/g')"
if [ -z "$NODES_CLI" ] && [ -n "${NODES:-}" ]; then
  NODES_CLI="$(printf '%s' "$NODES" | tr ' ' ',' | sed 's/^,\+//;s/,\+$//;s/,,\+/,/g')"
fi
ALL_NODES="${HOST_SHORT}${NODES_CLI:+,${NODES_CLI}}"

SLURM_CONF=/etc/slurm/slurm.conf
log "Writing $SLURM_CONF ..."
tee "$SLURM_CONF" >/dev/null <<EOF
ClusterName=${CLUSTER_NAME}
SlurmctldHost=${CONTROLLER_HOST}
SlurmUser=slurm
AuthType=auth/munge
$( [ "${ENABLE_CONFIGLESS}" = "1" ] && echo "SlurmctldParameters=enable_configless" )

SlurmctldLogFile=/var/log/slurmctld.log
SlurmdLogFile=/var/log/slurmd.log
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory
ProctrackType=proctrack/linuxproc
ReturnToService=2
AccountingStorageType=accounting_storage/none
MpiDefault=none

SlurmctldPort=6817
SlurmdPort=6818
$( [ "${GPU_ENABLE}" = "1" ] && echo "GresTypes=gpu" )

NodeName=${HOST_SHORT} CPUs=${CPUS} RealMemory=${MEM_MB} State=UNKNOWN
$( [ -n "$NODES_CLI" ] && echo "NodeName=${NODES_CLI} CPUs=${CPUS} RealMemory=${MEM_MB} State=UNKNOWN" )

PartitionName=${PARTITION_NAME} Nodes=${ALL_NODES} Default=YES MaxTime=INFINITE State=UP
EOF

# =========================
# JWT (for scontrol token only)
# =========================
# Create a JWT key and enable AuthAlt* in slurm.conf.
dd if=/dev/urandom of=/etc/slurm/jwt_hs256.key bs=32 count=1 status=none
chown slurm:slurm /etc/slurm/jwt_hs256.key
chmod 600 /etc/slurm/jwt_hs256.key


# Ensure AuthAlt lines exist
awk '
  BEGIN{t=0;p=0}
  /^AuthAltTypes/      { $0="AuthAltTypes=auth/jwt"; t=1 }
  /^AuthAltParameters/ { $0="AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key"; p=1 }
  {print}
  END{
    if(!t) print "AuthAltTypes=auth/jwt";
    if(!p) print "AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key";
  }
' /etc/slurm/slurm.conf > /etc/slurm/slurm.conf.new && mv /etc/slurm/slurm.conf.new /etc/slurm/slurm.conf

# =========================
# Configless slurmd setup
# =========================
log "Pointing slurmd at conf-server=${CONTROLLER_HOST} …"
if [[ -f /etc/default/slurmd ]]; then
  if grep -q '^SLURMD_OPTIONS=' /etc/default/slurmd; then
    sed -i 's|^SLURMD_OPTIONS=.*|SLURMD_OPTIONS="--conf-server='"${CONTROLLER_HOST}"'"|' /etc/default/slurmd
  else
    echo 'SLURMD_OPTIONS="--conf-server='"${CONTROLLER_HOST}"'"' >> /etc/default/slurmd
  fi
else
  echo 'SLURMD_OPTIONS="--conf-server='"${CONTROLLER_HOST}"'"' > /etc/default/slurmd
fi

# =========================
# Start/restart services
# =========================
log "Enable and start slurm daemons..."
systemctl daemon-reload || true
systemctl enable slurmctld slurmd >/dev/null 2>&1 || true
systemctl restart slurmctld
systemctl restart slurmd
scontrol reconfigure || true

# =========================
# Smoke tests
# =========================
log "Smoke test..."
sinfo || warn "sinfo not ready yet"
srun -N1 -n1 hostname || warn "srun failed"

log "Attempting to mint a JWT (optional)..."
if command -v scontrol >/dev/null 2>&1; then
  TOKEN="$(scontrol token 2>/dev/null | tail -n1 | tr -d '\r\n' | sed 's/^SLURM_JWT=//')"
  if [ -n "${TOKEN}" ]; then
    echo "${TOKEN}" > /tmp/SLURM_JWT
    log "JWT saved to /tmp/SLURM_JWT"
  else
    warn "'scontrol token' produced no token (check AuthAlt* in slurm.conf)."
  fi
fi

# =========================
# Verify services
# =========================
systemctl --no-pager --full status munge || true
systemctl --no-pager --full status slurmctld || true
systemctl --no-pager --full status slurmd || true

echo
echo "[DONE] Controller ready on ${HOST_SHORT}."
echo "      - Cluster: ${CLUSTER_NAME}"
echo "      - Nodes in default partition: ${ALL_NODES}"
echo
if [ "${ENABLE_CONFIGLESS}" = "1" ]; then
  cat <<'TIP'
TIP: For compute nodes, copy /etc/munge/munge.key, install Slurm,
and point slurmd at this controller:
  CTRL=<controller-fqdn-or-ip>
  echo "SLURMD_OPTIONS=\"--conf-server=$CTRL\"" | sudo tee /etc/default/slurmd
  sudo systemctl enable --now munge
  sudo systemctl restart slurmd
Then verify from the controller:
  sinfo -N -l
  srun -w <node> -N1 -n1 hostname
TIP
fi

To provision the controller, shell into the running controller container and execute this script once. The script performs a full controller bring-up, but it is not intended to eliminate all operator involvement. In practice, you may need to make minor adjustments after the initial run, most commonly to address filesystem permissions, directory ownership, or environment-specific paths referenced by slurm.conf. These issues are straightforward to identify by inspecting service logs and unit status via systemctl status slurmctld and journalctl, and they are expected when adapting the controller to a real system.
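A typical provisioning session, assuming the container name from docker-compose.yml, looks like this:

```shell
# Shell into the running controller container...
docker exec -it controller bash

# ...then run the bootstrap once.
/opt/controller-slurm.sh

# Tunables can be overridden per invocation
# (cluster and partition names here are arbitrary examples):
CLUSTER_NAME=lab PARTITION_NAME=batch /opt/controller-slurm.sh
```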

This script performs the following steps:

1. System Preparation

  • Verifies root access
  • Installs required packages:
    • slurmctld, slurmd, slurm-client
    • munge
    • time sync (chrony)
  • Optionally opens Slurm ports (6817–6819)

2. Time Synchronization

Time skew breaks Slurm authentication. The script enables and starts chrony immediately.
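Synchronization can be verified directly; a quick check, assuming chrony's standard tooling is installed:

```shell
# Confirm chrony is tracking an upstream source with a small offset.
chronyc tracking

# systemd's view of clock synchronization state.
timedatectl
```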

3. Munge Setup and Validation

Munge is treated as a first-class dependency:

  • Creates runtime directories with correct ownership
  • Generates a secure key if none exists
  • Starts and validates munged
  • Performs an end-to-end munge | unmunge self-test

If Munge fails, the script exits immediately.
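The same self-test can be run by hand, and extended across nodes once a second machine shares the key:

```shell
# Local round-trip: encode a credential and decode it on the same host.
munge -n | unmunge

# Cross-node round-trip (assumes ssh access to a node named node01 that
# already has an identical /etc/munge/munge.key).
munge -n | ssh node01 unmunge
```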

4. Generating slurm.conf

Rather than shipping a static template, the script generates a clean, explicit slurm.conf based on the live system.

Key properties:

  • Cluster identity
  • Controller hostname
  • Scheduler configuration
  • Node resources derived dynamically
  • Optional configless support
  • Optional GPU support
  • Explicit ports and runtime paths
ClusterName=mini
SlurmctldHost=controller
SlurmUser=slurm

AuthType=auth/munge

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

NodeName=controller CPUs=<detected> RealMemory=<detected> State=UNKNOWN
PartitionName=debug Nodes=controller Default=YES MaxTime=INFINITE State=UP
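Once slurmctld is up, the configuration the controller actually loaded can be checked against the generated file:

```shell
# Keys the controller loaded (should match the generated slurm.conf).
scontrol show config | grep -E '^(ClusterName|SlurmctldHost|SelectType)'

# Node and partition state as the scheduler sees them.
sinfo -N -l
scontrol show node
```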

5. Configless Slurmd Support

Configless mode allows compute nodes to retrieve their configuration directly from the controller.

The script enables this by:

  • Setting SlurmctldParameters=enable_configless
  • Pointing slurmd at the controller using:
SLURMD_OPTIONS="--conf-server=<controller>"

This significantly simplifies node onboarding in multi-node environments.

6. JWT Support for Secure Tokens

The script optionally enables JWT authentication for scontrol token:

  • Generates a secure HS256 key
  • Configures AuthAltTypes=auth/jwt
  • Stores the key with correct ownership and permissions

This enables modern token-based workflows without enabling REST services.
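With the AuthAlt lines in place, scontrol token prints a line of the form SLURM_JWT=&lt;token&gt;, which makes it convenient to export directly. A sketch (lifespan is in seconds; run as root or the SlurmUser):

```shell
# Mint a token valid for one hour and export it into the environment.
export $(scontrol token lifespan=3600)

# JWT-aware clients read the token from $SLURM_JWT.
echo "${SLURM_JWT:0:16}..."
```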

7. Service Enablement and Startup

All services are managed natively via systemd:

  • munge
  • slurmctld
  • slurmd

The script:

  • Enables services on boot
  • Restarts them in the correct order
  • Issues a scontrol reconfigure
  • Performs basic smoke tests (sinfo, srun)
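After the script completes, service health can be verified with the usual systemd tooling:

```shell
# One-line health check for the three daemons.
systemctl is-active munge slurmctld slurmd

# Recent controller activity from the journal and the Slurm log file
# (the path matches SlurmctldLogFile in the generated slurm.conf).
journalctl -u slurmctld --since "10 min ago" --no-pager
tail -n 20 /var/log/slurmctld.log
```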

Where We Are Now

At this point:

  • The controller is a fully initialized Slurm node
  • Authentication is validated
  • Configuration is explicit and reproducible
  • Services are managed by systemd
  • The node can schedule and execute jobs

Most importantly, the control plane behaves exactly like it would on real hardware.
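A minimal batch job makes a good end-to-end check; the script name here is arbitrary:

```shell
# Write a one-task batch job that just reports where it ran.
cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
hostname
EOF

# Submit and watch it flow through the queue.
sbatch hello.sbatch
squeue

# By default, output lands in slurm-<jobid>.out in the submit directory.
cat slurm-*.out
```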

Scope and What This Article Deliberately Does Not Cover

This article is intentionally limited to bringing up a functional Slurm controller with a clean control plane:

  • slurmctld and slurmd running under systemd
  • Munge authentication correctly initialized and validated
  • A minimal, modern slurm.conf generated deterministically
  • A single node acting as both controller and compute host

No assumptions are made about:

  • Front-end portals (e.g., Open OnDemand)
  • Multi-node expansion
  • GPU scheduling
  • Accounting, quotas, or fair-share
  • REST APIs or workflow orchestration

Those concerns are orthogonal to establishing a correct and stable control plane and are intentionally deferred. The goal here is simple and foundational: Slurm must start cleanly, authenticate correctly, and schedule work predictably. Everything else builds on that.

Future articles will revisit:

  • Adding external compute nodes (configless and traditional)
  • Integrating front-end access layers
  • Enabling GPU and accelerator scheduling
  • Accounting, usage tracking, and policy enforcement
  • Running agentic workloads as first-class Slurm jobs
