Final Validation with Slurm Job
Finalize Integration with Slurm
Edit resume_xspot.sh and add a sed command for every pool:
user_data=$(cat /opt/slurm/etc/xvm16-_user_data | base64 -w 0)
becomes:
user_data=$(cat /opt/slurm/etc/exostellar/xvm16-_user_data | sed "s/XSPOT_NODENAME/$host/g" | base64 -w 0)
N.B.: The cat command works against a user_data script in the ${SLURM_CONF_DIR}/exostellar directory.
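To see what the edited pipeline does, the following is a minimal sketch and not part of resume_xspot.sh. The helper name render_user_data, the throwaway template, and the node name node-demo-1 are all hypothetical; only the sed and base64 -w 0 stages come from the command above.

```shell
# Hypothetical helper: mirrors the sed | base64 stages of the edited command.
render_user_data() {
    local template="$1"   # path to a user_data template
    local host="$2"       # node name to substitute for XSPOT_NODENAME
    sed "s/XSPOT_NODENAME/$host/g" "$template" | base64 -w 0
}

# Demo with a throwaway template; its contents are made up for illustration.
demo_template=$(mktemp)
printf 'hostname XSPOT_NODENAME\n' > "$demo_template"
user_data=$(render_user_data "$demo_template" "node-demo-1")

# Decode the result to confirm the placeholder was replaced.
printf '%s' "$user_data" | base64 -d
rm -f "$demo_template"
```

Because the node name is spliced directly into the sed expression, this sketch assumes the name contains no characters that are special to sed, such as / or &.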
Edit your slurm.conf and add an include statement to pick up xspot.slurm.conf. Replace ${SLURM_CONF_DIR} with the path to the Slurm configuration directory:
include ${SLURM_CONF_DIR}/exostellar/xspot.slurm.conf
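If the include line is added by a script rather than by hand, one possible approach is to append it only when it is missing, so that repeated runs do not duplicate it. The helper name ensure_include is hypothetical, and the paths in the usage line are placeholders for your own configuration directory.

```shell
# Hypothetical helper: append "include <target>" to a config file once.
ensure_include() {
    local conf="$1"      # path to slurm.conf
    local target="$2"    # path to xspot.slurm.conf
    local line="include $target"

    # -x matches the whole line, -F treats the pattern as a fixed string.
    if grep -qxF "$line" "$conf"; then
        echo "already present"
        return 0
    fi

    # Add a newline first if the file does not already end with one.
    if [ -n "$(tail -c1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    printf '%s\n' "$line" >> "$conf"
    echo "added"
}

# Usage (placeholder paths):
#   ensure_include /path/to/slurm.conf /path/to/exostellar/xspot.slurm.conf
```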
Verify that the ResumeProgram and SuspendProgram entries in the xspot.slurm.conf file point correctly at ${SLURM_CONF_DIR}/exostellar/resume_xspot.py and ${SLURM_CONF_DIR}/exostellar/suspend_xspot.py.
Introducing new nodes into a Slurm cluster requires a restart of the Slurm control daemon:
systemctl restart slurmctld
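The ResumeProgram and SuspendProgram check described above can be scripted. The following is a rough sketch: the helper name check_power_programs is hypothetical, and it assumes each entry is written as Key=/path on a single line of the file.

```shell
# Hypothetical helper: report whether ResumeProgram/SuspendProgram entries
# in a Slurm config file point at existing, executable files.
check_power_programs() {
    local conf="$1" entries key path status=0 found=0

    # Collect matching lines, then strip leading space, space around "=",
    # and any trailing space or comment.
    entries=$(grep -iE '^[[:space:]]*(ResumeProgram|SuspendProgram)[[:space:]]*=' "$conf" \
        | sed -E 's/^[[:space:]]*//; s/[[:space:]]*=[[:space:]]*/=/; s/[[:space:]]*(#.*)?$//')

    while IFS='=' read -r key path; do
        [ -n "$key" ] || continue
        found=$((found + 1))
        if [ -x "$path" ]; then
            echo "OK      $key=$path"
        else
            echo "MISSING $key=$path"
            status=1
        fi
    done <<EOF
$entries
EOF

    if [ "$found" -eq 0 ]; then
        echo "no ResumeProgram/SuspendProgram entries in $conf"
        return 1
    fi
    return "$status"
}

# Usage (placeholder path):
#   check_power_programs /path/to/exostellar/xspot.slurm.conf
```

A non-zero return value means at least one entry is missing or not executable, so the result can be used to gate the slurmctld restart.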
The integration steps are now complete. A job submission to the new partition is the final validation:
As a cluster user, navigate to a valid job submission directory and launch a job as normal, but be sure to specify the new partition:
sbatch -p NewPartitionName < job-script.sh
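After submitting, it can help to confirm that the job was accepted and to look at the state of the partition. The sketch below wraps the submission above with standard Slurm commands (sbatch --parsable, squeue, sinfo). The helper name submit_and_check is hypothetical, and NewPartitionName and job-script.sh remain placeholders.

```shell
# Hypothetical helper: submit a job script to a partition, then show the
# job's state and the partition's node states once.
submit_and_check() {
    local partition="$1" script="$2" job_id

    # --parsable prints the job ID, optionally followed by ";cluster".
    job_id=$(sbatch --parsable -p "$partition" < "$script") || return 1
    job_id=${job_id%%;*}
    echo "submitted job $job_id to partition $partition"

    # Job state, allocated nodes, and pending reason, without a header.
    squeue -j "$job_id" -h -o '%T %N %R'

    # Node states in the partition.
    sinfo -p "$partition"
}

# Usage:
#   submit_and_check NewPartitionName job-script.sh
```

The job may not show as running immediately if its nodes are still being resumed, so repeating the squeue call after a short wait is reasonable.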