Final Validation with Slurm Job
Finalize Integration with Slurm
Edit resume_xspot.sh and add a sed command for every pool, so that:

user_data=$(cat /opt/slurm/etc/xvm16-_user_data | base64 -w 0)

becomes:

user_data=$(cat /opt/slurm/etc/exostellar/xvm16-_user_data | sed "s/XSPOT_NODENAME/$host/g" | base64 -w 0)

N.B.: The cat command works against a user_data script in the ${SLURM_CONF_DIR}/exostellar directory.
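For illustration, a fragment of resume_xspot.sh with the substitution applied for two pools might look like the sketch below. Only the xvm16- pool and its sed substitution come from the step above; the second pool (xvm32-), the case statement, and the assumption that node names begin with the pool name are placeholders, and the actual script's structure may differ:

# Sketch only: apply the XSPOT_NODENAME substitution for every pool.
# $host is assumed to hold the name of the Slurm node being resumed.
case "$host" in
  xvm16-*)
    user_data=$(cat /opt/slurm/etc/exostellar/xvm16-_user_data | sed "s/XSPOT_NODENAME/$host/g" | base64 -w 0)
    ;;
  xvm32-*)
    # Hypothetical second pool, shown only to illustrate "for every pool".
    user_data=$(cat /opt/slurm/etc/exostellar/xvm32-_user_data | sed "s/XSPOT_NODENAME/$host/g" | base64 -w 0)
    ;;
esac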
Edit your slurm.conf and add an include statement to pick up xspot.slurm.conf. Replace ${SLURM_CONF_DIR} with the path to the Slurm configuration directory:

include ${SLURM_CONF_DIR}/exostellar/xspot.slurm.conf
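For reference, a minimal xspot.slurm.conf could resemble the sketch below. The node names, node count, CPU and memory values, partition name, and timeouts are placeholders that must match your actual pool configuration; only the ResumeProgram and SuspendProgram paths come from the verification step that follows, and ${SLURM_CONF_DIR} must again be replaced with the real path:

# Sketch only: illustrative contents; replace all placeholder values.
NodeName=xvm16-[1-10] CPUs=16 RealMemory=30000 State=CLOUD
PartitionName=NewPartitionName Nodes=xvm16-[1-10] MaxTime=INFINITE State=UP
ResumeProgram=${SLURM_CONF_DIR}/exostellar/resume_xspot.py
SuspendProgram=${SLURM_CONF_DIR}/exostellar/suspend_xspot.py
ResumeTimeout=600
SuspendTime=300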
Verify that the xspot.slurm.conf file's ResumeProgram and SuspendProgram point correctly at ${SLURM_CONF_DIR}/exostellar/resume_xspot.py and ${SLURM_CONF_DIR}/exostellar/suspend_xspot.py.

Introducing new nodes into a Slurm cluster requires a restart of the Slurm control daemon:

systemctl restart slurmctld
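After the restart, the integration can be sanity-checked with standard Slurm commands; the partition name below is a placeholder for your new partition:

# Confirm the control daemon came back up cleanly.
systemctl status slurmctld
# Confirm the resume and suspend programs were picked up from xspot.slurm.conf.
scontrol show config | grep -E 'ResumeProgram|SuspendProgram'
# Confirm the new nodes and partition are visible.
sinfo -p NewPartitionName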
The integration steps are now complete, and a job submission to the new partition is the final validation:
As a cluster user, navigate to a valid job submission directory and launch a job as normal, but be sure to specify the new partition:
sbatch -p NewPartitionName < job-script.sh
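If no suitable job is at hand, a minimal test script such as the following can serve as job-script.sh; its contents are placeholders:

#!/bin/bash
#SBATCH --job-name=xspot-validation
#SBATCH --nodes=1
#SBATCH --time=00:05:00
# Report which node the job landed on; it should be one of the new pool nodes.
srun hostname

After submitting with the sbatch command above, the job and node states can be watched with squeue -u $USER and sinfo -p NewPartitionName.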