Adding or Modifying Pools: Slurm
It’s common to refine or otherwise modify configurations over time. A CLI tool that will obviate the need for the following manual steps is planned; until it ships, these are the steps required for reconfiguration after the initial integration has been pushed into production.
- Navigate to the `exostellar` directory where the configuration assets reside:

  ```
  cd ${SLURM_CONF_DIR}/exostellar
  ```
 
- Set up a timestamped folder in case there’s a need to roll back:

  ```
  PREVIOUS_DIR=$( date +%Y-%m-%d_%H-%M-%S )
  mkdir ${PREVIOUS_DIR}
  ```
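  If a rollback is ever needed, the saved copy can be restored from this folder; a minimal sketch, assuming `${PREVIOUS_DIR}` is still set in the current shell (otherwise substitute the literal timestamped directory name):

  ```
  # Restore the previously saved on-disk assets; if the server-side
  # configuration must also revert, re-push the restored JSON files
  # following the steps below.
  cp -a ${PREVIOUS_DIR}/* .
  ```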
 
- Move the contents of the `exostellar` directory into the timestamped directory:

  ```
  mv * ${PREVIOUS_DIR}
  ```
 
- Make a new `json` folder, copy in `env0.json` and `profile0.json`, and rename them:

  ```
  mkdir json
  cp ${PREVIOUS_DIR}/json/env0.json ${PREVIOUS_DIR}/json/profile0.json json/
  cd json
  mv env0.json env1.json
  mv profile0.json profile1.json
  ```
 
- Edit `env1.json` as needed, e.g. (a rough pool sketch follows this list):
  - Add more pools if you need more CPU-core or memory options available in the partition.
  - Increase the node count in pools.
  - See Environment Configuration Information for reference.
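  For orientation, a pool entry in `env1.json` has roughly the shape below. The field names are illustrative assumptions only, not the authoritative schema; defer to the Environment Configuration Information referenced above:

  ```
  {
      "PoolName": "xvm32",
      "PoolSize": 20,
      "VM": {
          "CPUs": 32,
          "MaxMemory": 65536
      }
  }
  ```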
 
- Likely, the profile (`profile1.json`) will not need any modification.
  - See Profile Configuration Information for reference.
 
- Validate the JSON asset with `jq`:

  ```
  jq . env1.json
  ```
 
- You will see well-formatted JSON if `jq` can read the file, indicating no syntax errors. If you see an error message instead, the JSON is not valid.
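  For scripted checks, `jq` also reports parse failures through its exit status (non-zero on invalid JSON), so the validation can be automated; a minimal sketch:

  ```
  if jq . env1.json > /dev/null; then
      echo "env1.json: valid JSON"
  else
      echo "env1.json: invalid JSON" >&2
  fi
  ```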
- When the JSON is valid, the file can be pushed to the MGMT_SERVER:

  ```
  curl -d "@env1.json" -H 'Content-Type: application/json' -X PUT http://${MGMT_SERVER_IP}:5000/v1/env
  ```
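  To confirm the server accepted the update, the same PUT can be re-issued with `curl -w` printing the HTTP status code; this assumes the endpoint returns conventional status codes (e.g. 200 on success):

  ```
  curl -s -o /dev/null -w "%{http_code}\n" \
       -d "@env1.json" -H 'Content-Type: application/json' \
       -X PUT http://${MGMT_SERVER_IP}:5000/v1/env
  ```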
 
- If the profile was changed, validate it with the same quick `jq` test:

  ```
  jq . profile1.json
  ```
 
- Push the changes live:

  ```
  curl -d "@profile1.json" -H 'Content-Type: application/json' -X PUT http://${MGMT_SERVER_IP}:5000/v1/profile
  ```
- Grab the assets from the MGMT_SERVER:

  ```
  curl -X GET http://${MGMT_SERVER_IP}:5000/v1/xcompute/download/slurm -o slurm.tgz
  ```

  If the EnvName was changed (see Edit the Slurm Environment JSON for Your Purposes, Step 2), the following command can be used with your `CustomEnvironmentName`; the URL is quoted because of the `?` in the query string:

  ```
  curl -X GET "http://${MGMT_SERVER_IP}:5000/v1/xcompute/download/slurm?envName=CustomEnvironmentName" -o slurm.tgz
  ```
 
- Unpack them into the `exostellar` folder:

  ```
  tar xf slurm.tgz -C ../
  cd ..
  mv assets/* .
  rmdir assets
  ```
- Edit `resume_xspot.sh` and add the `sed` command snippet (`| sed "s/XSPOT_NODENAME/$host/g"`) for every pool:

  ```
  user_data=$(cat /opt/slurm/etc/xvm16-_user_data | base64 -w 0)
  ```

  becomes:

  ```
  user_data=$(cat /opt/slurm/etc/exostellar/xvm16-_user_data | sed "s/XSPOT_NODENAME/$host/g" | base64 -w 0)
  ```
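  Because the snippet must be added for every pool, a quick `grep` can catch any `user_data` line that was missed:

  ```
  # Any line printed here still lacks the sed snippet
  grep -n 'user_data=' resume_xspot.sh | grep -v 'XSPOT_NODENAME'
  ```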
 
- Introducing new nodes into a Slurm cluster requires a restart of the Slurm control daemon:

  ```
  systemctl restart slurmctld
  ```
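  After the restart, it is worth confirming that the daemon came back up and that the new nodes and partition are visible to Slurm:

  ```
  systemctl status slurmctld --no-pager
  sinfo
  ```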
 
- Integration steps are complete, and a job submission to the new partition is the last validation:
  - As a user, navigate to a valid job submission directory and launch a job as normal, but be sure to specify the new partition:

    ```
    sbatch -p NewPartitionName < job-script.sh
    ```
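  If no job script is handy, a minimal sketch like the following is enough to exercise the new partition (the job name and script body are just examples):

  ```
  #!/bin/bash
  #SBATCH --job-name=pool-smoke-test
  #SBATCH --ntasks=1

  # Print the execution host so the output confirms which pooled node ran the job
  hostname
  ```

  While the job is queued, `squeue -u $USER` will typically show it pending while a pooled node powers up, then running once the node joins the cluster.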
 
 
