Validating Migratable VM and Slurm Communications
Validation of Migratable VM Joined to Your Slurm Cluster
The script test_createVm.sh exists for a quick validation that new compute resources can successfully connect and register with the scheduler.
./test_createVm.sh -h xvm0 -i <IMAGE_NAME> -u user_data.sh
The hostname specified with -h xvm0 is completely arbitrary.
The Image Name specified with -i <IMAGE_NAME> should correspond to the Image Name from the parse_helper.sh command and the environment setup earlier.
The -u user_data.sh option is available for any customization that may be required: temporarily changing a password to facilitate logging in, for example.
The test_createVm.sh script will continuously output updates until the VM is created. When the VM is ready, the script will exit and you’ll see that all the fields in the output are now filled with values:
Waiting for xvm0... (4)
NodeName: xvm0
Controller: az1-qeuiptjx-1
Controller IP: 172.31.57.160
Vm IP: 172.31.48.108
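If you want to log in to the new VM right away, the address can be pulled out of the output shown above. The following is a minimal sketch, assuming the script prints each field on its own line as "Name: value"; the helper name field_from_output is hypothetical and not part of the Exostellar tooling.

```shell
#!/usr/bin/env bash
# Sketch only: assumes one "Name: value" field per line in the script output.

field_from_output() {
    # $1 = field name (for example "Vm IP"); the script output is read from stdin.
    awk -F': ' -v name="$1" '$1 == name { print $2; exit }'
}

# Sample output copied from the example above.
sample_output='Waiting for xvm0... (4)
NodeName: xvm0
Controller: az1-qeuiptjx-1
Controller IP: 172.31.57.160
Vm IP: 172.31.48.108'

vm_ip=$(printf '%s\n' "$sample_output" | field_from_output "Vm IP")
echo "Vm IP is ${vm_ip}"
```

In practice the output of test_createVm.sh would be piped or saved to a file instead of using the sample text.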
This step is meant to provide a migratable VM so that sanity checking may occur:
Have network mounts appeared as expected?
Is authentication working as intended?
What commands are required to finish bootstrapping?
Et cetera.
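The checks listed above can be partly scripted once you are logged in to the temporary VM. This is a rough sketch, not a definitive checklist: the mount point /home and the user root are placeholders, and the helper names (check_mount, check_user, report) are hypothetical. Substitute the mounts and accounts your cluster actually relies on.

```shell
#!/usr/bin/env bash
# Sketch only: placeholder paths and user names, to be run on the temporary VM.

check_mount() {
    # Succeeds if the given path is listed as a mount point in /proc/mounts.
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' /proc/mounts
}

check_user() {
    # Succeeds if the given user name can be resolved on this host.
    id -u "$1" > /dev/null 2>&1
}

report() {
    # $1 = label; remaining arguments = the check command to run.
    local label="$1"
    shift
    if "$@"; then
        echo "OK   ${label}"
    else
        echo "FAIL ${label}"
    fi
}

report "mount /home" check_mount /home   # placeholder network mount
report "user root" check_user root       # placeholder account
```

Any line reported as FAIL points at a bootstrapping command that still needs to be added to user_data.sh.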
Lastly, slurmd should be started at the end of bootstrapping. Output from starting slurmd will likely show an error because the arbitrary host is unknown to the scheduler:
/opt/slurm/sbin/slurmd -N xvm0 -f /opt/slurm/etc/slurm.conf
That is not a problem, since xvm0 has not been added to the cluster yet. That will happen in subsequent steps.
To remove this temporary VM:
Replace VM_NAME with the name of the VM (xvm0 in the -h xvm0 example above).
curl -X DELETE http://${MGMT_SERVER_IP}:5000/v1/xcompute/vm/VM_NAME
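Because these steps may be repeated several times, the delete call can be wrapped in a small function. This is a sketch under the assumption that MGMT_SERVER_IP is already exported, as in the command above; the function names vm_url and delete_vm are hypothetical, and only the endpoint itself comes from this guide.

```shell
#!/usr/bin/env bash
# Sketch only: thin wrapper around the DELETE request shown above.

vm_url() {
    # $1 = management server IP, $2 = VM name.
    printf 'http://%s:5000/v1/xcompute/vm/%s\n' "$1" "$2"
}

delete_vm() {
    # $1 = VM name. --fail makes curl return non-zero on an HTTP error status.
    curl --fail --silent --show-error -X DELETE "$(vm_url "${MGMT_SERVER_IP}" "$1")"
}

# Example (not run here):
#   delete_vm xvm0
```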
The above steps may need to be iterated through several times. When totally satisfied, stash the various commands required for successful bootstrapping and overwrite the user data scripts in the exostellar directory.
There will be a per-pool user_data script in the slurm.tgz whose assets were placed in ${SLURM_CONF_DIR}/exostellar. It can be overwritten any time a change is needed, and the next time a node is instantiated from that pool, the node will get the changes.
A common scenario is that all the user_data scripts are identical, but it could be beneficial for different pools to have different user_data bootstrapping assets.
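For the common case where every pool uses the same script, the overwrite can be done in one pass. The sketch below assumes that the per-pool scripts can be matched with the glob *user_data* inside the exostellar directory; that naming pattern and the function name install_user_data are assumptions, so check the actual file names before using anything like it.

```shell
#!/usr/bin/env bash
# Sketch only: the *user_data* glob is an assumption about per-pool file naming.

install_user_data() {
    # $1 = tested user_data script, $2 = directory holding the per-pool scripts.
    local src="$1" dir="$2" target
    for target in "$dir"/*user_data*; do
        [ -f "$target" ] || continue               # glob matched nothing
        case "$target" in *.bak) continue ;; esac  # leave earlier backups alone
        cp -p "$target" "${target}.bak"            # keep the previous version
        cp "$src" "$target"
        echo "updated ${target}"
    done
}

# Example (not run here):
#   install_user_data ./user_data.sh "${SLURM_CONF_DIR}/exostellar"
```

Keeping a .bak copy makes it easy to roll back if a newly instantiated node fails to bootstrap. Pools that need different bootstrapping should be updated individually instead.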