
Validating Migratable VM and Slurm Communications


Validation of Migratable VM Joined to Your Slurm Cluster

These steps quickly validate that new compute resources can connect to and register with the scheduler. Start by creating a temporary migratable VM:

CODE
xli node add -c 4 -m 8192 -n <HostName> -i <ImageName> -p <PoolName> -r <ProfileName> -u ./user_data.sh
  1. The image name specified with -i <ImageName> should match the image name added to the EMS’s Image Library earlier.

  2. The script passed with -u ./user_data.sh is available for any customization that may be required, for example temporarily changing a password to facilitate logging in.

  3. This step provides a migratable VM on which sanity checks can be performed:

    1. Have network mounts appeared as expected?

    2. Is authentication working as intended?

    3. What commands are required to finish bootstrapping?

    4. Et cetera.

  4. Lastly, slurmd should be started at the end of bootstrapping.

    1. Output from starting slurmd with the following command will likely show an error, because the arbitrary host name is unknown to the scheduler:

    2. CODE
      /opt/slurm/sbin/slurmd -N <HostName> -f /opt/slurm/etc/slurm.conf
  5. To remove this temporary VM:

    1. CODE
      xli node rm -n <HostName>
  6. The above steps may need to be repeated several times. Once the VM bootstraps as expected, save the commands that were required and use them to overwrite the user data scripts in the exostellar directory.

    1. The slurm.tgz archive, whose assets were placed in ${SLURM_CONF_DIR}/exostellar, contains a per-pool user_data script. It can be overwritten whenever a change is needed, and the next node instantiated from that pool will pick up the change.

    2. A common scenario is that all the user_data scripts are identical, but it could be beneficial for different pools to have different user_data bootstrapping assets.
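
The sanity checks from step 3 can be scripted so that each iteration of the temporary VM is verified the same way. The following is a minimal sketch, not part of the Exostellar tooling: the function names and the EXPECTED_MOUNTS and EXPECTED_USERS variables are hypothetical, and their defaults (/ and root) are placeholders to be replaced with the network mounts and accounts your cluster depends on. It assumes only standard POSIX utilities (df, awk, id).

```shell
#!/bin/sh
# Sketch of post-boot sanity checks for a temporary migratable VM.
# EXPECTED_MOUNTS and EXPECTED_USERS are hypothetical placeholders; replace
# the defaults with the mounts and accounts your cluster actually relies on.
EXPECTED_MOUNTS="${EXPECTED_MOUNTS:-/}"
EXPECTED_USERS="${EXPECTED_USERS:-root}"

# Succeeds when the given path is itself a mount point, judged by the
# "Mounted on" column of POSIX-format df output. Pass paths without a
# trailing slash (except for / itself).
check_mount() {
    mounted_on=$(df -P "$1" 2>/dev/null | awk 'NR == 2 { print $NF }')
    [ "$mounted_on" = "$1" ]
}

# Succeeds when the account resolves through the system's name services,
# which is a basic signal that directory-based authentication is reachable.
check_user() {
    id "$1" >/dev/null 2>&1
}

# Runs every check, prints one line per result, and returns the failure count.
run_sanity_checks() {
    failures=0
    for m in $EXPECTED_MOUNTS; do
        if check_mount "$m"; then
            echo "ok: mount $m"
        else
            echo "FAIL: mount $m"
            failures=$((failures + 1))
        fi
    done
    for u in $EXPECTED_USERS; do
        if check_user "$u"; then
            echo "ok: user $u"
        else
            echo "FAIL: user $u"
            failures=$((failures + 1))
        fi
    done
    return "$failures"
}

run_sanity_checks || echo "Sanity checks failed; fix bootstrapping before starting slurmd."
```

Once the checks pass on the temporary VM, the same commands, followed by the slurmd invocation from step 4, are candidates for the per-pool user_data script. Setting the two variables before running the script, for example EXPECTED_MOUNTS="/home /opt/slurm", checks several space-separated paths in one pass; those paths are examples only.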
