Installing with CloudFormation Template (AWS)
This is an installation guide for deploying the Exostellar Management Server using the provided AWS CloudFormation template. Intended for DevOps, System Administrators, and Cloud Engineers, it outlines the process from prerequisites to accessing the deployed instance.
If you are installing with the CloudFormation template (CFT), subscribe to the CFT product listing on the AWS Marketplace before proceeding.
Prerequisites
Before you begin, use this checklist to confirm that your tools and environment satisfy the specifications required.
Tools:
Network
Component | Requirements |
VPC |
|
NAT Gateway |
|
Use the following commands to ensure that the AWS VPC where the product will run has at least one private subnet with public NAT Gateways.
Check private subnets that are suitable for running the Exostellar Workers:
aws ec2 describe-subnets --filters "Name=vpc-id,Values=<vpc_id>" --query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId'
Check whether there is a public NAT Gateway attached:
aws ec2 describe-nat-gateways --filter Name=vpc-id,Values=<vpc_id> --output json | jq '.NatGateways[] | {NatGatewayId, SubnetId, ConnectivityType}'
If no private subnets exist, follow the AWS documentation to create a private subnet and a public NAT Gateway.
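The NAT Gateway listing above does not show whether a given private subnet actually routes through one. The following is a minimal sketch of that check, not part of the product tooling. The function name has_nat_default_route is hypothetical, and it assumes the subnet has an explicit route table association; a subnet that falls back to the VPC's main route table returns nothing here and is reported as not found.

```shell
# Hypothetical helper: report whether the route table explicitly associated
# with a subnet sends 0.0.0.0/0 traffic to a NAT Gateway.
has_nat_default_route() {
  local subnet_id="$1" nat_id
  nat_id=$(aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=${subnet_id}" \
    --query "RouteTables[].Routes[] | [?DestinationCidrBlock=='0.0.0.0/0'].NatGatewayId" \
    --output text)
  case "$nat_id" in
    nat-*) echo "${subnet_id}: default route via ${nat_id}" ;;
    *)     echo "${subnet_id}: no NAT default route found" >&2; return 1 ;;
  esac
}

# Usage: has_nat_default_route <subnet_id>
```

Run it once for each subnet ID returned by the describe-subnets command above.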
EKS users: If you already have an EKS Cluster set up, please install the Exostellar Management Server into the same VPC as your EKS Cluster and ensure that the VPC meets the above requirements.
Security
Component | Details |
SSH Key |
|
Trusted Certificate |
|
A pre-provisioned, user-managed SSH key pair is required to access the Exostellar Management Server.
Use the following command to create a new SSH key pair:
aws ec2 create-key-pair --key-name 'my-dev-key' --query 'KeyMaterial' --output text --region us-east-2 > my-dev-key.pem
Modify the permission to secure the key:
chmod 400 my-dev-key.pem
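As a quick sanity check before using the key, the sketch below confirms that the file exists and that its mode is exactly 400. The function name key_is_locked_down is hypothetical; the check relies only on the standard find utility.

```shell
# Hypothetical helper: succeed only if the key file exists and its mode is
# exactly 0400 (owner read-only), as set by the chmod command above.
key_is_locked_down() {
  local key_file="$1"
  [ -f "$key_file" ] || { echo "not found: ${key_file}" >&2; return 2; }
  # find prints the path only when the permission bits match exactly.
  [ -n "$(find "$key_file" -prune -perm 400)" ]
}

# Usage: key_is_locked_down my-dev-key.pem && echo "key permissions OK"
```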
For environments with an existing PKI setup, the X.509 certificate, its private key, and optionally the intermediate chain and CA certificates are also needed.
Compute
Component | Requirements |
Operating System |
|
Permissions
The following file contains the minimum IAM permissions required by the AWS IAM principal used to install the product: exostellar-user-iam.json
To manage and scale your workloads efficiently, the Exostellar Controllers and Workers require a set of IAM permissions. Use this CloudFormation template to create the necessary permissions: exostellar-controller-worker-iams.yaml
When the stack is complete, the role and instance profile ARNs output by CloudFormation will be needed for subsequent installation steps.
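If you prefer the AWS CLI to the console for this template, the sketch below shows one possible way to deploy it and list the resulting outputs. Several things here are assumptions: the function name deploy_iam_stack and the stack name exostellar-iam are hypothetical, the template is assumed to need no required parameters, and CAPABILITY_NAMED_IAM is passed on the assumption that the template may create named IAM resources.

```shell
# Hypothetical helper: deploy the IAM template, then print its outputs as
# tab-separated "OutputKey<TAB>OutputValue" lines.
deploy_iam_stack() {
  local stack_name="$1" template_file="$2"
  aws cloudformation deploy \
    --stack-name "$stack_name" \
    --template-file "$template_file" \
    --capabilities CAPABILITY_NAMED_IAM || return 1
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' \
    --output text
}

# Usage (the stack name is a placeholder):
# deploy_iam_stack exostellar-iam exostellar-controller-worker-iams.yaml
```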
Installation Steps
Step 1: Preparation
Verify that your environment configuration aligns with the prerequisites outlined above.
Step 2: Navigate to AWS CloudFormation
Log in to the AWS Management Console and select the CloudFormation service.
Step 3: Create a New Stack
Select Create stack > With new resources (standard).
Select Choose an existing template.
Choose to use the Amazon S3 URL, then copy the URL below.
Proceed by clicking Next.
Step 4: Specify Stack Details
Stack Name: Assign a unique name to your CloudFormation stack, e.g., IOManagementServer.
Network Configurations
VPC ID: Choose the VPC where the Management Server will be deployed.
Subnet ID: Select the appropriate subnet for deployment.
Is Subnet Public? Indicate true for public subnets.
VPC CIDR: Enter the CIDR block for the selected VPC.
Shared Security Group ID: Enter the Security Group ID for your EKS or Slurm cluster, or leave it blank.
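The VPC CIDR value can be looked up with the AWS CLI instead of the console. This is a minimal sketch: the function name vpc_cidr is hypothetical, and it returns only the VPC's primary IPv4 CIDR block, so a VPC with additional associated CIDR blocks needs a closer look.

```shell
# Hypothetical helper: print the primary IPv4 CIDR block of a VPC.
vpc_cidr() {
  local vpc_id="$1"
  aws ec2 describe-vpcs \
    --vpc-ids "$vpc_id" \
    --query 'Vpcs[0].CidrBlock' \
    --output text
}

# Usage: vpc_cidr <vpc_id>
```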
Instance Configurations
EC2 Instance Type: Select a suitable instance type for the Management Server. The default is m5d.xlarge.
SSH Key Pair: Select an existing SSH key pair. This is used to access the Management Server.
TerminationProtection: We recommend enabling termination protection for your Management Server to prevent accidental deletions or stops.
Volume Size: Define the volume size for the Management Server. The default is 100 GB, which you can adjust based on your needs.
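Not every Availability Zone offers every instance type, so it can help to confirm that the chosen type is available where your subnet lives. The sketch below lists the zones that offer a given type; the function name instance_type_zones is hypothetical.

```shell
# Hypothetical helper: list the Availability Zones in a region that offer
# the given EC2 instance type.
instance_type_zones() {
  local instance_type="$1" region="$2"
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=${instance_type}" \
    --region "$region" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text
}

# Usage: instance_type_zones m5d.xlarge us-east-2
```

Compare the result with the Availability Zone of the subnet selected under Network Configurations.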
HA NFS Integration
By default, the X-IO NFS file system is run on the Management Server. To use a standalone remote NFS file share instance such as AWS EFS, enter the NFS DNS name and security group ID:
NFS DNS name: DNS name of the remote file share instance, e.g., fs-123456789.efs.us-east-1.amazonaws.com. If left empty, the default NFS file system is used.
NFS security group ID: ID of the security group that enables traffic between X-IO and the remote file share.
On EFS, only the default EFS file system policy is supported. See the AWS documentation for more information on this file system policy.
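For EFS, the DNS name follows the pattern shown in the example above: the file system ID, then efs, then the region. The sketch below builds that name from its parts. The function name efs_dns_name is hypothetical, and the amazonaws.com suffix is assumed, which holds for standard commercial regions but not for partitions that use a different domain.

```shell
# Hypothetical helper: build an EFS DNS name from a file system ID and a
# region, assuming the standard amazonaws.com domain suffix.
efs_dns_name() {
  local fs_id="$1" region="$2"
  case "$fs_id" in
    fs-?*) ;;
    *) echo "invalid file system ID: ${fs_id}" >&2; return 1 ;;
  esac
  [ -n "$region" ] || { echo "region is required" >&2; return 1; }
  printf '%s.efs.%s.amazonaws.com\n' "$fs_id" "$region"
}

# Usage: efs_dns_name fs-123456789 us-east-1
```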
Click Next to continue.
Step 5: Configure Options
This step is optional. Configure any additional stack options such as tags, permissions, and other settings relevant to your organization. Click Next to advance.
Step 6: Review and Deploy
Carefully review all configurations to ensure accuracy.
Acknowledge the creation of IAM resources by ticking the relevant checkbox.
Initiate the deployment by clicking Create stack.
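Stack creation takes a while, and its progress can be followed from a terminal as well as from the console. The sketch below blocks until creation finishes and prints the stack status if it fails. The function name wait_for_stack is hypothetical, and the stack name in the usage line is the example name from Step 4.

```shell
# Hypothetical helper: block until the stack reaches CREATE_COMPLETE.
# If the wait fails, print the current stack status to stderr.
wait_for_stack() {
  local stack_name="$1"
  if aws cloudformation wait stack-create-complete --stack-name "$stack_name"; then
    echo "${stack_name}: CREATE_COMPLETE"
  else
    aws cloudformation describe-stacks \
      --stack-name "$stack_name" \
      --query 'Stacks[0].StackStatus' \
      --output text >&2
    return 1
  fi
}

# Usage: wait_for_stack IOManagementServer
```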
Step 7: Access the Management Server
Upon successful deployment, navigate to the Outputs section of your CloudFormation stack to retrieve access details:
ExostellarMgmtServerURL: Access URL for the Management Server.
ExostellarMgmtServerPrivateIP: Private IP of the Management Server.
ExostellarAdminUsername: The initial admin username for the Management Server.
ExostellarOptimizerAdminPassword: The initial admin password. Make sure to change this upon your first login for security.
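The same outputs can be read with the AWS CLI. This is a minimal sketch that fetches a single output value by its key; the function name stack_output is hypothetical, and the stack name in the usage line is the example name from Step 4.

```shell
# Hypothetical helper: print the value of one CloudFormation stack output.
stack_output() {
  local stack_name="$1" output_key="$2"
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query "Stacks[0].Outputs[?OutputKey=='${output_key}'].OutputValue" \
    --output text
}

# Usage: stack_output IOManagementServer ExostellarMgmtServerURL
```

Because the initial admin password is also exposed as a stack output, avoid capturing it in shell history or logs.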
Verification
After successful deployment, log in to the Management Server with the provided credentials. You will be prompted to change the admin password. It's also recommended to explore the platform and configure your infrastructure optimization settings as per your organization's requirements.
How can I verify that my environment is correctly configured and the installation is successful without connecting to a cluster?
Step 1 - Navigate to the Resources tab of the CloudFormation page and go to ExostellarInstance
Step 2 - SSH into the Management Server EC2 instance and start a Hello World VM on an Amazon EC2 Spot Instance:
cp /xcompute/slurm/bin/xcompute-daemon/host-cluster/test_createVm.sh ~/test_createVm.sh && \
sed -i "s/^XCOMPUTE_HEAD_IP.*/XCOMPUTE_HEAD_IP=127.0.0.1/g" ~/test_createVm.sh && \
cat > ~/user_data.sh <<- EOX
#!/bin/bash
echo "$(cat ~/.ssh/id_rsa.pub)" >> /root/.ssh/authorized_keys
EOX
cd ~ && ./test_createVm.sh -i ubuntu -h hello_world_ubuntu -u ./user_data.sh
If you see a message similar to this, congratulations! Your environment is configured and the installation succeeded.
NodeName: hello_world_ubuntu
Controller: az1-yy7jyuv5-1
Controller IP: 192.0.12.xx
Vm IP: 192.0.6.xxx
########## done ##########
Clean up - Shut down the Hello World VM, and the instances will be automatically terminated:
curl -v -X DELETE http://localhost:5000/v1/xcompute/vm/hello_world_ubuntu