To make basic configuration changes to a running cluster, edit and redeploy the blueprint. Use this method of reconfiguring a live cluster only in the following cases:
- To add or remove a partition from the cluster
- To resize an existing partition
Set up environment to allow for reconfiguration
To set up the environment to allow for reconfiguration, complete the following steps:
- Set top-level variables to allow for cluster reconfiguration. Placing these settings in the vars block ensures that they are applied to any module that accepts them as inputs.

      # Slurm v6
      vars:
        ...
        enable_cleanup_compute: true

      # Slurm v5
      vars:
        ...
        enable_reconfigure: true
        enable_cleanup_compute: true
        enable_cleanup_subscriptions: true

- To use the settings in the previous step, install local Python dependencies. Python dependencies must be installed on the deployment machine where the ghpc command is run. For installation instructions, review the following:
  - For Slurm v5, review the schedmd-slurm-gcp-v5-controller description.
  - For Slurm v6, review the schedmd-slurm-gcp-v6-controller description.
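Before redeploying, it can help to confirm that the blueprint actually contains the reconfiguration settings listed above. The following is a minimal sketch, not part of the Cluster Toolkit: the function name check_reconfig_vars is hypothetical, and the check is a plain text match that does not verify that a setting sits inside the vars block.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ghpc): report whether each named setting
# appears in the blueprint file as "<setting>: true".
# This is a text match only; it does not parse YAML or check the vars block.
#
# Usage: check_reconfig_vars BLUEPRINT SETTING [SETTING...]
check_reconfig_vars() {
  if [ "$#" -lt 2 ]; then
    echo "usage: check_reconfig_vars BLUEPRINT SETTING [SETTING...]" >&2
    return 2
  fi

  local blueprint="$1"
  shift

  local missing=0
  local setting
  for setting in "$@"; do
    # Match the setting on its own line, allowing indentation and a trailing comment.
    if grep -Eq "^[[:space:]]*${setting}:[[:space:]]*true[[:space:]]*(#.*)?$" "$blueprint"; then
      echo "ok: ${setting}"
    else
      echo "missing or not true: ${setting}" >&2
      missing=1
    fi
  done

  return "$missing"
}
```

For a Slurm v5 blueprint saved as my-blueprint.yaml (a placeholder file name), you could run check_reconfig_vars my-blueprint.yaml enable_reconfigure enable_cleanup_compute enable_cleanup_subscriptions. For Slurm v6, pass only enable_cleanup_compute.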
Reconfigure partitions on running cluster
To reconfigure a running cluster, complete the following steps:
- To enable cluster reconfiguration, ensure that you have completed the steps in Set up environment to allow for reconfiguration.
- Ensure that redeployment happens with the same version of ghpc as the original deployment. Graceful redeployment across versions of the Cluster Toolkit isn't guaranteed. You can check the version of ghpc by using the ghpc --version command. Also, ghpc prints a warning if you are using a different version on the redeploy.
- Redeploy the blueprint as follows:
  - Edit the blueprint file. For example, you can increase the node_count_static on a node set.
  - Recreate the deployment by running the following command. The -w flag is required for the previous deployment to be overwritten.

        ghpc create BLUEPRINT_NAME -w

  - Redeploy the deployment by running the following command:

        ghpc deploy DEPLOYMENT_FOLDER_NAME

    Carefully evaluate the Terraform plan to make sure that no unexpected resources are replaced or deleted.
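The redeploy steps above can be chained in a small wrapper. The following is a sketch under stated assumptions, not an official workflow: the function name redeploy_cluster and its optional third argument are hypothetical, and the version comparison assumes that you saved the exact output of ghpc --version at the time of the original deployment. Only the ghpc --version, ghpc create ... -w, and ghpc deploy commands come from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the redeploy steps. Nothing runs until
# redeploy_cluster is called.
#
# Usage: redeploy_cluster BLUEPRINT_NAME DEPLOYMENT_FOLDER_NAME [EXPECTED_VERSION]
# EXPECTED_VERSION (optional, an assumption of this sketch) is the output of
# `ghpc --version` that you recorded when the cluster was first deployed.
redeploy_cluster() {
  local blueprint="${1:-}"
  local deployment_folder="${2:-}"
  local expected_version="${3:-}"

  if [ -z "$blueprint" ] || [ -z "$deployment_folder" ]; then
    echo "usage: redeploy_cluster BLUEPRINT_NAME DEPLOYMENT_FOLDER_NAME [EXPECTED_VERSION]" >&2
    return 2
  fi

  # Show the ghpc version so that it can be compared with the original deployment.
  local current_version
  current_version="$(ghpc --version)" || return 1
  echo "Using: ${current_version}"

  # Stop early if a recorded version was supplied and it does not match.
  if [ -n "$expected_version" ] && [ "$current_version" != "$expected_version" ]; then
    echo "ghpc version differs from the original deployment; stopping." >&2
    return 1
  fi

  # Recreate the deployment folder; -w overwrites the previous deployment.
  ghpc create "$blueprint" -w || return 1

  # Redeploy. Evaluate the Terraform plan carefully before accepting changes.
  ghpc deploy "$deployment_folder"
}
```

After editing the blueprint, you would call redeploy_cluster BLUEPRINT_NAME DEPLOYMENT_FOLDER_NAME, replacing both placeholders with your own values. The wrapper does not replace reviewing the Terraform plan yourself.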
What's next
- Learn how to Manage static compute nodes.