Connect from Compute Engine: multiple clients

Compute Engine supports creating multiple VMs at once with the gcloud compute instances bulk create command.

The following sections describe the process of creating a startup script and deploying it to any number of Compute Engine VMs.

For detailed instructions on creating and connecting to a single VM, see Connect from Compute Engine: single client.

Required permissions

You must have the following IAM role in order to create a Compute Engine VM:

  • Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1). For more information, refer to the Compute Engine documentation.
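
If the role is missing, a project administrator can grant it with a command along the following lines (PROJECT_ID and USER_EMAIL are placeholders for your project and account):

# Grant the Compute Instance Admin (v1) role to a user
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:USER_EMAIL" \
  --role="roles/compute.instanceAdmin.v1"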

Set environment variables

The following environment variables are used in the example commands in this document:

export SSH_USER="daos-user"
export CLIENT_PREFIX="daos-client-vm"
export NUM_CLIENTS=10

Update these to your desired values.

Create an SSH key

Create an SSH key and save it locally so that it can be distributed to the client VMs. The key is associated with the SSH user specified in the environment variables; that user is created on each VM:

# Generate an SSH key for the specified user
ssh-keygen -t rsa -b 4096 -C "${SSH_USER}" -N '' -f "./id_rsa"
chmod 600 "./id_rsa"

# Create a new file in the format [user]:[public key] [user]
echo "${SSH_USER}:$(cat "./id_rsa.pub") ${SSH_USER}" > "./keys.txt"

Get Parallelstore network details

Get the Parallelstore server IP addresses in a format consumable by the daos agent:

export ACCESS_POINTS=$(gcloud beta parallelstore instances describe INSTANCE_NAME \
  --location LOCATION \
  --format "value[delimiter=', '](format("{0}", accessPoints))")

Get the network name associated with the Parallelstore instance:

export NETWORK=$(gcloud beta parallelstore instances describe INSTANCE_NAME \
  --location LOCATION \
  --format "value[delimiter=', '](format(\"{0}\", network))" | awk -F '/' '{print $NF}')

Create the startup script

The startup script is attached to each VM and runs every time the system starts. The startup script does the following:

  • Configures the daos agent
  • Installs required libraries
  • Mounts your Parallelstore instance to /tmp/parallelstore/ on each VM

The following script works on VMs running the HPC Rocky Linux 8 image.

# Create a startup script that configures the VM
cat > ./startup-script << EOF
sudo tee /etc/yum.repos.d/parallelstore-v2-6-el8.repo << INNEREOF
[parallelstore-v2-6-el8]
name=Parallelstore EL8 v2.6
baseurl=https://us-central1-yum.pkg.dev/projects/parallelstore-packages/v2-6-el8
enabled=1
repo_gpgcheck=0
gpgcheck=0
INNEREOF
sudo dnf makecache

# Install daos-client
dnf install -y epel-release # needed for capstone
dnf install -y daos-client

# Upgrade libfabric
dnf upgrade -y libfabric

systemctl stop daos_agent

mkdir -p /etc/daos
cat > /etc/daos/daos_agent.yml << INNEREOF
access_points: ${ACCESS_POINTS}

transport_config:
  allow_insecure: true

fabric_ifaces:
- numa_node: 0
  devices:
  - iface: eth0
    domain: eth0
INNEREOF

echo -e "Host *\n\tStrictHostKeyChecking no\n\tUserKnownHostsFile /dev/null" > /home/${SSH_USER}/.ssh/config
chmod 600 /home/${SSH_USER}/.ssh/config

usermod -u 2000 ${SSH_USER}
groupmod -g 2000 ${SSH_USER}
chown -R ${SSH_USER}:${SSH_USER} /home/${SSH_USER}

chown -R daos_agent:daos_agent /etc/daos/

systemctl enable daos_agent
systemctl start daos_agent

mkdir -p /tmp/parallelstore
dfuse -m /tmp/parallelstore --pool default-pool --container default-container --disable-wb-cache --thread-count=16 --eq-count=8 --multi-user
chmod 777 /tmp/parallelstore

EOF
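
Because the heredoc is unquoted, ${ACCESS_POINTS} and ${SSH_USER} are expanded when the file is written. Optionally, confirm that the values landed in the generated file:

# Verify that the access points and SSH user were substituted into the script
grep -E "access_points|${SSH_USER}" ./startup-script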

Create the client VMs

The overall performance of your workloads depends on the client machine type. The following example uses c2-standard-30 VMs; to increase performance, change the --machine-type value to a machine type with a faster NIC. See Machine families resource and comparison guide for details of the available machine types.
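
For example, one way to compare the machine types available in your zone (LOCATION is a placeholder, and the name filter is only an illustration) is:

# List c2 machine types available in the target zone
gcloud compute machine-types list --zones=LOCATION --filter="name~'^c2-'"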

To create VM instances in bulk, use the gcloud compute instances bulk create command:

gcloud compute instances bulk create \
  --name-pattern="${CLIENT_PREFIX}-####" \
  --zone="LOCATION" \
  --machine-type="c2-standard-30" \
  --network-interface=subnet=${NETWORK},nic-type=GVNIC \
  --network-performance-configs=total-egress-bandwidth-tier=TIER_1 \
  --create-disk=auto-delete=yes,boot=yes,device-name=client-vm1,image=projects/cloud-hpc-image-public/global/images/hpc-rocky-linux-8-v20240126,mode=rw,size=100,type=pd-balanced \
  --metadata=enable-oslogin=FALSE \
  --metadata-from-file=ssh-keys=./keys.txt,startup-script=./startup-script \
  --count ${NUM_CLIENTS}
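
After the VMs are created and their startup scripts have finished, you can spot-check one instance. The example below assumes the name pattern produced an instance named ${CLIENT_PREFIX}-0001 and uses the SSH key created earlier; replace LOCATION with your zone:

# Confirm the Parallelstore instance is mounted on a client VM
gcloud compute ssh "${SSH_USER}@${CLIENT_PREFIX}-0001" \
  --zone=LOCATION \
  --ssh-key-file=./id_rsa \
  --command="df -h /tmp/parallelstore"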

What's next