Deploy a cold recoverable web server with persistent disk snapshots

Last reviewed 2021-08-04 UTC

The document describes how to deploy a cold failover topology for a web server by using a managed instance group and persistent disk snapshots. This document is intended for architects and people who work in operations and administrative teams.

You create a managed instance group that runs a single VM with a persistent disk that stores data. Scheduled snapshots of the persistent disk minimize data loss in a failover scenario. An external Application Load Balancer directs users to the VM that runs in the managed instance group, as shown in the following diagram:

An external Application Load Balancer directs users to a single VM that runs in a managed instance group, and a resource policy creates regular snapshots of a persistent disk that's attached to the VM.

If there's an instance failure, the managed instance group tries to recreate the VM in the same zone. If the failure is at the zone level, Cloud Monitoring or similar can let you know there's a problem and you manually create another managed instance group in another zone or region. In either failover scenario, the platform uses the latest persistent disk snapshot to create a replacement disk and attach it to the new VM in the instance group.

In this document, you use the external IP address of the VM or the load balancer to view a basic page on the web server. This approach lets you test the cold failover pattern if you don't have a registered domain name, and without any DNS changes. In a production environment, create and configure a Cloud DNS zone and record that resolves to the external IP address assigned to the load balancer.

This pattern balances the cost difference of running multiple VMs or regional persistent disks with maintaining a certain level of data protection. Your costs are lower as you run one VM and persistent disk, but there's a risk of data loss as the persistent disk snapshots are only taken at a set interval. To reduce your potential data loss, consider deploying a cold recoverable web server that uses regional persistent disks instead.

The following table outlines some high-level differences in data protection options for cold recoverable approaches that use regional persistent disks or persistent disk snapshots. For more information, see High availability options using persistent disks.

Regional persistent disks Persistent disk snapshots
Data loss - recovery point objective (RPO) Zero for a single failure, such as sustained outage in a zone or network disconnect. Any data since the last snapshot was taken, which is typically one hour or more.

The potential data loss depends on your snapshot schedule that controls how frequently snapshots are taken.
Recovery time objective (RTO) Deployment time for a new VM, plus several seconds for the regional persistent disk to be reattached. Deployment time for a new VM, plus time to create a new persistent disk from the latest snapshot.

The disk create time depends on the size of the snapshot, and could take tens of minutes or hours.
Cost Storage costs double as the regional persistent disk is replicated continuously to another zone. You only pay for the amount of snapshot space consumed.
For more information, see Disks and images pricing.

Objectives

  • Create a managed instance group to run a VM with a persistent disk.
  • Configure a snapshot schedule to take regular snapshots of the persistent disk.
  • Create an instance template and startup script.
  • Create and configure an external Application Load Balancer.
  • Test the cold web server failover with a replacement managed instance group.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Compute Engine API.

    Enable the API

  5. Install the Google Cloud CLI.
  6. To initialize the gcloud CLI, run the following command:

    gcloud init
  7. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  8. Make sure that billing is enabled for your Google Cloud project.

  9. Enable the Compute Engine API.

    Enable the API

  10. Install the Google Cloud CLI.
  11. To initialize the gcloud CLI, run the following command:

    gcloud init
  12. You can run the Google Cloud CLI in the Google Cloud console without installing the Google Cloud CLI. To run the gcloud CLI in the Google Cloud console, use Cloud Shell.

Prepare the environment

In this section, you define some variables for your resource names and locations. These variables are used by the Google Cloud CLI commands as you deploy the resources.

Throughout this document, unless otherwise noted, you enter all commands in Cloud Shell or your local development environment.

  1. Replace PROJECT_ID with your own project ID. If required, provide your own name suffix for resources, such as app.

    Specify a region, such as us-central1, and two zones within that region, such as us-central1-a and us-central1-f. These zones define where the initial persistent disk and managed instance group is deployed and where you can manually fail over to if needed.

    PROJECT_ID=PROJECT_ID
    NAME_SUFFIX=app
    REGION=us-central1
    ZONE1=us-central1-a
    ZONE2=us-central1-f
    

Create a VPC and subnet

To provide network access to the VMs, create a Virtual Private Cloud (VPC) and subnet. As the managed instance group works across zones within a single region, only one subnet is created. For more information on the advantages of the custom subnet mode to manage IP address ranges in use in your environment, see Use custom mode VPC networks.

  1. Create the VPC with a custom subnet mode:

    gcloud compute networks create network-$NAME_SUFFIX \
        --subnet-mode=custom
    

    If you see a Cloud Shell prompt, authorize this first request to make API calls.

  2. Create a subnet in the new VPC. Define your own address range, such as 10.1.0.0/20, that fits in your network range:

    gcloud compute networks subnets create subnet-$NAME_SUFFIX-$REGION \
        --network=network-$NAME_SUFFIX \
        --range=10.1.0.0/20 \
        --region=$REGION
    

Create firewall rules

  1. Create firewall rules to allow web traffic and health checks for the load balancer and managed instance groups:

    gcloud compute firewall-rules create allow-http-$NAME_SUFFIX \
        --network=network-$NAME_SUFFIX \
        --direction=INGRESS \
        --priority=1000 \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges=0.0.0.0/0 \
        --target-tags=http-server
    
    gcloud compute firewall-rules create allow-health-check-$NAME_SUFFIX \
        --network=network-$NAME_SUFFIX \
        --action=allow \
        --direction=ingress \
        --source-ranges=130.211.0.0/22,35.191.0.0/16 \
        --target-tags=allow-health-check \
        --rules=tcp:80
    

    The HTTP rule allows traffic to any VM where the http-server tag is applied, and from any source using the 0.0.0.0/0 range. For the health check rule, default ranges for Google Cloud are set to allow the platform to correctly check the health of resources.

  2. To allow SSH traffic for the initial configuration of a base VM image, scope the firewall rule to your environment using the --source-range parameter. You might need to work with your network team to determine what source ranges your organization uses.

    Replace IP_ADDRESS_SCOPE with your own IP address scopes:

    gcloud compute firewall-rules create allow-ssh-$NAME_SUFFIX \
        --network=network-$NAME_SUFFIX \
        --direction=INGRESS \
        --priority=1000 \
        --action=ALLOW \
        --rules=tcp:22 \
        --source-ranges=IP_ADDRESS_SCOPE
    
  3. After you create the firewall rules, verify that the three rules have been added:

    gcloud compute firewall-rules list \
        --project=$PROJECT_ID \
        --filter="NETWORK=network-$NAME_SUFFIX"
    

    The following example output shows the three rules have been correctly created:

    NAME                    NETWORK      DIRECTION  PRIORITY  ALLOW
    allow-health-check-app  network-app  INGRESS    1000      tcp:80
    allow-http-app          network-app  INGRESS    1000      tcp:80
    allow-ssh-app           network-app  INGRESS    1000      tcp:22
    

Create and configure a base VM image

To create identical VMs that you deploy without additional configuration, you use a custom VM image. This image captures the OS and Apache configuration, and is used to create each VM in the managed instance group in the next steps.

You use a persistent disk to store the application data. In this document, you use a basic Apache website to serve the application. Later in this document, you create a snapshot schedule that's attached to this persistent disk to create automated disk snapshots.

On the VM, you create a basic index.html file on the persistent disk and mount it to /var/www/example.com. An Apache configuration file at /etc/apache2/sites-available/example.com.conf serves web content from the mounted persistent disk location.

The following diagram shows the basic HTML page served by Apache that's stored on the persistent disk:

The VM has a basic HTML page stored on the persistent disk with an Apache configuration file to load from the mounted disk location.

You build this environment in the following steps.

  1. Create a 10 GiB SSD. Understand your storage needs and the associated costs of paying for the provisioned space, not consumed space. For more information, see persistent disk pricing.

    gcloud compute disks create disk-$NAME_SUFFIX \
        --zone $ZONE1 \
        --size=10 \
        --type=pd-ssd
    
  2. Create a base VM with the attached persistent disk:

    gcloud compute instances create vm-base-$NAME_SUFFIX \
        --zone=$ZONE1 \
        --machine-type=n1-standard-1 \
        --subnet=subnet-$NAME_SUFFIX-$REGION \
        --tags=http-server \
        --image=debian-10-buster-v20210721 \
        --image-project=debian-cloud \
        --boot-disk-size=10GB \
        --boot-disk-type=pd-balanced \
        --boot-disk-device-name=vm-base-$NAME_SUFFIX \
        --disk=mode=rw,name=disk-$NAME_SUFFIX,device-name=disk-$NAME_SUFFIX
    

    You use parameters defined at the start of this document to name the VM and connect to the correct subnet. Names are also assigned from the parameters for the boot disk and data disk.

  3. To install and configure the simple website, first connect to the base VM using SSH:

    gcloud compute ssh vm-base-$NAME_SUFFIX --zone=$ZONE1
    
  4. In your SSH session to the VM, create a script to configure the VM in an editor of your choice. The following example uses Nano as the editor:

    nano configure-vm.sh
    

    Paste the following configuration script into the file. Update the NAME_SUFFIX variable to match the value set at the start of this document, such as app:

    #!/bin/bash
    
    NAME_SUFFIX=app
    
    # Create directory for the basic website files
    sudo mkdir -p /var/www/example.com
    sudo chmod a+w /var/www/example.com
    sudo chown -R www-data: /var/www/example.com
    
    # Find the disk name, then format and mount it
    DISK_NAME="google-disk-$NAME_SUFFIX"
    DISK_PATH="$(find /dev/disk/by-id -name "${DISK_NAME}" | xargs -I '{}' readlink -f '{}')"
    
    sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard $DISK_PATH
    sudo mount -o discard,defaults $DISK_PATH /var/www/example.com
    
    # Install Apache, additional utilities, and cloud-init
    sudo apt-get update && sudo apt-get -y install apache2 moreutils cloud-init
    
    # Write out a basic HTML file to the mounted persistent disk
    sudo tee -a /var/www/example.com/index.html >/dev/null <<'EOF'
    <!doctype html>
    <html lang=en>
    <head>
    <meta charset=utf-8>
        <title>HA / DR example</title>
    </head>
    <body>
        <p>Welcome to a test web server with persistent disk snapshots!</p>
    </body>
    </html>
    EOF
    
    # Write out an Apache configuration file
    sudo tee -a /etc/apache2/sites-available/example.com.conf >/dev/null <<'EOF'
    <VirtualHost *:80>
            ServerName www.example.com
    
            ServerAdmin webmaster@localhost
            DocumentRoot /var/www/example.com
    
            ErrorLog ${APACHE_LOG_DIR}/error.log
            CustomLog ${APACHE_LOG_DIR}/access.log combined
    </VirtualHost>
    EOF
    
    # Enable the Apache configuration file and reload service
    sudo a2dissite 000-default
    sudo a2ensite example.com.conf
    sudo systemctl reload apache2
    
  5. Write out the file and exit your editor. For example, in Nano you use Ctrl-O to write out the file, then exit with Ctrl-X.

  6. Make the configuration script executable, then run it:

    chmod +x configure-vm.sh
    ./configure-vm.sh
    
  7. If there's an instance failure and the managed instance group needs to create a replacement from this base VM, the application data must be available. The following steps should automatically run on each new VM:

    • Get some info from the metadata server.
    • Get the latest snapshot for the persistent disk.
    • Create a disk from this latest snapshot.
    • Attach the new disk to the VM.
    • Mount the disk within the VM.

    Create a startup script named app-startup.sh that performs these steps required for the VM. This startup script is applied to an instance template in a following step.

    sudo mkdir /opt/cloud-init-scripts
    
    sudo tee -a /opt/cloud-init-scripts/app-startup.sh >/dev/null <<'EOF'
    #!/bin/bash
    
    # Install jq and get an access token for API requests
    apt-get install -y jq
    OAUTH_TOKEN=$(curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" \
        -H "Metadata-Flavor: Google" --silent | jq -r '.access_token')
    
    # Make a request against the metadata server to determine the project name,
    # instance name, and what zone it's running in
    ZONE_INFO=$(curl http://metadata.google.internal/computeMetadata/v1/instance/zone \
        -H "Metadata-Flavor: Google" --silent)
    PROJECT_NAME=$(curl http://metadata.google.internal/computeMetadata/v1/instance/zone \
        -H "Metadata-Flavor: Google" --silent | awk -v FS="/" '{print $2}')
    ZONE_NAME=$(curl http://metadata.google.internal/computeMetadata/v1/instance/zone \
        -H "Metadata-Flavor: Google" --silent | sed 's:.*/::')
    INSTANCE_NAME=$(curl http://metadata.google.internal/computeMetadata/v1/instance/name \
        -H "Metadata-Flavor: Google" --silent)
    
    # Get the latest snapshot of the app disk
    LATEST_SNAPSHOT=$(curl -X GET -H "Authorization: Bearer $OAUTH_TOKEN" \
        https://compute.googleapis.com/compute/v1/projects/$PROJECT_NAME/global/snapshots \
        --silent | jq -s '.[].items[] | select(.name | contains("disk-$NAME")) | .name' \
        | sort -r | head -n 1 | tr -d '"')
    
    # Create a persistent disk using the latest persistent disk snapshot
    curl -X POST -H "Authorization: Bearer $OAUTH_TOKEN" -H "Content-Type: application/json; charset=utf-8" \
        https://compute.googleapis.com/compute/v1/$ZONE_INFO/disks \
        --data '{"name":"'$LATEST_SNAPSHOT'-restored","sizeGb":"10","type":"zones/'$ZONE_NAME'/diskTypes/pd-ssd","sourceSnapshot":"https://www.googleapis.com/compute/v1/projects/'$PROJECT_NAME'/global/snapshots/'$LATEST_SNAPSHOT'"}'
    
    # Wait for the persistent disk to be created from the disk snapshot
    DISK_STATUS=$(curl -X GET -H "Authorization: Bearer $OAUTH_TOKEN" \
        https://compute.googleapis.com/compute/v1/projects/$PROJECT_NAME/zones/$ZONE_NAME/disks/$LATEST_SNAPSHOT-restored \
        --silent | jq -r .status)
    
    while [ $DISK_STATUS != "READY" ]
    do
        sleep 2
        DISK_STATUS=$(curl -X GET -H "Authorization: Bearer $OAUTH_TOKEN" \
            https://compute.googleapis.com/compute/v1/projects/$PROJECT_NAME/zones/$ZONE_NAME/disks/$LATEST_SNAPSHOT-restored \
            --silent | jq -r .status)
    done
    
    # Attach the new persistent disk created from the snapshot to the VM
    curl -X POST -H "Authorization: Bearer $OAUTH_TOKEN" -H "Content-Type: application/json; charset=utf-8" \
        https://compute.googleapis.com/compute/v1/$ZONE_INFO/instances/$INSTANCE_NAME/attachDisk \
        --data '{ "source": "/compute/v1/'$ZONE_INFO'/disks/'$LATEST_SNAPSHOT'-restored"}'
    
    # Wait for the persistent disk to be attached before mounting
    ATTACH_STATUS=$(curl -X GET -H "Authorization: Bearer $OAUTH_TOKEN" \
        https://compute.googleapis.com/compute/v1/projects/$PROJECT_NAME/zones/$ZONE_NAME/instances/$INSTANCE_NAME \
        --silent | jq '.disks[] | select(.source | contains("disk-"))')
    
    while [ -z "$ATTACH_STATUS" ]
    do
        sleep 2
        ATTACH_STATUS=$(curl -X GET -H "Authorization: Bearer $OAUTH_TOKEN" GET \
            https://compute.googleapis.com/compute/v1/projects/$PROJECT_NAME/zones/$ZONE_NAME/instances/$INSTANCE_NAME \
            --silent | jq '.disks[] | select(.source | contains("disk-"))')
    done
    
    # With the disk attached, mount the disk and restart Apache
    echo UUID=`blkid -s UUID -o value /dev/sdb` /var/www/example.com ext4 discard,defaults,nofail 0 2 \
        | tee -a /etc/fstab
    mount -a
    systemctl reload apache2
    
    # Remove jq so it's not left on the VM
    apt-get remove -y jq
    EOF
    
  8. To apply the NAME_SUFFIX variable you defined at the start of the document into the startup script, such as app, use the envsubst command:

    export NAME=app
    envsubst '$NAME' < "/opt/cloud-init-scripts/app-startup.sh" \
        | sudo sponge "/opt/cloud-init-scripts/app-startup.sh"
    
  9. Exit the SSH session to the VM:

    exit
    
  10. Get the IP address of the VM and use curl to see the basic web page:

    curl $(gcloud compute instances describe vm-base-$NAME_SUFFIX \
        --zone $ZONE1 \
        --format="value(networkInterfaces.accessConfigs.[0].natIP)")
    

    The basic website is returned, as shown in the following example output:

    <!doctype html>
    
    <html lang=en>
    <head>
    <meta charset=utf-8>
        <title>HA / DR example</title>
    </head>
    <body>
        <p>Welcome to a test web server with persistent disk snapshots!</p>
    </body>
    </html>
    

    This step confirms that Apache is configured correctly, and the page is loaded from the attached persistent disk. In the following sections, you create an image using this base VM and configure an instance template with a startup script.

Create a persistent disk snapshot schedule

To make sure that VMs created in the managed instance group always have the latest data from the persistent disk, you create a snapshot schedule. This schedule takes automatic snapshots of a persistent disk at defined times, and controls how long to retain the snapshots. The following image shows how this snapshot process works:

A resource policy is created that defines a snapshot schedule and is attached to the persistent disk to take regular snapshots.

Think about your application needs and business goals in how often you should take snapshots - for example, a static website needs less frequent snapshots than an active application writing data to disk.

For more information on how to determine the best approach for your own applications and which recovery method to use, see the disaster recovery planning guide.

  1. In this scenario, you use a snapshot schedule to create regular persistent disk snapshots. You define this snapshot schedule in a resource policy. Resource policies let you define actions to run, and attach them to resources in your environment.

    In this resource policy, you define a schedule to create a snapshot with the following settin