Automating cost optimizations with Cloud Functions, Cloud Scheduler, and Cloud Monitoring


This document shows you how to use Cloud Functions to identify and clean up wasted cloud resources, schedule functions to run with Cloud Scheduler, and use Cloud Monitoring alerting policies to execute them based on observed usage. This document is intended for developers, SREs, cloud architects, and cloud infrastructure admins who are looking for a systematic and automated approach to identify and reduce wasteful cloud spending.

This document assumes that you're familiar with the following:

Objectives

  • Delete unused IP addresses: On Google Cloud, static IP addresses are a free resource when they're attached to a load balancer or virtual machine (VM) instance. When a static IP address is reserved, but not used, it accumulates an hourly charge. In apps that heavily depend on static IP addresses and large-scale dynamic provisioning, this waste can become significant over time.
  • Delete orphaned or unused persistent disks: Persistent disks are unused or orphaned if they're created without ever being attached to a VM, or if a machine has multiple disks and one or more disks are detached.
  • Migrate to a less expensive storage class: Google Cloud offers multiple classes of object storage. Use the class that best fits your access patterns and cost needs.

Architecture

The following diagram describes the first part of the deployment, where you schedule a Cloud Function to identify and clean up unused IP addresses.

Architecture of a Cloud Function that identifies and cleans up unused IP addresses.

The first example covers the following:

  • Creating a Compute Engine VM with a static external IP address and a separate unused static external IP address.
  • Deploying a Cloud Function to identify unused addresses.
  • Creating a Cloud Scheduler job to schedule the function to run by using an HTTP trigger.

In the following diagram, you schedule a Cloud Function to identify and clean up unattached and orphaned persistent disks.

Architecture of a Cloud Function that identifies and cleans up unused persistent disks.

The second example covers the following:

  • Creating a Compute Engine VM with two persistent disks and a separate unattached persistent disk. One of the disks is orphaned by being detached from the VM.
  • Deploying a Cloud Function to identify unattached and orphaned persistent disks.
  • Creating a Cloud Scheduler job to schedule the execution of the Cloud Function by using an HTTP trigger.

In the following diagram, you trigger a Cloud Function to migrate a storage bucket to a less expensive storage class from a Monitoring alerting policy.

Architecture of a Cloud Function that migrates a storage bucket.

The third example covers the following:

  • Creating two storage buckets, adding a file to the serving bucket, and generating traffic against it.
  • Creating a Monitoring dashboard to visualize bucket utilization.
  • Deploying a Cloud Function to migrate the idle bucket to a less expensive storage class.
  • Triggering the function by using a payload intended to simulate a notification received from a Monitoring alerting policy.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Compute Engine, Cloud Functions, and Cloud Storage APIs.

    Enable the APIs

  5. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  6. You run all the commands in this document from Cloud Shell.

Setting up your environment

In this section, you configure the infrastructure and identities that are required for this architecture.

  1. In Cloud Shell, clone the repository and change to the gcf-automated-resource-cleanup directory:

    git clone https://github.com/GoogleCloudPlatform/gcf-automated-resource-cleanup.git && cd gcf-automated-resource-cleanup/
    
  2. Set the environment variables and make the repository folder your $WORKDIR folder, where you run all the commands:

    export PROJECT_ID=$(gcloud config list \
        --format 'value(core.project)' 2>/dev/null)
    export WORKDIR=$(pwd)
    
  3. Install Apache Bench, an open source load-generation tool:

    sudo apt-get install apache2-utils
    

Cleaning up unused IP addresses

In this section, you complete the following steps:

  • Create two static IP addresses.
  • Create a VM that uses a static IP address.
  • Review the Cloud Functions code.
  • Deploy the Cloud Function.
  • Test the Cloud Function by using Cloud Scheduler jobs.

Create IP addresses

  1. In Cloud Shell, change to the unused-ip directory:

    cd $WORKDIR/unused-ip
    
  2. Export the names of the IP addresses as variables:

    export USED_IP=used-ip-address
    export UNUSED_IP=unused-ip-address
    
  3. Create two static IP addresses:

    gcloud compute addresses create $USED_IP \
        --project=$PROJECT_ID --region=us-central1
    gcloud compute addresses create $UNUSED_IP \
        --project=$PROJECT_ID --region=us-central1
    

    This example uses the us-central1 region, but you can choose a different region and refer to it consistently throughout the rest of this document.

  4. Confirm that two addresses were created:

    gcloud compute addresses list --filter="region:(us-central1)"
    

    In the output, a status of RESERVED means that the IP addresses aren't in use:

    NAME               ADDRESS/RANGE  TYPE      REGION       SUBNET  STATUS
    unused-ip-address  35.232.144.85  EXTERNAL  us-central1          RESERVED
    used-ip-address    104.197.56.87  EXTERNAL  us-central1          RESERVED
    
  5. Set the used IP address as an environment variable:

    export USED_IP_ADDRESS=$(gcloud compute addresses describe $USED_IP \
        --region=us-central1 --format=json | jq -r '.address')
    
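
    The RESERVED status is what the cleanup function keys on later in this section. As a quick illustration (not part of the deployment), the same filtering can be done on the JSON output of the list command; the function name and sample data below are hypothetical:

    ```python
    import json

    def reserved_address_names(addresses_json: str) -> list:
        """Return the names of addresses whose status is RESERVED (reserved
        but unused), given JSON like `gcloud compute addresses list --format=json`."""
        return [a["name"] for a in json.loads(addresses_json)
                if a.get("status") == "RESERVED"]

    # Sample mirroring the table above, before the VM is created.
    sample = '''[
      {"name": "unused-ip-address", "status": "RESERVED"},
      {"name": "used-ip-address",   "status": "IN_USE"}
    ]'''
    print(reserved_address_names(sample))  # ['unused-ip-address']
    ```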

Create a VM

  1. In Cloud Shell, create an instance:

    gcloud compute instances create static-ip-instance \
        --zone=us-central1-a \
        --machine-type=n1-standard-1 \
        --subnet=default \
        --address=$USED_IP_ADDRESS
    
  2. Confirm that one of the IP addresses is now in use:

    gcloud compute addresses list --filter="region:(us-central1)"
    

    The output is similar to the following:

    NAME               ADDRESS/RANGE  TYPE      REGION       SUBNET  STATUS
    unused-ip-address  35.232.144.85  EXTERNAL  us-central1          RESERVED
    used-ip-address    104.197.56.87  EXTERNAL  us-central1          IN_USE
    

Review the Cloud Function code

  • In Cloud Shell, output the main section of the code:

    cat $WORKDIR/unused-ip/function.js | grep "const compute" -A 31
    

    The output is as follows:

    const compute = new Compute();
    compute.getAddresses(function(err, addresses){ // gets all addresses across regions
         if(err){
             console.log("there was an error: " + err);
         }
         if (addresses == null) {
             console.log("no addresses found");
             return;
         }
         console.log("there are " + addresses.length + " addresses");
    
         // iterate through addresses
         for (let item of addresses){
    
              // get metadata for each address
              item.getMetadata(function(err, metadata, apiResponse) {
    
                  // if the address is not used AND if it's at least ageToDelete days old:
                  if ((metadata.status=='RESERVED') & (calculateAge(metadata.creationTimestamp) >= ageToDelete)){
                      // delete address
                      item.delete(function(err, operation, apiResponse2){
                          if (err) {
                              console.log("could not delete address: " + err);
                          }
                      })
                  }
              })
          }
           // return number of addresses evaluated
          res.send("there are " + addresses.length + " total addresses");
      });
    }
    

    In the preceding code sample, pay attention to the following:

    • compute.getAddresses(function(err, addresses){ // gets all addresses across regions
      

      Uses the getAddresses method to retrieve IP addresses across all regions in the project.

    • // get metadata for each address
      item.getMetadata(function(err, metadata, apiResponse) {
         // if the address is not used:
             if (metadata.status=='RESERVED'){
      

      Gets the metadata for each IP address and checks its STATUS field.

    • if ((metadata.status=='RESERVED') &
      (calculateAge(metadata.creationTimestamp) >= ageToDelete)){
      

      Checks whether the IP address is unused (its status is RESERVED), calculates its age by using a helper function, and compares that age against a constant (set to 0 for the purposes of this document).

    • // delete address
      item.delete(function(err, operation, apiResponse2){
      

      Deletes the IP address.
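
    The age comparison relies on the repository's calculateAge helper, whose implementation lives in function.js. A minimal sketch of the same idea in Python (days elapsed since an RFC 3339 creationTimestamp), with an illustrative function name:

    ```python
    from datetime import datetime, timezone

    def calculate_age_days(creation_timestamp: str) -> float:
        """Days elapsed since an RFC 3339 timestamp such as Compute Engine's
        creationTimestamp, for example "2019-06-12T12:21:25.546-07:00"."""
        created = datetime.fromisoformat(creation_timestamp)
        return (datetime.now(timezone.utc) - created).total_seconds() / 86400
    ```

    With the threshold constant set to 0, any RESERVED address qualifies for deletion on the next run.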

Deploy the Cloud Function

  1. In Cloud Shell, deploy the Cloud Function:

    gcloud functions deploy unused_ip_function --trigger-http --runtime=nodejs8
    
  2. Set the trigger URL as an environment variable:

    export FUNCTION_URL=$(gcloud functions describe unused_ip_function \
        --format=json | jq -r '.httpsTrigger.url')
    

Schedule and test the Cloud Function

  1. In Cloud Shell, create a Cloud Scheduler task to run the Cloud Function at 2 AM every day:

    gcloud scheduler jobs create http unused-ip-job \
        --schedule="0 2 * * *" \
        --uri=$FUNCTION_URL
    
  2. Test the job by manually triggering it:

    gcloud scheduler jobs run unused-ip-job
    
  3. Confirm that the unused IP address was deleted:

    gcloud compute addresses list --filter="region:(us-central1)"
    

    The output is similar to the following:

    NAME             ADDRESS/RANGE  TYPE      REGION       SUBNET  STATUS
    used-ip-address  104.197.56.87  EXTERNAL  us-central1          IN_USE
    

Cleaning up unused and orphaned persistent disks

In this section, you complete the following steps:

  • Create two persistent disks.
  • Create a VM that uses one of the disks.
  • Detach the disk from the VM.
  • Review the Cloud Function code.
  • Deploy the Cloud Function.
  • Test the Cloud Function by using Cloud Scheduler jobs.

Create persistent disks

  1. In Cloud Shell, change to the unattached-pd directory:

    cd $WORKDIR/unattached-pd
    
  2. Export the names of the disks as environment variables:

    export ORPHANED_DISK=orphaned-disk
    export UNUSED_DISK=unused-disk
    
  3. Create the two disks:

    gcloud beta compute disks create $ORPHANED_DISK \
       --project=$PROJECT_ID \
       --type=pd-standard \
       --size=500GB \
       --zone=us-central1-a
    gcloud beta compute disks create $UNUSED_DISK \
        --project=$PROJECT_ID \
        --type=pd-standard \
        --size=500GB \
        --zone=us-central1-a
    
  4. Confirm that the two disks were created:

    gcloud compute disks list
    

    The output is as follows:

    NAME                LOCATION       LOCATION_SCOPE SIZE_GB TYPE         STATUS
    orphaned-disk       us-central1-a  zone           500     pd-standard  READY
    static-ip-instance  us-central1-a  zone           10      pd-standard  READY
    unused-disk         us-central1-a  zone           500     pd-standard  READY
    

Create a VM and inspect the disks

  1. In Cloud Shell, create the instance:

    gcloud compute instances create disk-instance \
        --zone=us-central1-a \
        --machine-type=n1-standard-1 \
        --disk=name=$ORPHANED_DISK,device-name=$ORPHANED_DISK,mode=rw,boot=no
    
  2. Inspect the disk that was attached to the VM:

    gcloud compute disks describe $ORPHANED_DISK \
        --zone=us-central1-a \
        --format=json | jq
    

    The output is similar to the following:

    {
      "creationTimestamp": "2019-06-12T12:21:25.546-07:00",
      "id": "7617542552306904666",
      "kind": "compute#disk",
      "labelFingerprint": "42WmSpB8rSM=",
      "lastAttachTimestamp": "2019-06-12T12:24:53.989-07:00",
      "name": "orphaned-disk",
      "physicalBlockSizeBytes": "4096",
      "selfLink": "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a/disks/orphaned-disk",
      "sizeGb": "500",
      "status": "READY",
      "type": "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a/diskTypes/pd-standard",
      "users": [
        "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a/instances/disk-instance"
      ],
      "zone": "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a"
    }
    

    In the preceding code sample, pay attention to the following:

    • users identifies the VM that the disk is attached to.
    • lastAttachTimestamp identifies when the disk was last attached to a VM.
  3. Inspect the disk that hasn't been attached to a VM:

    gcloud compute disks describe $UNUSED_DISK \
        --zone=us-central1-a \
        --format=json | jq
    

    The output is similar to the following:

    {
      "creationTimestamp": "2019-06-12T12:21:30.905-07:00",
      "id": "1313096191791918677",
      "kind": "compute#disk",
      "labelFingerprint": "42WmSpB8rSM=",
      "name": "unused-disk",
      "physicalBlockSizeBytes": "4096",
      "selfLink": "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a/disks/unused-disk",
      "sizeGb": "500",
      "status": "READY",
      "type": "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a/diskTypes/pd-standard",
      "zone": "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a"
    }
    

    In the preceding code sample, the following is important:

    • The disk doesn't have users listed because it's not currently in use by a VM.
    • The disk doesn't have a lastAttachTimestamp entry because it has never been attached to a VM.
  4. Detach the orphaned persistent disk from the VM:

    gcloud compute instances detach-disk disk-instance \
        --device-name=$ORPHANED_DISK \
        --zone=us-central1-a
    
  5. Inspect the orphaned disk:

    gcloud compute disks describe $ORPHANED_DISK \
        --zone=us-central1-a \
        --format=json | jq
    

    The output is similar to the following:

    {
      "creationTimestamp": "2019-06-12T12:21:25.546-07:00",
      "id": "7617542552306904666",
      "kind": "compute#disk",
      "labelFingerprint": "42WmSpB8rSM=",
      "lastAttachTimestamp": "2019-06-12T12:24:53.989-07:00",
      "lastDetachTimestamp": "2019-06-12T12:34:56.040-07:00",
      "name": "orphaned-disk",
      "physicalBlockSizeBytes": "4096",
      "selfLink": "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a/disks/orphaned-disk",
      "sizeGb": "500",
      "status": "READY",
      "type": "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a/diskTypes/pd-standard",
      "zone": "https://www.googleapis.com/compute/v1/projects/automating-cost-optimization/zones/us-central1-a"
    }
    

    In the preceding code sample, the following is important:

    • The disk doesn't have users listed, which indicates that it isn't currently in use.
    • There is now a lastDetachTimestamp entry, indicating when the disk was last detached from a VM and, therefore, when it was last in use.
    • The lastAttachTimestamp field is still present.
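
    Taken together, these metadata fields are enough to tell the three disk states apart. A small illustrative sketch of that decision logic (not the deployed function):

    ```python
    def classify_disk(disk: dict) -> str:
        """Classify a disk from the metadata fields shown in the examples above."""
        if disk.get("users"):
            return "attached"        # currently in use by a VM
        if disk.get("lastAttachTimestamp") is None:
            return "never-attached"  # no attach history at all
        return "orphaned"            # attached once, now detached
    ```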

Review the Cloud Function code

  1. In Cloud Shell, output the section of the code that retrieves all persistent disks in the project:

    cat $WORKDIR/unattached-pd/main.py | grep "(request)" -A 12
    

    The output is as follows:

    def delete_unattached_pds(request):
        # get list of disks and iterate through it:
        disksRequest = compute.disks().aggregatedList(project=project)
        while disksRequest is not None:
            diskResponse = disksRequest.execute()
            for name, disks_scoped_list in diskResponse['items'].items():
                if disks_scoped_list.get('warning') is None:
                    # got disks
                    for disk in disks_scoped_list['disks']: # iterate through disks
                        diskName = disk['name']
                        diskZone = str((disk['zone'])).rsplit('/',1)[1]
                        print (diskName)
                        print (diskZone)
    

    The function uses the aggregatedList method to get all persistent disks in the Google Cloud project where it runs, and iterates through each disk.

  2. Output the section of the code that checks the lastAttachTimestamp field and deletes the disk if it doesn't exist:

    cat $WORKDIR/unattached-pd/main.py | grep "handle never" -A 11
    

    The output is as follows:

    # handle never attached disk - delete it
    # lastAttachTimestamp is not present
    if disk.get("lastAttachTimestamp") is None:
           print ("disk " + diskName + " was never attached - deleting")
           deleteRequest = compute.disks().delete(project=project,
                  zone=diskZone,
                  disk=diskName)
           deleteResponse = deleteRequest.execute()
           waitForZoneOperation(deleteResponse, project, diskZone)
           print ("disk " + diskName + " was deleted")
           continue
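
    The snippet above handles only the never-attached case; the repository's function also handles orphaned disks by comparing lastDetachTimestamp against a retention threshold. A simplified sketch of that check, with a hypothetical 30-day threshold (the constant name and value are illustrative, not taken from the repository):

    ```python
    from datetime import datetime, timezone

    DAYS_TO_DELETE = 30  # hypothetical retention threshold

    def is_deletable_orphan(disk: dict, now: datetime = None) -> bool:
        """True if a detached disk's last detach time exceeds the threshold."""
        detached = disk.get("lastDetachTimestamp")
        if detached is None or disk.get("users"):
            return False  # never attached, or still in use by a VM
        now = now or datetime.now(timezone.utc)
        age_days = (now - datetime.fromisoformat(detached)).total_seconds() / 86400
        return age_days >= DAYS_TO_DELETE
    ```

    Waiting out a retention window before deleting a detached disk reduces the risk of removing data that a VM was only temporarily detached from.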