Performing a PITR of a PostgreSQL database on Compute Engine


This tutorial shows how to set up the archiving process, and then perform a point-in-time recovery (PITR) of a PostgreSQL database running on Compute Engine.

In this tutorial, you create a demonstration database and run an application workload. Then, you configure the archive and backup processes. Next, you learn how to verify the backup, archive, and recovery processes. Finally, you learn how to recover the database to a specific point in time.

This tutorial is intended for database administrators, system operators, DevOps professionals, and cloud architects interested in configuring a backup and recovery strategy for PostgreSQL databases.

This tutorial assumes that you are familiar with Docker containers and that you are comfortable with Linux commands, PostgreSQL database engines, and Compute Engine.

Objectives

  • Set up a backup and archiving process.
  • Perform a PITR.
  • Monitor your backup.
  • Verify a recovery.

Costs

In this document, you use the following billable components of Google Cloud:

  • Compute Engine
  • Cloud Storage

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Compute Engine and Cloud Storage APIs.

    Enable the APIs

  5. Install the Google Cloud CLI.
  6. To initialize the gcloud CLI, run the following command:

    gcloud init
  7. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

Concepts

Before you start the tutorial, review the following PostgreSQL concepts:

  • Continuous archiving. The process in which the database continuously saves sequential transactions to a file.
  • Write-Ahead Log (WAL). Changes to data files are recorded in the WAL before they are applied to the data files themselves.
  • WAL Record. Each transaction applied to the database is formatted and stored as a WAL record.
  • Segment files. Segment files have monotonically increasing filenames and contain as many WAL records as possible. The file size is configurable, with a default of 16 MiB. You might choose a larger size if you expect transactions that are voluminous in size or count, which lowers the aggregate number of generated segment files and decreases the file-management burden.

For more information, see Reliability and the Write-Ahead Log.
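
For example, once a PostgreSQL server is running, you can inspect the WAL segment size and the current WAL position with queries like the following sketch. The commands are shown as docker exec calls against the postgres-db container that you create later in this tutorial; they are illustrative and not part of the tutorial's scripts:

    # Show the configured WAL segment size (16 MiB by default).
    docker exec postgres-db psql -U postgres -c "SHOW wal_segment_size;"

    # Show the current WAL insert location and the segment file that contains it.
    docker exec postgres-db psql -U postgres \
        -c "SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());"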

The following diagram shows how WALs are persisted in two stages.

2 stages of persistent WALs.

In the preceding diagram, the first stage of persisting WALs consists of the database engine recording write transactions in the WAL buffer concurrently with the write to a table. When the transaction is committed, the WAL buffer is written (flushed) to disk during the second stage with an append to the WAL segment file.

Choosing a PITR

A PITR is appropriate for the following scenarios:

  • Minimize the recovery point objective (RPO). The RPO is the maximum amount of data loss, measured in time, that can be tolerated before it significantly impacts business processes. Saving all transactions in the WALs between backup snapshots drastically decreases the amount of lost data, because you can apply the transactions since the last full backup to the database.
  • Minimize the recovery time objective (RTO). The RTO is the amount of time required to recover a database if a destructive event occurs. After you set up binary backups and log archiving, the time required to recover the database can be minimal.
  • Remediation for a data corruption bug or an administrative misstep. If a code release causes catastrophic data corruption, or if an unrecoverable mistake is made during routine maintenance, you can recover the database to its state just before that moment.

In some application architectures, such as a microservices architecture, there might be parallel databases that could require independent recoveries. For example, a retail application might have customer data in one database and retail order details and inventory information in other databases. Depending on the overall state of data, one, two, or all databases might need to be recovered in parallel.

A PITR is not appropriate in the following scenarios:

  • RPO is large. If your disaster recovery policy can tolerate the loss of any transactions received after the most recent snapshot, you can avoid additional steps and focus on reducing the time to recover your data.
  • A complete database recovery is needed. If your goal is to recover to the most recent transaction, your recovery target is the timestamp of the last persisted transaction. This scenario is a specific case of PITR but semantically this goal is referred to as a full recovery.

Performance considerations

The archiving process puts additional I/O load on your database server. The additional load depends on the characteristics of your workload, because it is proportional to the volume of write, update, and delete transactions.

If you want to reduce the I/O impact that the WAL archive activity might incur on your primary database, you can perform the periodic WAL archives using a read-only replica.

This configuration isolates the primary database from the batch-oriented I/O activities related to the transfer of the WAL files. Transactions destined for the read-only replica are transmitted in a constant stream from the primary database, which has a much lower impact on steady-state throughput.

Further, if your production database topology already includes a read-only replica, this configuration doesn't add any management, cost, or other burden.
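
If you adopt this approach, a simple sanity check before transferring WAL files is to confirm that the host you're archiving from is actually a read-only replica. The following sketch is a hypothetical check and is not part of this tutorial's scripts:

    # Hypothetical check on the replica host:
    # pg_is_in_recovery() returns 't' on a read-only replica and 'f' on the primary.
    psql -U postgres -tAc "SELECT pg_is_in_recovery();"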

Reference architecture

The following diagram illustrates the architecture that you implement in this tutorial.

Cloud infrastructure of PITR using Compute Engine and Cloud Storage.

In this tutorial, you create cloud infrastructure to observe a PITR that is using the following components:

  • A PostgreSQL database server running on Compute Engine.
  • Cloud Storage for storage of snapshots and transaction logs.

The following diagram shows the two Docker containers that are launched on the PostgreSQL database virtual machine (VM). As a separation of concerns, the database server is running in one of the containers, and the WAL archiver is running in the other container.

Docker containers for the database server and the WAL archiver.

This diagram shows how the Docker volumes in each container are mapped to Persistent Disk mount points on the host VM.

Setting up environment variables

The scripts and commands used in this tutorial rely on shell environment variables.

  1. In Cloud Shell, set environment variables for your project, the instance name, and the demonstration PostgreSQL database.

    export PROJECT_ID=your-gcp-project
    export PG_INSTANCE_NAME=instance-pg-pitr
    export POSTGRES_PASSWORD=PasswordIsThis
    export POSTGRES_PITR_DEMO_DBNAME=pitr_demo
    

    Replace the following:

    • your-gcp-project: the ID of the project that you created for this tutorial.
    • PasswordIsThis: a secure password for the PostgreSQL database.
  2. Set the environment variable for the Google Cloud zone. Replace choose-an-appropriate-zone with a Google Cloud zone.

    export ZONE=choose-an-appropriate-zone
    export REGION=${ZONE%-[a-z]}
    
  3. Set the environment variable for the default Virtual Private Cloud (VPC) subnet for the region of your zone:

    export SUBNETWORK_URI=$(gcloud compute networks subnets \
        describe default --format=json --region=$REGION | \
        jq --raw-output '.ipCidrRange')
    
  4. Set the environment variable for the Cloud Storage bucket. Replace archive-bucket with a unique name for the Cloud Storage bucket where WALs are saved.

    export ARCHIVE_BUCKET=archive-bucket
    
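After you set these variables, you can optionally confirm that they resolve as expected and set the active project for subsequent gcloud commands; this quick check is not part of the tutorial's scripts:

    gcloud config set project $PROJECT_ID
    echo "zone=$ZONE region=$REGION bucket=$ARCHIVE_BUCKET"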

Creating a Cloud Storage bucket

  • Create a Cloud Storage bucket to archive the WAL files from the PostgreSQL database:

    gsutil mb gs://${ARCHIVE_BUCKET}
    
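You can optionally verify that the bucket was created; this check isn't part of the tutorial's scripts:

    gsutil ls -b gs://${ARCHIVE_BUCKET}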

Allowing access to instances with private IP addresses

For the instances used in this tutorial, as in many production use cases, the VM instances don't need public IP addresses. However, the instances need access to the public internet to pull the example container images, and you need a way to connect to them by using a secure shell. To address these needs, you configure a network address translation (NAT) gateway and configure Identity-Aware Proxy (IAP) for TCP forwarding.

Create a NAT gateway

Because the VM instances that you create don't have public IP addresses, you create a NAT gateway so that the instances can pull container images from the Docker Hub.

  1. In Cloud Shell, create a Cloud Router:

    export CLOUD_ROUTER_NAME=${PROJECT_ID}-nat-router
    gcloud compute routers create $CLOUD_ROUTER_NAME \
        --network=default --region=$REGION
    
  2. Create the NAT gateway:

    gcloud compute routers nats create ${PROJECT_ID}-nat-gateway \
        --region=$REGION \
        --router=$CLOUD_ROUTER_NAME \
        --auto-allocate-nat-external-ips \
        --nat-all-subnet-ip-ranges
    
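To confirm that the NAT configuration is attached to the router, you can optionally check the router status; this check isn't part of the tutorial's scripts:

    gcloud compute routers get-status $CLOUD_ROUTER_NAME \
        --region=$REGION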

Configure IAP for TCP forwarding

IAP controls access to your cloud applications and VMs running on Google Cloud. IAP verifies the user identity and context of the request to determine whether a user is allowed to access a VM.

  1. In Cloud Shell, allow traffic from the TCP forwarding net block to the instances in your project:

    export IAP_FORWARDING_CIDR=35.235.240.0/20
    gcloud compute --project=$PROJECT_ID firewall-rules create \
        cloud-iap-tcp-forwarding --direction=INGRESS  \
        --priority=1000 --network=default \
        --action=ALLOW --rules=all  \
        --source-ranges=$IAP_FORWARDING_CIDR
    
  2. To connect by using a TCP forwarding tunnel, add an Identity and Access Management (IAM) policy binding. Replace your-email-address with the email address that you use to log in to the Google Cloud console.

    export GRANT_EMAIL_ADDRESS=your-email-address
    gcloud projects add-iam-policy-binding $PROJECT_ID \
       --member=user:$GRANT_EMAIL_ADDRESS \
       --role=roles/iap.tunnelResourceAccessor
    
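With the firewall rule and the IAM binding in place, you can later open an SSH session to a VM that has no public IP address through an IAP tunnel. The following sketch assumes the database instance that you create later in this tutorial:

    gcloud compute ssh $PG_INSTANCE_NAME \
        --zone=$ZONE \
        --tunnel-through-iap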

Creating the PostgreSQL database infrastructure

  1. In Cloud Shell, clone the source repository that contains the configuration scripts, and change the shell context to the local repository:

    git clone https://github.com/GoogleCloudPlatform/gcs-postgresql-recovery-tutorial
    cd gcs-postgresql-recovery-tutorial
    
  2. To create and configure the database VM instance, run the following script:

    cd bin
    ./create_postgres_instance.sh
    

    For this tutorial, this script starts a VM instance in your chosen zone running Container-Optimized OS, with two new persistent disks attached. In this case, you can ignore the warning message returned by the API about poor I/O performance, because the script creates small persistent disks.
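
You can optionally confirm that the instance was created and is running; this check isn't part of the tutorial's scripts:

    gcloud compute instances describe $PG_INSTANCE_NAME \
        --zone=$ZONE \
        --format="value(name,status)"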

Reviewing the cloud-init configuration

Cloud-init is a multi-distribution package that initializes a cloud instance.

Review the following cloud-init code sample:

write_files:
- path: /var/tmp/docker-entrypoint-initdb.d/init-pitr-demo-db.sql
  permissions: 0644
  owner: root
  content: |
    CREATE DATABASE ${POSTGRES_PITR_DEMO_DBNAME};

    \c ${POSTGRES_PITR_DEMO_DBNAME}

    CREATE SCHEMA pitr_db_schema;

    CREATE TABLE pitr_db_schema.customer
       (id SERIAL NOT NULL,
        name VARCHAR(255),
        create_timestamp TIMESTAMP DEFAULT current_timestamp,
        PRIMARY KEY (id));

    CREATE TABLE pitr_db_schema.invoice
       (id SERIAL NOT NULL,
        customer_id INTEGER
          REFERENCES pitr_db_schema.customer(id),
        description VARCHAR(1000),
        create_timestamp TIMESTAMP DEFAULT current_timestamp,
        PRIMARY KEY (customer_id, id));

- path: /etc/systemd/system/postgres.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Requires=docker.service
    After=docker.service
    Description=postgres docker container

    [Service]
    TimeoutStartSec=0
    KillMode=none
    Restart=always
    RestartSec=5s
    ExecStartPre=-/usr/bin/docker kill postgres-db
    ExecStartPre=-/usr/bin/docker rm -v postgres-db
    ExecStart=/usr/bin/docker run -u postgres --name postgres-db \
                                  -v /var/tmp/docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d \
                                  -v /mnt/disks/data:/var/lib/postgresql/data \
                                  -v /mnt/disks/wal:/var/lib/postgresql/wal \
                                  -e PGDATA=/var/lib/postgresql/data/pgdata \
                                  -e POSTGRES_PASSWORD=${POSTGRES_PASSWORD} \
                                  -e POSTGRES_INITDB_WALDIR=/var/lib/postgresql/wal/pg_wal \
                                  -p ${POSTGRES_PORT}:${POSTGRES_PORT} \
                               postgres:11-alpine
    ExecStop=-/usr/bin/docker stop postgres-db
    ExecStopPost=-/usr/bin/docker rm postgres-db

- path: /etc/systemd/system/wal_archive.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Requires=docker.service postgres.service
    After=docker.service postgres.service
    Description=WAL archive docker container

    [Service]
    TimeoutStartSec=10min
    Type=oneshot
    ExecStart=/usr/bin/docker run --name wal-archive \
                                  -v /mnt/disks/wal/pg_wal_archive:/mnt/wal_archive \
                               google/cloud-sdk:slim gsutil mv /mnt/wal_archive/[0-9A-F]*[0-9A-F] gs://${ARCHIVE_BUCKET}
    ExecStopPost=-/usr/bin/docker rm wal-archive

- path: /etc/systemd/system/wal_archive.timer
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Archive WAL to GCS (every 5 minutes)

    [Timer]
    OnBootSec=5min
    OnUnitInactiveSec=5min
    OnUnitActiveSec=5min

    [Install]
    WantedBy=timers.target

For this tutorial, cloud-init is used to do the following:

  1. Create two Persistent Disk block storage devices.
  2. Create the file systems on the two devices: one for the data and one for the archive logs.
  3. Mount the devices at logical mount points on the VM instance that are shared with the Docker containers.
  4. Create and then start a systemd service (postgres.service), which starts a PostgreSQL Docker container with the following:
    • The persistent disks mounted as volumes.
    • The PostgreSQL port (5432) published to the VM host.
  5. Create a /var/tmp/docker-entrypoint-initdb.d/init-pitr-demo-db.sql file to create a simple set of tables in a demonstration database and schema.
  6. Create and start a second systemd service (wal_archive.service) that runs a Google Cloud CLI Docker container with the WAL disks mounted as a volume. This service backs up archived WAL files to Cloud Storage.
  7. Create, enable, and then start a systemd timer (wal_archive.timer) that periodically runs the wal_archive.service.
  8. Ensure that the PostgreSQL port (5432) is open for the VPC subnet so that the transaction generator can reach the database port.
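
After the instance boots, you can optionally verify on the VM that the units defined by cloud-init are running; these checks aren't part of the tutorial's scripts:

    # On the instance-pg-pitr VM, check the database service and the archive timer.
    sudo systemctl status postgres.service --no-pager
    sudo systemctl list-timers wal_archive.timer --no-pager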

Modify the database instance configuration

The database server is running, but you need to configure network access, and to start the WAL archiving process.

Connect to the database VM instance

  1. In the Google Cloud console, go to the VM instances page.

    Go to VM instance

  2. To open a terminal shell, next to the instance-pg-pitr instance that you created, click SSH.

  3. In the terminal shell, check that the Docker container started:

    docker ps

    The output is similar to the following:

    CONTAINER ID   IMAGE                COMMAND                  CREATED              STATUS              PORTS   NAMES
    8bb65d8c1197   postgres:11-alpine   "docker-entrypoint.s…"   About a minute ago   Up About a minute           postgres-db
    

    If the container is not yet running, wait a moment, and then use the same command to check again.
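
If the container still doesn't start after a few minutes, you can inspect its logs; these commands aren't part of the tutorial's steps:

    docker logs postgres-db
    sudo journalctl -u postgres.service --no-pager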

Allow inbound network connections to database

  1. In the terminal shell of the instance-pg-pitr instance, open the PostgreSQL host-based authentication configuration file for editing:

    sudoedit /mnt/disks/data/pgdata/pg_hba.conf
    
  2. To remove the default access to the database from all IP addresses, comment out the following line at the end of the file by adding # at the beginning of the line. After you comment it out, the line in the file looks similar to the following:

    #host all all all md5
    
  3. To allow password-protected connections from hosts in the 10.0.0.0/8 CIDR block, add the following line to the end of the file:

    host    all             all             10.0.0.0/8               md5
    

    This entry enables connectivity from the VPC subnet where the transaction generator is later created.

  4. Save and then close the file.
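
You can optionally confirm the change by displaying the end of the file; this check isn't part of the tutorial's steps:

    sudo tail -n 5 /mnt/disks/data/pgdata/pg_hba.conf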

Configure WAL archiving

  1. In the terminal shell of the instance-pg-pitr instance, edit the postgresql.conf file:

    sudoedit /mnt/disks/data/pgdata/postgresql.conf
    
  2. Replace the existing commented-out archive_mode, archive_command, and archive_timeout lines with the following:

    archive_mode=on
    archive_command = '( ARCHIVE_PATH=/var/lib/postgresql/wal/pg_wal_archive;
    test ! -f $ARCHIVE_PATH/%f && cp %p $ARCHIVE_PATH/%f.cp && mv
    $ARCHIVE_PATH/%f.cp $ARCHIVE_PATH/%f ) '
    archive_timeout = 120
    

    When you replace the lines in the modified file, it looks similar to the following code snippet: