Use the Bigtable change stream to BigQuery template

In this quickstart, you learn how to set up a Bigtable table with a change stream enabled, run a change stream pipeline, make changes to your table, and then view the streamed changes in BigQuery.

Before you begin

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Enable the Dataflow, Cloud Bigtable, Cloud Bigtable Admin, and BigQuery APIs.

  4. In the Google Cloud console, activate Cloud Shell.
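
With Cloud Shell active, the APIs from step 3 can also be enabled from the command line. The following is a minimal sketch rather than part of the original procedure: `PROJECT_ID` is a placeholder, and the `DRY_RUN` switch and `run` helper are illustrative names that only print the command by default.

```shell
#!/usr/bin/env bash
# Sketch: enable the Dataflow, Bigtable, Bigtable Admin, and BigQuery APIs.
# PROJECT_ID is a placeholder; DRY_RUN and run() are illustrative helpers.
PROJECT_ID="${PROJECT_ID:-PROJECT_ID}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only; 0 = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

enable_quickstart_apis() {
  run gcloud services enable \
    dataflow.googleapis.com \
    bigtable.googleapis.com \
    bigtableadmin.googleapis.com \
    bigquery.googleapis.com \
    --project="$PROJECT_ID"
}

enable_quickstart_apis
```

Set `DRY_RUN=0` and export your real project ID to execute the command instead of printing it.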

Create a BigQuery dataset

Use the Google Cloud console to create a dataset that stores the data.

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the Explorer pane, click your project name.

  3. Expand the Actions option and click Create dataset.

  4. On the Create dataset page, do the following:

    1. For Dataset ID, enter bigtable_bigquery_quickstart.
    2. Leave the remaining default settings as they are, and click Create dataset.
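
As an alternative to the console steps above, the dataset can be created from Cloud Shell with the `bq` tool. This is a hedged sketch: `PROJECT_ID` is a placeholder, the dataset uses the default location, and `DRY_RUN` and `run` are illustrative helpers that print the command by default.

```shell
#!/usr/bin/env bash
# Sketch: create the quickstart dataset with the bq command-line tool.
PROJECT_ID="${PROJECT_ID:-PROJECT_ID}"      # placeholder
DATASET_ID="bigtable_bigquery_quickstart"   # dataset name used on this page
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print only; 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_dataset() {
  # The dataset reference has the form PROJECT_ID:DATASET_ID.
  run bq mk --dataset "${PROJECT_ID}:${DATASET_ID}"
}

create_dataset
```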

Create a table with a change stream enabled

  1. In the Google Cloud console, go to the Bigtable Instances page.

  2. Click the ID of the instance that you are using for this quickstart.

    If you don't have an instance available, create an instance with the default configurations in a region near you.

  3. In the left navigation pane, click Tables.

  4. Click Create a table.

  5. Name the table bigquery-changestream-quickstart.

  6. Add a column family named cf.

  7. Select Enable change stream.

  8. Click Create.

  9. On the Bigtable Tables page, find your table bigquery-changestream-quickstart.

  10. In the Change stream column, click Connect.

  11. In the dialog, select BigQuery.

  12. Click Create Dataflow job.

  13. In the provided parameter fields, enter your parameter values. You don't need to provide any optional parameters.

    1. Set the Bigtable application profile ID to default.
    2. Set the BigQuery dataset to bigtable_bigquery_quickstart.

  14. Click Run job.

  15. Wait until the job status is Starting or Running before proceeding. Startup takes around 5 minutes after the job is queued.

  16. Keep the job open in a tab, so you can stop the job when cleaning up your resources.
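
The table creation and the Dataflow job launch can also be approximated from Cloud Shell. Treat the following as a sketch under assumptions, not the console's exact behavior: the 1-day retention period, the `REGION` placeholder, the job name `bigtable-bq-quickstart`, and the `latest` template path are illustrative choices, and the template parameter names should be checked against the Bigtable change streams to BigQuery template reference before use. `DRY_RUN` and `run` are illustrative helpers that print the commands by default.

```shell
#!/usr/bin/env bash
# Sketch: create the table with a change stream, then launch the pipeline.
PROJECT_ID="${PROJECT_ID:-PROJECT_ID}"                            # placeholder
BIGTABLE_INSTANCE_ID="${BIGTABLE_INSTANCE_ID:-BIGTABLE_INSTANCE_ID}"  # placeholder
REGION="${REGION:-REGION}"                       # assumption: Dataflow region
JOB_NAME="${JOB_NAME:-bigtable-bq-quickstart}"   # hypothetical job name
TABLE_ID="bigquery-changestream-quickstart"
DATASET_ID="bigtable_bigquery_quickstart"
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print only; 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_table() {
  # Setting a retention period enables the change stream; 1d is an assumed value.
  run gcloud bigtable instances tables create "$TABLE_ID" \
    --project="$PROJECT_ID" --instance="$BIGTABLE_INSTANCE_ID" \
    --column-families=cf \
    --change-stream-retention-period=1d
}

start_pipeline() {
  # Assumed parameter names; verify them against the template reference.
  params="bigtableReadInstanceId=${BIGTABLE_INSTANCE_ID}"
  params="${params},bigtableReadTableId=${TABLE_ID}"
  params="${params},bigtableChangeStreamAppProfile=default"
  params="${params},bigQueryDataset=${DATASET_ID}"
  run gcloud dataflow flex-template run "$JOB_NAME" \
    --project="$PROJECT_ID" --region="$REGION" \
    --template-file-gcs-location="gs://dataflow-templates-${REGION}/latest/flex/Bigtable_Change_Streams_to_BigQuery" \
    --parameters="$params"
}

create_table
start_pipeline
```

Keeping the default at dry-run lets you review both commands before anything is created or billed.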

Write some data to Bigtable

  1. In Cloud Shell, write a few rows to Bigtable so that the change stream has data to write to BigQuery. The changes appear as long as you write the data after the job is created; you don't have to wait for the job status to become Running.

    cbt -instance=BIGTABLE_INSTANCE_ID -project=PROJECT_ID \
        set bigquery-changestream-quickstart user123 cf:col1=abc
    cbt -instance=BIGTABLE_INSTANCE_ID -project=PROJECT_ID \
        set bigquery-changestream-quickstart user546 cf:col1=def
    cbt -instance=BIGTABLE_INSTANCE_ID -project=PROJECT_ID \
        set bigquery-changestream-quickstart user789 cf:col1=ghi

    Replace the following:

    • PROJECT_ID: the ID of the project that you are using
    • BIGTABLE_INSTANCE_ID: the ID of the instance that contains the bigquery-changestream-quickstart table
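
Before switching to BigQuery, it can help to confirm that the writes landed in Bigtable. The sketch below reads the table back with `cbt read`; the placeholders are the same as above, and `DRY_RUN` and `run` are illustrative helpers that print the command by default.

```shell
#!/usr/bin/env bash
# Sketch: read back the rows written above to confirm they are in Bigtable.
PROJECT_ID="${PROJECT_ID:-PROJECT_ID}"                            # placeholder
BIGTABLE_INSTANCE_ID="${BIGTABLE_INSTANCE_ID:-BIGTABLE_INSTANCE_ID}"  # placeholder
TABLE_ID="bigquery-changestream-quickstart"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print only; 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

read_rows() {
  run cbt -instance="$BIGTABLE_INSTANCE_ID" -project="$PROJECT_ID" read "$TABLE_ID"
}

read_rows
```

With `DRY_RUN=0`, the output should list the rows user123, user546, and user789 with their cf:col1 values.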

View the change logs in BigQuery

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the Explorer pane, expand your project and the dataset bigtable_bigquery_quickstart.

  3. Click the table bigquery-changestream-quickstart_changelog.

  4. To see the change log, click Preview.

    Change log preview in BigQuery
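
The same preview is available from Cloud Shell through `bq head`, which prints the first rows of a table without running a query. This is a sketch: `PROJECT_ID` is a placeholder, the row limit of 10 is arbitrary, and `DRY_RUN` and `run` are illustrative helpers that print the command by default.

```shell
#!/usr/bin/env bash
# Sketch: preview the change log table from the command line.
PROJECT_ID="${PROJECT_ID:-PROJECT_ID}"      # placeholder
DATASET_ID="bigtable_bigquery_quickstart"
TABLE_ID="bigquery-changestream-quickstart"
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print only; 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

preview_changelog() {
  # The pipeline writes to a table named after the source table plus _changelog.
  run bq head --max_rows=10 "${PROJECT_ID}:${DATASET_ID}.${TABLE_ID}_changelog"
}

preview_changelog
```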

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. Disable the change stream on the table:

    gcloud bigtable instances tables update bigquery-changestream-quickstart \
        --project=PROJECT_ID --instance=BIGTABLE_INSTANCE_ID \
        --clear-change-stream-retention-period

  2. Delete the table bigquery-changestream-quickstart:

    cbt --instance=BIGTABLE_INSTANCE_ID --project=PROJECT_ID deletetable bigquery-changestream-quickstart
  3. Stop the change stream pipeline:

    1. In the Google Cloud console, go to the Dataflow Jobs page.

    2. Select your streaming job from the job list.

    3. In the navigation, click Stop.

    4. In the Stop job dialog, select Cancel, and then click Stop job.

  4. Delete the BigQuery dataset:

    1. In the Google Cloud console, go to the BigQuery page.

    2. In the Explorer pane, find the dataset bigtable_bigquery_quickstart and click it.

    3. Click Delete, type delete, and then click Delete to confirm.

  5. Optional: Delete the instance if you created a new one for this quickstart:

    cbt deleteinstance BIGTABLE_INSTANCE_ID
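
The two console-based cleanup steps (stopping the pipeline and deleting the dataset) can be sketched as commands as well. The assumptions here are the `JOB_ID` and `REGION` placeholders, which you would look up yourself, for example with `gcloud dataflow jobs list --region=REGION --status=active`. `DRY_RUN` and `run` are illustrative helpers that print the commands by default.

```shell
#!/usr/bin/env bash
# Sketch: cancel the streaming job and delete the dataset from Cloud Shell.
PROJECT_ID="${PROJECT_ID:-PROJECT_ID}"   # placeholder
REGION="${REGION:-REGION}"               # assumption: region the job runs in
JOB_ID="${JOB_ID:-JOB_ID}"               # placeholder: ID of the Dataflow job
DATASET_ID="bigtable_bigquery_quickstart"
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print only; 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

cancel_job() {
  run gcloud dataflow jobs cancel "$JOB_ID" --project="$PROJECT_ID" --region="$REGION"
}

delete_dataset() {
  # -r also removes the tables in the dataset; -f skips the confirmation prompt.
  run bq rm -r -f --dataset "${PROJECT_ID}:${DATASET_ID}"
}

cancel_job
delete_dataset
```

Because `bq rm -r -f` deletes without prompting, review the printed command before rerunning with `DRY_RUN=0`.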

What's next