This page explains how to transfer DAGs, data and configuration from your existing Cloud Composer 1, Airflow 1 environments to Cloud Composer 2, Airflow 2.
Other migration guides
From | To | Method | Guide |
---|---|---|---|
Cloud Composer 1, Airflow 2 | Cloud Composer 2, Airflow 2 | Side-by-side, using snapshots | Migration guide (snapshots) |
Cloud Composer 1, Airflow 1 | Cloud Composer 2, Airflow 2 | Side-by-side, using snapshots | Migration guide (snapshots) |
Cloud Composer 1, Airflow 2 | Cloud Composer 2, Airflow 2 | Side-by-side, manual transfer | Manual migration guide |
Cloud Composer 1, Airflow 1 | Cloud Composer 2, Airflow 2 | Side-by-side, manual transfer | This guide (manual migration) |
Airflow 1 | Airflow 2 | Side-by-side, manual transfer | Manual migration guide |
Before you begin
- Cloud Composer supports side-by-side migration from Cloud Composer 1 to Cloud Composer 2. It is not possible to upgrade from Cloud Composer 1 to Cloud Composer 2 in-place.
- Check the list of differences between Cloud Composer 1 and Cloud Composer 2.
Because Cloud Composer 2 uses Airflow 2, the migration includes switching your DAGs and environment configuration to Airflow 2. Check the migration guide from Airflow 1 to Airflow 2 for information about the breaking changes between Airflow 1 and Airflow 2 in Cloud Composer.
In this guide, you combine migration to Airflow 2 and migration to Cloud Composer 2 in one migration procedure. In this way, you do not need to migrate to a Cloud Composer 1 environment with Airflow 2 before migrating to Cloud Composer 2.
Step 1: Upgrade to Airflow 1.10.15
If your environment uses an Airflow version earlier than 1.10.15, upgrade your environment to a Cloud Composer version that uses Airflow 1.10.15.
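As a rough illustration of the check in this step, the following sketch parses an image version string and reports whether the Airflow version is earlier than 1.10.15. It assumes the image version follows the `composer-X.Y.Z-airflow-A.B.C` pattern seen elsewhere in this guide; the function names and the sample strings are hypothetical.

```python
import re

# Minimum Airflow 1 version required before starting the migration.
MIN_AIRFLOW_1 = (1, 10, 15)


def airflow_version(image_version: str) -> tuple:
    """Extract the Airflow version from a string like 'composer-1.20.0-airflow-1.10.15'."""
    match = re.search(r"airflow-(\d+)\.(\d+)\.(\d+)", image_version)
    if match is None:
        raise ValueError(f"unrecognized image version: {image_version!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(image_version: str) -> bool:
    """Return True if the environment runs an Airflow version earlier than 1.10.15."""
    return airflow_version(image_version) < MIN_AIRFLOW_1


if __name__ == "__main__":
    # Hypothetical image version strings, used only for illustration.
    for image in ("composer-1.16.0-airflow-1.10.12", "composer-1.20.0-airflow-1.10.15"):
        print(image, "needs upgrade:", needs_upgrade(image))
```

Comparing the version as a tuple of integers avoids the string-comparison pitfall where "1.10.9" sorts after "1.10.15".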
Step 2: Check compatibility with Airflow 2
To check for potential conflicts with Airflow 2, consult the Upgrading to Airflow 2.0+ guide, in the section about upgrading DAGs.
One common issue that you might encounter is related to incompatible import paths. For more information about solving this compatibility issue, in the Upgrading to Airflow 2.0+ guide, see the section about backport providers.
Step 3: Get the list of configuration overrides, custom PyPI packages, and environment variables
Console
Get the list of your Cloud Composer 1 environment's configuration overrides, custom PyPI packages, and environment variables:
Go to the Environments page in the Google Cloud console:
Select your Cloud Composer 1 environment.
View environment variables on the Environment variables tab.
View configuration overrides on the Airflow configuration overrides tab.
View custom PyPI packages on the PyPI packages tab.
gcloud
To get the list of environment variables, run:
gcloud composer environments describe \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
--format="value(config.softwareConfig.envVariables)"
To get the list of the environment's Airflow configuration overrides, run:
gcloud composer environments describe \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
--format="value(config.softwareConfig.airflowConfigOverrides)"
To get the list of custom PyPI packages, run:
gcloud composer environments describe \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
--format="value(config.softwareConfig.pypiPackages)"
Replace:
- COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
- COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
Terraform
Skip this step. Your Cloud Composer 1 environment's configuration already lists configuration overrides, custom PyPI packages, and environment variables for your environment.
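If you prefer to collect all three lists in one pass, one option is to run the `describe` command once with `--format=json` and read the fields from the result. The sketch below only parses such output; it assumes the JSON mirrors the `config.softwareConfig` field names used in the commands above, and the function name and sample data are hypothetical.

```python
import json


def extract_settings(describe_json: str) -> dict:
    """Pull the three migration-relevant maps out of 'describe --format=json' output."""
    software = json.loads(describe_json).get("config", {}).get("softwareConfig", {})
    return {
        "env_variables": software.get("envVariables", {}),
        "airflow_config_overrides": software.get("airflowConfigOverrides", {}),
        "pypi_packages": software.get("pypiPackages", {}),
    }


# Hypothetical sample output, trimmed to the fields this step cares about.
sample = json.dumps(
    {
        "config": {
            "softwareConfig": {
                "envVariables": {"EXAMPLE_VAR": "example_value"},
                "airflowConfigOverrides": {"core-dags_are_paused_at_creation": "True"},
                "pypiPackages": {"numpy": "", "scipy": ">=1.1.0"},
            }
        }
    }
)

if __name__ == "__main__":
    for name, values in extract_settings(sample).items():
        print(name, values)
```

Missing fields come back as empty maps, so an environment without overrides or custom packages does not raise an error.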
Step 4: Create a Cloud Composer 2 environment
In this step, create a Cloud Composer 2 environment. You can start with an environment preset that matches your expected resource demands, and later scale and optimize your environment further.
Console
Create a Cloud Composer 2 environment and specify configuration overrides and environment variables.
As an alternative, you can override Airflow configurations and environment variables after you create an environment.
Some configuration options from Airflow 1 use a different name and section in Airflow 2. For more information, see Configuration changes.
gcloud
Create a Cloud Composer 2 environment and specify configuration overrides and environment variables.
As an alternative, you can override Airflow configurations and environment variables after you create an environment.
Some configuration options from Airflow 1 use a different name and section in Airflow 2. For more information, see Configuration changes.
Terraform
Create a Cloud Composer 2 environment based on the configuration of the Cloud Composer 1 environment:
- Copy your Cloud Composer 1 environment's configuration.
- Change the name of your environment.
- Use the google-beta provider:
resource "google_composer_environment" "example_environment_composer_2" {
  provider = google-beta
  # ...
}
- Specify a Cloud Composer 2 image in the config.software_config block:
software_config {
  image_version = "composer-2.10.1-airflow-2.10.2"
  # ...
}
- If you have not already done so, specify configuration overrides and environment variables.
- Specify custom PyPI packages in the config.software_config.pypi_packages block:
software_config {
  # ...
  pypi_packages = {
    numpy = ""
    scipy = ">=1.1.0"
  }
}
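Because some Airflow 1 options use a different name or section in Airflow 2, it can help to translate the override list before applying it to the new environment. The following sketch shows one way to do that with a lookup table. The three entries are a small illustrative sample of renames from the Airflow 2.0 upgrade notes, not a complete list, and the function name is hypothetical; check the Configuration changes page for the options you actually use.

```python
# Illustrative, incomplete sample of options that moved or were renamed in Airflow 2.0.
# Keys and values are (section, option) pairs.
RENAMED_OPTIONS = {
    ("core", "logging_level"): ("logging", "logging_level"),
    ("scheduler", "statsd_on"): ("metrics", "statsd_on"),
    ("scheduler", "max_threads"): ("scheduler", "parsing_processes"),
}


def translate_overrides(overrides: dict) -> dict:
    """Map Airflow 1 (section, option) overrides to their Airflow 2 equivalents.

    Options that are not in RENAMED_OPTIONS are passed through unchanged.
    """
    translated = {}
    for (section, option), value in overrides.items():
        new_key = RENAMED_OPTIONS.get((section, option), (section, option))
        translated[new_key] = value
    return translated


if __name__ == "__main__":
    airflow_1_overrides = {
        ("core", "logging_level"): "DEBUG",
        ("scheduler", "max_threads"): "4",
        ("webserver", "dag_default_view"): "graph",
    }
    for (section, option), value in translate_overrides(airflow_1_overrides).items():
        print(f"[{section}] {option} = {value}")
```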
Step 5: Install PyPI packages to the Cloud Composer 2 environment
After your Cloud Composer 2 environment is created, install custom PyPI packages to it.
Console
Go to the Environments page in the Google Cloud console:
Select your Cloud Composer 2 environment.
Go to the PyPI packages tab and click Edit.
Copy PyPI package requirements from your Cloud Composer 1 environment. Click Save and wait until the environment updates.
gcloud
Create a requirements.txt file with the list of custom PyPI packages:
numpy
scipy>=1.1.0
Update your environment and pass the requirements.txt file to the --update-pypi-packages-from-file argument:
gcloud composer environments update COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
--update-pypi-packages-from-file requirements.txt
Replace:
- COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
- COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
Terraform
Skip this step. You already installed custom PyPI packages when you created the environment.
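For the gcloud path, the requirements.txt file can be generated from the package map obtained in Step 3 instead of being typed by hand. This is a minimal sketch, assuming the map uses package names as keys and version specifiers (possibly empty) as values, as in the Terraform example above; the function name is hypothetical.

```python
def requirements_lines(pypi_packages: dict) -> list:
    """Turn a {name: specifier} map into requirements.txt lines, sorted by package name."""
    return [f"{name}{specifier}" for name, specifier in sorted(pypi_packages.items())]


if __name__ == "__main__":
    packages = {"scipy": ">=1.1.0", "numpy": ""}
    # Write one requirement per line, ready for --update-pypi-packages-from-file.
    with open("requirements.txt", "w", encoding="utf-8") as handle:
        handle.write("\n".join(requirements_lines(packages)) + "\n")
```

An empty specifier produces a bare package name, which matches the `numpy` line in the example file above.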
Step 6: Transfer variables and pools
Airflow supports exporting variables and pools to JSON files. You can then import these files to your Cloud Composer 2 environment.
Airflow CLI commands used in this step operate on local files in Airflow workers. To upload or download the files, use the /data folder in the Cloud Storage bucket of your environment. This folder syncs to the /home/airflow/gcs/data/ directory in Airflow workers. In the Airflow CLI commands, specify /home/airflow/gcs/data/ in the FILEPATH parameter.
gcloud
Export variables from your Cloud Composer 1 environment:
gcloud composer environments run \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
variables -- -e /home/airflow/gcs/data/variables.json
Replace:
- COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
- COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
Export pools from your Cloud Composer 1 environment:
gcloud composer environments run COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
pool -- -e /home/airflow/gcs/data/pools.json
Replace:
- COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
- COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
Get your Cloud Composer 2 environment's bucket URI.
Run the following command:
gcloud composer environments describe COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
--format="value(config.dagGcsPrefix)"
Replace:
- COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
- COMPOSER_2_LOCATION with the region where the environment is located.
In the output, remove the /dags folder. The result is the URI of your Cloud Composer 2 environment's bucket.
For example, change gs://us-central1-example-916807e1-bucket/dags to gs://us-central1-example-916807e1-bucket.
Transfer JSON files with variables and pools to your Cloud Composer 2 environment:
gcloud composer environments storage data export \
--destination=COMPOSER_2_BUCKET/data \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION \
--source=variables.json
gcloud composer environments storage data export \
--destination=COMPOSER_2_BUCKET/data \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION \
--source=pools.json
Replace:
- COMPOSER_2_BUCKET with the URI of your Cloud Composer 2 environment bucket, obtained in the previous step.
- COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
- COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
Import variables and pools to Cloud Composer 2:
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
variables import \
-- /home/airflow/gcs/data/variables.json
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
pools import \
-- /home/airflow/gcs/data/pools.json
Check that variables and pools are imported:
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
variables list
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
pools list
Remove JSON files from the buckets:
gcloud composer environments storage data delete \
variables.json \
--environment=COMPOSER_2_ENV \
--location=COMPOSER_2_LOCATION
gcloud composer environments storage data delete \
pools.json \
--environment=COMPOSER_2_ENV \
--location=COMPOSER_2_LOCATION
gcloud composer environments storage data delete \
variables.json \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION
gcloud composer environments storage data delete \
pools.json \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION
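The bucket URI used as COMPOSER_2_BUCKET in this step is the `config.dagGcsPrefix` value with the trailing /dags folder removed. If you script the transfer, a small helper such as the following sketch can do that trimming and fail loudly on unexpected input. The function name is hypothetical.

```python
def bucket_uri(dag_gcs_prefix: str) -> str:
    """Derive the environment bucket URI from the value of config.dagGcsPrefix."""
    prefix = dag_gcs_prefix.strip().rstrip("/")
    if not prefix.startswith("gs://") or not prefix.endswith("/dags"):
        raise ValueError(f"unexpected DAG prefix: {dag_gcs_prefix!r}")
    # Drop only the final '/dags' segment, keeping the rest of the URI intact.
    return prefix[: -len("/dags")]


if __name__ == "__main__":
    print(bucket_uri("gs://us-central1-example-916807e1-bucket/dags"))
```

Checking the suffix explicitly, rather than calling a generic string replace, avoids mangling a bucket whose name happens to contain "dags".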
Step 7: Transfer other data from your Cloud Composer 1 environment's bucket
Transfer plugins and other data from your Cloud Composer 1 environment's bucket.
gcloud
Transfer plugins to your Cloud Composer 2 environment. To do so, export plugins from your Cloud Composer 1 environment's bucket to the /plugins folder in your Cloud Composer 2 environment's bucket:
gcloud composer environments storage plugins export \
--destination=COMPOSER_2_BUCKET/plugins \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION
Check that the /plugins folder is successfully imported:
gcloud composer environments storage plugins list \
--environment=COMPOSER_2_ENV \
--location=COMPOSER_2_LOCATION
Export the /data folder from your Cloud Composer 1 environment to the Cloud Composer 2 environment:
gcloud composer environments storage data export \
--destination=COMPOSER_2_BUCKET/data \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION
Check that the /data folder is successfully imported:
gcloud composer environments storage data list \
--environment=COMPOSER_2_ENV \
--location=COMPOSER_2_LOCATION
Step 8: Transfer connections
Airflow 1.10.15 does not support exporting connections. To transfer connections, manually create connections in your Cloud Composer 2 environment from the Cloud Composer 1 environment.
gcloud
To get a list of connections in your Cloud Composer 1 environment, run:
gcloud composer environments run COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
connections -- --list
To create a new connection in your Cloud Composer 2 environment, run the connections Airflow CLI command through gcloud. For example:
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
connections add \
-- --conn-host postgres.example.com \
--conn-port 5432 \
--conn-type postgres \
--conn-login example_user \
--conn-password example_password \
--conn-description "Example connection" \
example_connection
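When there are many connections to recreate, building the command from a dictionary of fields can reduce copy-and-paste mistakes. The sketch below assembles the same argument list as the example above and only prints it; it does not run gcloud. The function name and the field dictionary are hypothetical, and the sketch assumes that each field maps to a `--conn-<field>` flag as in the example.

```python
import shlex


def connection_add_command(env: str, location: str, conn_id: str, fields: dict) -> list:
    """Build the argument list for 'gcloud composer environments run ... connections add'."""
    args = [
        "gcloud", "composer", "environments", "run", env,
        "--location", location,
        "connections", "add", "--",
    ]
    for name, value in fields.items():
        # 'host' becomes '--conn-host', 'port' becomes '--conn-port', and so on.
        args += [f"--conn-{name}", str(value)]
    args.append(conn_id)
    return args


if __name__ == "__main__":
    command = connection_add_command(
        "COMPOSER_2_ENV",
        "COMPOSER_2_LOCATION",
        "example_connection",
        {"host": "postgres.example.com", "port": 5432, "type": "postgres"},
    )
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    print(shlex.join(command))
```

Keeping the command as a list until the last moment means values such as descriptions with spaces are quoted correctly, and the list can be passed directly to `subprocess.run` if you decide to execute it.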
Step 9: Transfer user accounts
This step explains how to transfer users by creating them manually.
Airflow 1.10.15 does not support exporting users. To transfer users, manually create new user accounts in your Cloud Composer 2 environment, based on the accounts in the Cloud Composer 1 environment.
Airflow UI
To view a list of users in your Cloud Composer 1 environment:
Open the Airflow web interface for your Cloud Composer 1 environment.
Go to Admin > Users.
To create a user in your Cloud Composer 2 environment:
Open the Airflow web interface for your Cloud Composer 2 environment.
Go to Security > List Users.
Click Add a new record.
gcloud
It is not possible to view a list of users through gcloud in Airflow 1. Use the Airflow UI instead.
To create a new user account in your Cloud Composer 2 environment, run the users create Airflow CLI command through gcloud. For example:
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
users create \
-- --username example_username \
--firstname Example-Name \
--lastname Example-Surname \
--email example-user@example.com \
--use-random-password \
--role Op
Replace:
- COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
- COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
- All user configuration parameters with their values from your Cloud Composer 1 environment, including the user's role.
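Because users are re-entered by hand, a typo in the role name is an easy mistake to make. The following sketch validates each user record against Airflow's default roles before producing the Airflow CLI arguments that follow the `--` separator in the example above. The record layout and function name are hypothetical; if your environment uses custom roles, extend the set accordingly.

```python
# Default roles that ship with the Airflow 2 web interface.
DEFAULT_ROLES = {"Admin", "Op", "User", "Viewer", "Public"}


def user_create_args(user: dict) -> list:
    """Return the Airflow CLI arguments for 'users create' for one user record."""
    if user["role"] not in DEFAULT_ROLES:
        raise ValueError(f"unknown role {user['role']!r} for user {user['username']!r}")
    return [
        "users", "create", "--",
        "--username", user["username"],
        "--firstname", user["firstname"],
        "--lastname", user["lastname"],
        "--email", user["email"],
        "--use-random-password",
        "--role", user["role"],
    ]


if __name__ == "__main__":
    example_user = {
        "username": "example_username",
        "firstname": "Example-Name",
        "lastname": "Example-Surname",
        "email": "example-user@example.com",
        "role": "Op",
    }
    print(" ".join(user_create_args(example_user)))
```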
Step 10: Make sure that your DAGs are ready for Airflow 2
Before transferring DAGs to your Cloud Composer 2 environment, make sure that:
Your DAGs run successfully and there are no remaining compatibility issues.
Your DAGs use correct import statements.
For example, the new import statement for BigQueryCreateDataTransferOperator can look like this:
from airflow.providers.google.cloud.operators.bigquery_dts \
    import BigQueryCreateDataTransferOperator
Your DAGs are upgraded for Airflow 2. This change is compatible with Airflow 1.10.14 and later versions.
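To find import statements that still need attention, you can scan DAG files for Airflow 1 style module paths. The sketch below flags imports from `airflow.contrib` (whose contents moved to provider packages) and two core modules that were renamed in Airflow 2.0. The rename table is a deliberately small sample and the function name is hypothetical; treat the output as a list of candidates to review, not a complete compatibility check.

```python
import re

# Small, incomplete sample of core modules renamed in Airflow 2.0.
RENAMED_MODULES = {
    "airflow.operators.bash_operator": "airflow.operators.bash",
    "airflow.operators.python_operator": "airflow.operators.python",
}

# Matches 'from airflow.x.y import ...' and 'import airflow.x.y'.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+(airflow[\w.]*)")


def find_legacy_imports(source: str) -> list:
    """Return (line_number, module, suggested_module_or_None) for suspicious imports."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = IMPORT_RE.match(line)
        if match is None:
            continue
        module = match.group(1)
        if module in RENAMED_MODULES:
            findings.append((number, module, RENAMED_MODULES[module]))
        elif module.startswith("airflow.contrib."):
            # No single replacement: look up the matching provider package.
            findings.append((number, module, None))
    return findings


if __name__ == "__main__":
    dag_source = (
        "from airflow import DAG\n"
        "from airflow.operators.bash_operator import BashOperator\n"
        "from airflow.contrib.operators.bigquery_operator import BigQueryOperator\n"
    )
    for finding in find_legacy_imports(dag_source):
        print(finding)
```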
Step 11: Transfer DAGs to the Cloud Composer 2 environment
The following potential problems might happen when you transfer DAGs between environments:
If a DAG is enabled (not paused) in both environments, each environment runs its own copy of the DAG, as scheduled. This might lead to duplicate DAG runs for the same data and execution time.
Because of DAG catchup, Airflow schedules extra DAG runs, beginning from the start date specified in your DAGs. This happens because the new Airflow instance does not take into account the history of DAG runs from the Cloud Composer 1 environment. This might lead to a large number of DAG runs scheduled starting from the specified start date.
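To get a feel for how many runs catchup could create, the following back-of-the-envelope sketch counts the schedule intervals that have fully elapsed between a DAG's start date and now. It is a simplified model that assumes a fixed schedule interval and ignores cron expressions, end dates, and limits on active runs; the function name and dates are hypothetical.

```python
from datetime import datetime, timedelta


def catchup_run_count(start_date: datetime, now: datetime, interval: timedelta) -> int:
    """Estimate the number of DAG runs catchup would schedule in a fresh environment."""
    if now <= start_date:
        return 0
    # One run per completed interval since the start date.
    return (now - start_date) // interval


if __name__ == "__main__":
    runs = catchup_run_count(
        start_date=datetime(2023, 1, 1),
        now=datetime(2023, 1, 31),
        interval=timedelta(days=1),
    )
    print(f"A daily DAG would get about {runs} catchup runs")
```

Even a single daily DAG with a start date a year in the past would produce several hundred runs under this model, which is why the overrides described next are worth setting before you upload any DAGs.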
Prevent duplicate DAG runs
In your Cloud Composer 2 environment, add an Airflow configuration option override for the dags_are_paused_at_creation option. After you make this change, all new DAGs are paused by default.
Section | Key | Value |
---|---|---|
core | dags_are_paused_at_creation | True |
Prevent extra or missing DAG runs
To avoid gaps and overlaps in execution dates, disable catchup in your Cloud Composer 2 environment. In this way, after you upload DAGs to your Cloud Composer 2 environment, Airflow does not schedule DAG runs that were already run in the Cloud Composer 1 environment. Add an Airflow configuration option override for the catchup_by_default option:
Section | Key | Value |
---|---|---|
scheduler | catchup_by_default | False |
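If you apply overrides with gcloud rather than the console, both options (dags_are_paused_at_creation and catchup_by_default) can be passed in one `gcloud composer environments update` call through the `--update-airflow-configs` flag, which takes comma-separated `SECTION-KEY=VALUE` pairs. The sketch below only formats that flag value from a dictionary; the function name is hypothetical, and it assumes values contain no commas.

```python
def update_airflow_configs_flag(overrides: dict) -> str:
    """Format {(section, key): value} overrides as an --update-airflow-configs flag."""
    pairs = [f"{section}-{key}={value}" for (section, key), value in overrides.items()]
    return "--update-airflow-configs=" + ",".join(pairs)


if __name__ == "__main__":
    flag = update_airflow_configs_flag(
        {
            ("core", "dags_are_paused_at_creation"): "True",
            ("scheduler", "catchup_by_default"): "False",
        }
    )
    print(
        "gcloud composer environments update COMPOSER_2_ENV "
        "--location COMPOSER_2_LOCATION " + flag
    )
```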
Transfer your DAGs to the Cloud Composer 2 environment
To transfer your DAGs to the Cloud Composer 2 environment:
Upload the DAG from the Cloud Composer 1 environment to the Cloud Composer 2 environment. Skip the airflow_monitoring.py DAG.
The DAGs are paused in the Cloud Composer 2 environment because of the configuration override, so no DAG runs are scheduled.
In the Airflow web interface, go to DAGs and check for reported DAG syntax errors.
At the time when you plan to transfer the DAG:
Pause the DAGs in your Cloud Composer 1 environment.
Un-pause the DAGs in your Cloud Composer 2 environment.
Check that the new DAG runs are scheduled at the correct time.
Wait for the DAG runs to happen in the Cloud Composer 2 environment and check if they were successful. If a DAG run was successful, do not unpause it in the Cloud Composer 1 environment; if you do so, a DAG run for the same time and date happens in your Cloud Composer 1 environment.
If a specific DAG run fails, troubleshoot the DAG until it runs successfully in Cloud Composer 2.
If required, you can always fall back to the Cloud Composer 1 version of the DAG and execute DAG runs that failed in Cloud Composer 2 from your Cloud Composer 1 environment:
Pause the DAG in your Cloud Composer 2 environment.
Un-pause the DAG in your Cloud Composer 1 environment. This schedules catchup DAG runs for the time when the DAG was paused in the Cloud Composer 1 environment.
Step 12: Monitor your Cloud Composer 2 environment
After you transfer all DAGs and configuration to the Cloud Composer 2 environment, monitor it for potential issues, failed DAG runs, and overall environment health. If the Cloud Composer 2 environment runs without problems for a sufficient period of time, consider deleting the Cloud Composer 1 environment.
What's next
- Troubleshooting DAGs
- Troubleshooting environment creation
- Troubleshooting environment updates
- Using backport packages