This page explains how to transfer DAGs, data and configuration from your existing Cloud Composer 2 environment to Cloud Composer 3.
Before you begin
This migration guide uses snapshots. Snapshots are supported in Cloud Composer 2 version 2.0.9 and later.
Cloud Composer supports side-by-side migration from Cloud Composer 2 to Cloud Composer 3. It is not possible to upgrade from Cloud Composer 2 to Cloud Composer 3 in-place.
Check the list of differences between Cloud Composer 2 and Cloud Composer 3. Make sure that your environment doesn't use features that aren't yet available in Cloud Composer 3.
The maximum size of the Airflow database that supports snapshots is 20 GB. If your environment's database takes more than 20 GB, reduce the size of the Airflow database.
The total number of objects in the /dags, /plugins, and /data folders in the environment's bucket must be less than 100,000 to create snapshots.
If you use the XCom mechanism to transfer files, make sure that you use it according to Airflow's guidelines. Transferring big files or a large number of files using XCom impacts the Airflow database's performance and can lead to failures when loading snapshots or upgrading your environment. Consider using alternatives such as Cloud Storage to transfer large volumes of data.
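The snapshot prerequisites above can be collected into a small pre-flight check. The following is a minimal sketch, not an official tool: the function names are hypothetical, the caller supplies the database size and object count, and the image version string is assumed to look like composer-2.0.9-airflow-2.2.3.

```python
import re

MIN_SNAPSHOT_VERSION = (2, 0, 9)  # snapshots need Cloud Composer 2.0.9 or later
MAX_DB_SIZE_GB = 20               # largest Airflow database that supports snapshots
MAX_BUCKET_OBJECTS = 100_000      # /dags + /plugins + /data must stay below this


def parse_composer_version(image_version):
    """Extract (major, minor, patch) from a string such as
    'composer-2.0.9-airflow-2.2.3' (format assumed for this sketch)."""
    match = re.match(r"composer-(\d+)\.(\d+)\.(\d+)", image_version)
    if match is None:
        raise ValueError(f"Unrecognized image version: {image_version!r}")
    return tuple(int(part) for part in match.groups())


def snapshot_blockers(image_version, db_size_gb, object_count):
    """Return a list of reasons why a snapshot cannot be saved (empty if none)."""
    blockers = []
    if parse_composer_version(image_version) < MIN_SNAPSHOT_VERSION:
        blockers.append("Cloud Composer version is older than 2.0.9")
    if db_size_gb > MAX_DB_SIZE_GB:
        blockers.append("Airflow database is larger than 20 GB")
    if object_count >= MAX_BUCKET_OBJECTS:
        blockers.append("bucket folders hold 100,000 or more objects")
    return blockers
```

An empty list means that none of the three limits from this section is exceeded. The check doesn't cover feature differences or XCom usage, which still need a manual review.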
Make sure that your DAGs are compatible with Cloud Composer 3
Make sure that your DAGs are compatible with Cloud Composer 3 by following these suggestions:
The list of packages in the Cloud Composer 3 environment can be different than in your Cloud Composer 2 environment. This might affect the compatibility of your DAGs with Cloud Composer 3.
Cloud Composer loads configuration overrides, environment variables, and PyPI packages from your Cloud Composer 2 environment's snapshot to Cloud Composer 3 without changing or adjusting them for compatibility. If custom PyPI packages cause dependency conflicts, you can skip their installation when you load the snapshot.
In Cloud Composer 3, the environment's cluster is located in the tenant project. Make sure that your DAGs are compatible with this change. In particular, KubernetesPodOperator workloads now scale independently from your environment and it's not possible to use Pod affinity configs.
In Cloud Composer 3, the Airflow database can't be accessed directly. Make sure that your DAGs are compatible with this change.
Pause DAGs in your Cloud Composer 2 environment
To avoid duplicate DAG runs, pause all DAGs in your Cloud Composer 2 environment before saving its snapshot. Skip the liveness monitoring DAG (airflow_monitoring): it is used for monitoring purposes and is not included in environment snapshots.
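As an illustration of the rule above, the following sketch selects the DAGs that still need to be paused. It assumes that you already have a mapping of DAG IDs to their paused state, for example copied from the Airflow web interface. The function name and the input format are hypothetical.

```python
MONITORING_DAG_ID = "airflow_monitoring"  # liveness DAG, not included in snapshots


def dags_to_pause(dag_states):
    """Given {dag_id: is_paused}, return the sorted IDs that still need pausing."""
    return sorted(
        dag_id
        for dag_id, is_paused in dag_states.items()
        if not is_paused and dag_id != MONITORING_DAG_ID
    )
```

For example, for the input {"etl": False, "airflow_monitoring": False, "report": True}, only "etl" is returned, because "report" is already paused and the monitoring DAG is skipped.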
You can use any of the following options to pause DAGs:
Pause DAGs one by one in Google Cloud console:
In the Google Cloud console, go to the Environments page.
Select an environment to view its details.
On the Environment details page, go to the DAGs tab.
Click the name of a DAG.
On the DAG details page, click Pause DAG.
In the Airflow web interface, go to DAGs and pause all DAGs manually.
Use the composer_dags script to pause all DAGs:
python3 composer_dags.py --environment COMPOSER_2_ENV \
  --project PROJECT_ID \
  --location COMPOSER_2_LOCATION \
  --operation pause
Replace:
COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
PROJECT_ID with the Project ID.
COMPOSER_2_LOCATION with the region where the environment is located.
Save a snapshot of your Cloud Composer 2 environment
Console
Create a snapshot of your environment:
In Google Cloud console, go to the Environments page.
In the list of environments, click the name of your Cloud Composer 2 environment. The Environment details page opens.
Click Create snapshot.
In the Create snapshot dialog, click Submit. In this guide, you save the snapshot in the Cloud Composer 2 environment's bucket, but you can select a different location, if you want to. If you specify a custom location, the service accounts of both environments must have read and write permissions for the specified location.
Wait until Cloud Composer creates the snapshot.
After the snapshot is created, the displayed message outputs the location of the snapshot. Save this information to use later when you load this snapshot to the Cloud Composer 3 environment.
For example, the snapshot location might look like:
gs://us-central1-example-916807e1-bucket/snapshots/snapshots_example-project_us-central1_example-environment/2024-05-15T15-23-12
gcloud
Create a snapshot of your Cloud Composer 2 environment:
gcloud composer environments snapshots save \
  COMPOSER_2_ENV \
  --location COMPOSER_2_LOCATION
Replace:
COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
(Optional) You can use the --snapshot-location argument to specify a custom location where the environment's snapshot is stored.
In this guide, you save the snapshot in the Cloud Composer 2 environment's bucket, but you can select a different location, if you want to. If you specify a custom location, the service accounts of both environments must have read and write permissions for the specified location.
After the snapshot is created, the displayed message outputs the location of the snapshot. Save this information to use later when you load this snapshot to the Cloud Composer 3 environment.
For example, the snapshot location might look like:
gs://us-central1-example-916807e1-bucket/snapshots/snapshots_example-project_us-central1_example-environment/2024-05-15T15-23-12
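Because you reuse the snapshot location later, it can help to split it into the bucket name and the folder path. The following is a minimal sketch that uses only the Python standard library. The function name is hypothetical.

```python
from urllib.parse import urlparse


def split_snapshot_location(snapshot_uri):
    """Split 'gs://bucket/path/to/snapshot' into (bucket, folder_path)."""
    parsed = urlparse(snapshot_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Not a Cloud Storage URI: {snapshot_uri!r}")
    # The path keeps its leading slash after parsing, so strip it.
    return parsed.netloc, parsed.path.strip("/")
```

For the example location above, the bucket is us-central1-example-916807e1-bucket and the folder path starts with snapshots/ and ends with the timestamp of the save operation.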
Create a Cloud Composer 3 environment
Create a Cloud Composer 3 environment using the following guidelines:
You can start with the same resource limits configuration as your Cloud Composer 2 environment and later scale and optimize the configuration further.
In Cloud Composer 3 environments, Airflow DAG processor runs as a separate environment component. Because DAG processor offloads the parsing of DAGs from the scheduler, you might want to redistribute resources previously allocated to Airflow schedulers. You can do it later after you migrate to Cloud Composer 3 and monitor the scheduler and DAG processor performance.
Cloud Composer 3 provides a more streamlined and simplified networking setup compared to Cloud Composer 2. It is possible to switch between Public IP and Private IP networking configuration and attach and detach VPC networks. You don't need to specify IP ranges. Make sure that your Cloud Composer 3 environment's networking configuration matches the corresponding Cloud Composer 2 configuration.
You don't need to specify configuration overrides and environment variables because you replace them later when you load the snapshot of your Cloud Composer 2 environment.
Load the snapshot to your Cloud Composer 3 environment
Console
To load the snapshot to your Cloud Composer 3 environment:
In Google Cloud console, go to the Environments page.
In the list of environments, click the name of your Cloud Composer 3 environment. The Environment details page opens.
Click Load snapshot.
In the Load snapshot dialog, click Browse.
Select the folder with the snapshot.
If you use the default location for this guide, this folder is located in your Cloud Composer 2 environment bucket in the /snapshots folder, and its name is the timestamp of the snapshot save operation. The same location was displayed in the message about the successful creation of the snapshot. For example:
us-central1-example-916807e1-bucket/snapshots/snapshots_example-project_us-central1_example-environment/2024-05-15T15-23-12
Click Load and wait until Cloud Composer loads the snapshot.
gcloud
Load the snapshot of your Cloud Composer 2 environment to your Cloud Composer 3 environment:
gcloud composer environments snapshots load \
COMPOSER_3_ENV \
--location COMPOSER_3_LOCATION \
--snapshot-path "SNAPSHOT_PATH"
Replace:
COMPOSER_3_ENV with the name of your Cloud Composer 3 environment.
COMPOSER_3_LOCATION with the region where the Cloud Composer 3 environment is located.
SNAPSHOT_PATH with the URI of your Cloud Composer 2 environment's bucket, followed by the path to the snapshot. The same location was displayed in the message about the successful creation of the snapshot. For example:
gs://us-central1-example-916807e1-bucket/snapshots/snapshots_example-project_us-central1_example-environment/2024-05-15T15-23-12
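If you script the migration, you can assemble the load command programmatically. The following sketch builds the argument list for the command shown above. The helper names are hypothetical, and the optional --skip-pypi-packages-installation flag is an assumption in this sketch: confirm it in the command's --help output before you rely on it.

```python
import shlex


def build_load_command(env_name, location, snapshot_path, skip_pypi=False):
    """Return the argv list for 'gcloud composer environments snapshots load'."""
    if not snapshot_path.startswith("gs://"):
        raise ValueError("snapshot_path must be a gs:// URI")
    argv = [
        "gcloud", "composer", "environments", "snapshots", "load",
        env_name,
        "--location", location,
        "--snapshot-path", snapshot_path,
    ]
    if skip_pypi:
        # Assumed flag name; confirm it with the command's --help output.
        argv.append("--skip-pypi-packages-installation")
    return argv


def as_shell(argv):
    """Render argv as a safely quoted shell command for copying into a terminal."""
    return shlex.join(argv)
```

Building an argument list instead of a single string avoids quoting problems when you pass the result to subprocess.run.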
Unpause DAGs in the Cloud Composer 3 environment
You can use any of the following options:
Unpause DAGs one by one in Google Cloud console:
In the Google Cloud console, go to the Environments page.
Select an environment to view its details.
On the Environment details page, go to the DAGs tab.
Click the name of a DAG.
On the DAG details page, click Unpause DAG.
In the Airflow web interface, go to DAGs and unpause all DAGs manually one by one.
Use the composer_dags script to unpause all DAGs:
python3 composer_dags.py --environment COMPOSER_3_ENV \
  --project PROJECT_ID \
  --location COMPOSER_3_LOCATION \
  --operation unpause
Replace:
COMPOSER_3_ENV with the name of your Cloud Composer 3 environment.
PROJECT_ID with the Project ID.
COMPOSER_3_LOCATION with the region where the environment is located.
Check for DAG errors
In the Airflow web interface, go to DAGs and check for reported DAG syntax errors.
Check that DAG runs are scheduled at the correct time.
Wait for the DAG runs to happen in the Cloud Composer 3 environment and check if they were successful. If a DAG run was successful, don't unpause the same DAG in the Cloud Composer 2 environment; if you do so, a DAG run for the same time and date happens in your Cloud Composer 2 environment.
If a specific DAG run fails, attempt to troubleshoot the DAG until it successfully runs in Cloud Composer 3.
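To keep track of the checks above across many DAGs, you can group them by the state of their latest run in Cloud Composer 3. The following is a minimal sketch: the function name, the input mapping, and the result keys are hypothetical, and the run states are assumed to be the strings that Airflow displays, such as success, failed, or running.

```python
def review_migrated_dags(latest_run_state):
    """Group DAG IDs by the state of their latest Cloud Composer 3 run."""
    verified, failed, pending = [], [], []
    for dag_id, state in sorted(latest_run_state.items()):
        if state == "success":
            verified.append(dag_id)   # keep it paused in Cloud Composer 2
        elif state == "failed":
            failed.append(dag_id)     # troubleshoot until it runs successfully
        else:
            pending.append(dag_id)    # no finished run yet, keep waiting
    return {
        "keep_paused_in_composer_2": verified,
        "troubleshoot": failed,
        "wait": pending,
    }
```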
Monitor your Cloud Composer 3 environment
After you transfer all DAGs and configuration to the Cloud Composer 3 environment, monitor it for potential issues, failed DAG runs, and overall environment health.
If the Cloud Composer 3 environment runs without problems for a sufficient period of time, consider deleting the Cloud Composer 2 environment.