This page explains how to transfer DAGs, data and configuration from your existing Cloud Composer 1, Airflow 2 environments to Cloud Composer 2, Airflow 2.
Other migration guides
From | To | Method | Guide |
---|---|---|---|
Cloud Composer 1, Airflow 2 | Cloud Composer 2, Airflow 2 | Side-by-side, using snapshots | Migration guide (snapshots) |
Cloud Composer 1, Airflow 1 | Cloud Composer 2, Airflow 2 | Side-by-side, using snapshots | Migration guide (snapshots) |
Cloud Composer 1, Airflow 2 | Cloud Composer 2, Airflow 2 | Side-by-side, manual transfer | This guide (manual migration) |
Cloud Composer 1, Airflow 1 | Cloud Composer 2, Airflow 2 | Side-by-side, manual transfer | Manual migration guide |
Airflow 1 | Airflow 2 | Side-by-side, manual transfer | Manual migration guide |
Before you begin
- Cloud Composer supports side-by-side migration from Cloud Composer 1 to Cloud Composer 2. It is not possible to upgrade from Cloud Composer 1 to Cloud Composer 2 in-place.
- Check the list of differences between Cloud Composer 1 and Cloud Composer 2.
Step 1: Get the list of configuration overrides, custom PyPI packages, and environment variables
Console
Get the list of your Cloud Composer 1 environment's configuration overrides, custom PyPI packages, and environment variables:
Go to the Environments page in the Google Cloud console:
Select your Cloud Composer 1 environment.
View environment variables on the Environment variables tab.
View configuration overrides on the Airflow configuration overrides tab.
View custom PyPI packages on the PyPI packages tab.
gcloud
To get the list of environment variables, run:
gcloud composer environments describe \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
--format="value(config.softwareConfig.envVariables)"
To get the list of the environment's Airflow configuration overrides, run:
gcloud composer environments describe \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
--format="value(config.softwareConfig.airflowConfigOverrides)"
To get the list of custom PyPI packages, run:
gcloud composer environments describe \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
--format="value(config.softwareConfig.pypiPackages)"
Replace:
- COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
- COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
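The three describe commands above can be collected into one helper that saves each result to a file for later reference. This is a minimal sketch: the function name save_composer_settings and the output file names are hypothetical, while the format expressions are the ones shown above.

```shell
# Hypothetical helper: save a Cloud Composer environment's settings to files.
# Usage: save_composer_settings ENV_NAME LOCATION OUTPUT_DIR
save_composer_settings() {
  local env_name="$1" location="$2" out_dir="$3"
  local field
  mkdir -p "$out_dir" || return 1
  for field in envVariables airflowConfigOverrides pypiPackages; do
    # Same describe command as above, one softwareConfig field at a time.
    gcloud composer environments describe "$env_name" \
      --location "$location" \
      --format="value(config.softwareConfig.${field})" \
      > "${out_dir}/${field}.txt" || return 1
  done
}

# Example call (commented out so that sourcing this file has no side effects):
# save_composer_settings COMPOSER_1_ENV COMPOSER_1_LOCATION ./composer1-settings
```

Keeping the three outputs in files makes it easier to compare them with the Cloud Composer 2 environment once it is created.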
Terraform
Skip this step. Your Cloud Composer 1 environment's configuration already lists configuration overrides, custom PyPI packages, and environment variables for your environment.
Step 2: Create a Cloud Composer 2 environment
In this step, create a Cloud Composer 2 environment. You can start with an environment preset that matches your expected resource demands, and later scale and optimize your environment further.
Console
Create a Cloud Composer 2 environment and specify configuration overrides and environment variables.
As an alternative, you can override Airflow configurations and environment variables after you create an environment.
gcloud
Create a Cloud Composer 2 environment and specify configuration overrides and environment variables.
As an alternative, you can override Airflow configurations and environment variables after you create an environment.
Terraform
Create a Cloud Composer 2 environment based on the configuration of the Cloud Composer 1 environment:
- Copy your Cloud Composer 1 environment's configuration.
- Change the name of your environment.
- Use the google-beta provider:
resource "google_composer_environment" "example_environment_composer_2" {
  provider = google-beta
  # ...
}
- Specify a Cloud Composer 2 image in the config.software_config block:
software_config {
  image_version = "composer-2.10.2-airflow-2.10.2"
  # ...
}
- If not already done, specify configuration overrides and environment variables.
- Specify custom PyPI packages in the config.software_config.pypi_packages block:
software_config {
  # ...
  pypi_packages = {
    numpy = ""
    scipy = ">=1.1.0"
  }
}
Step 3: Install PyPI packages to the Cloud Composer 2 environment
After your Cloud Composer 2 environment is created, install custom PyPI packages to it.
Console
Go to the Environments page in the Google Cloud console:
Select your Cloud Composer 2 environment.
Go to the PyPI packages tab and click Edit.
Copy PyPI package requirements from your Cloud Composer 1 environment. Click Save and wait until the environment updates.
gcloud
Create a requirements.txt file with the list of custom PyPI packages:
numpy
scipy>=1.1.0
Update your environment and pass the requirements.txt file in the --update-pypi-packages-from-file argument:
gcloud composer environments update COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
--update-pypi-packages-from-file requirements.txt
Replace:
- COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
- COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
Terraform
Skip this step. You already installed custom PyPI packages when you created the environment.
Step 4: Transfer variables and pools
Airflow supports exporting variables and pools to JSON files. You can then import these files to your Cloud Composer 2 environment.
Airflow CLI commands used in this step operate on local files in Airflow workers. To upload or download the files, use the /data folder in the Cloud Storage bucket of your environment. This folder syncs to the /home/airflow/gcs/data/ directory in Airflow workers. In the Airflow CLI commands, specify /home/airflow/gcs/data/ in the FILEPATH parameter.
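To make the mapping between the two locations concrete, the sketch below builds both paths for a given file name. The helper names worker_data_path and bucket_data_uri are hypothetical. The only facts they encode are the /data folder and the /home/airflow/gcs/data/ directory described above.

```shell
# Hypothetical helpers: build the two paths that refer to the same file.

worker_data_path() {
  # $1: file name inside the /data folder.
  # Prints the path to use in the FILEPATH parameter of Airflow CLI commands.
  printf '/home/airflow/gcs/data/%s\n' "${1#/}"
}

bucket_data_uri() {
  # $1: environment bucket URI, $2: file name inside the /data folder.
  # Prints the Cloud Storage URI of the same file.
  printf '%s/data/%s\n' "${1%/}" "${2#/}"
}
```

For example, `worker_data_path variables.json` prints `/home/airflow/gcs/data/variables.json`, and `bucket_data_uri gs://us-central1-example-916807e1-bucket variables.json` prints the matching object URI in the bucket.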
gcloud
Export variables from your Cloud Composer 1 environment:
gcloud composer environments run \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
variables export -- /home/airflow/gcs/data/variables.json
Export pools from your Cloud Composer 1 environment:
gcloud composer environments run \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
pools export -- /home/airflow/gcs/data/pools.json
Replace:
- COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
- COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
Get your Cloud Composer 2 environment's bucket URI.
Run the following command:
gcloud composer environments describe COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
--format="value(config.dagGcsPrefix)"
Replace:
- COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
- COMPOSER_2_LOCATION with the region where the environment is located.
In the output, remove the /dags folder. The result is the URI of your Cloud Composer 2 environment's bucket.
For example, change gs://us-central1-example-916807e1-bucket/dags to gs://us-central1-example-916807e1-bucket.
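Removing the /dags suffix can also be scripted. The sketch below uses shell parameter expansion for that. The function names bucket_from_dag_prefix and composer_bucket_uri are hypothetical, and the describe command is the one shown above.

```shell
# Hypothetical helper: strip a trailing "/dags" from a DAG folder URI.
bucket_from_dag_prefix() {
  # ${1%/dags} removes the shortest match of "/dags" from the end of $1.
  printf '%s\n' "${1%/dags}"
}

# Hypothetical helper: print the bucket URI of an environment.
# Usage: composer_bucket_uri ENV_NAME LOCATION
composer_bucket_uri() {
  local dag_prefix
  dag_prefix="$(gcloud composer environments describe "$1" \
    --location "$2" \
    --format="value(config.dagGcsPrefix)")" || return 1
  bucket_from_dag_prefix "$dag_prefix"
}

# Example call (commented out so that sourcing this file has no side effects):
# COMPOSER_2_BUCKET="$(composer_bucket_uri COMPOSER_2_ENV COMPOSER_2_LOCATION)"
```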
Transfer JSON files with variables and pools to your Cloud Composer 2 environment:
gcloud composer environments storage data export \
--destination=COMPOSER_2_BUCKET/data \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION \
--source=variables.json
gcloud composer environments storage data export \
--destination=COMPOSER_2_BUCKET/data \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION \
--source=pools.json
Replace:
- COMPOSER_2_BUCKET with the URI of your Cloud Composer 2 environment bucket, obtained in the previous step.
- COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
- COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
Import variables and pools to Cloud Composer 2:
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
variables import \
-- /home/airflow/gcs/data/variables.json
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
pools import \
-- /home/airflow/gcs/data/pools.json
Check that variables and pools are imported:
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
variables list
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
pools list
Remove JSON files from the buckets:
gcloud composer environments storage data delete \
variables.json \
--environment=COMPOSER_2_ENV \
--location=COMPOSER_2_LOCATION
gcloud composer environments storage data delete \
pools.json \
--environment=COMPOSER_2_ENV \
--location=COMPOSER_2_LOCATION
gcloud composer environments storage data delete \
variables.json \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION
gcloud composer environments storage data delete \
pools.json \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION
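Because the export, transfer, and import commands are the same for variables and pools, they can be run in a loop. The following is a minimal sketch under that assumption. The function name transfer_variables_and_pools and its argument order are hypothetical, and the gcloud commands are the ones shown in this step. The sketch leaves the verification and cleanup commands to be run separately.

```shell
# Hypothetical helper: export, transfer, and import variables and pools.
# Usage: transfer_variables_and_pools SRC_ENV SRC_LOCATION DST_ENV DST_LOCATION DST_BUCKET
transfer_variables_and_pools() {
  local src_env="$1" src_loc="$2" dst_env="$3" dst_loc="$4" dst_bucket="$5"
  local kind
  for kind in variables pools; do
    # 1. Export to the /data folder of the source environment.
    gcloud composer environments run "$src_env" \
      --location "$src_loc" \
      "$kind" export -- "/home/airflow/gcs/data/${kind}.json" || return 1
    # 2. Copy the JSON file to the /data folder of the destination bucket.
    gcloud composer environments storage data export \
      --destination="${dst_bucket}/data" \
      --environment="$src_env" \
      --location="$src_loc" \
      --source="${kind}.json" || return 1
    # 3. Import the file in the destination environment.
    gcloud composer environments run "$dst_env" \
      --location "$dst_loc" \
      "$kind" import -- "/home/airflow/gcs/data/${kind}.json" || return 1
  done
}

# Example call (commented out so that sourcing this file has no side effects):
# transfer_variables_and_pools COMPOSER_1_ENV COMPOSER_1_LOCATION \
#   COMPOSER_2_ENV COMPOSER_2_LOCATION COMPOSER_2_BUCKET
```

The function stops at the first failing command, so a failed export does not lead to importing a stale file.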
Step 5: Transfer other data from your Cloud Composer 1 environment's bucket
Transfer plugins and other data from your Cloud Composer 1 environment's bucket.
gcloud
Transfer plugins to your Cloud Composer 2 environment. To do so, export plugins from your Cloud Composer 1 environment's bucket to the /plugins folder in your Cloud Composer 2 environment's bucket:
gcloud composer environments storage plugins export \
--destination=COMPOSER_2_BUCKET/plugins \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION
Check that the /plugins folder is successfully imported:
gcloud composer environments storage plugins list \
--environment=COMPOSER_2_ENV \
--location=COMPOSER_2_LOCATION
Export the /data folder from your Cloud Composer 1 environment to the Cloud Composer 2 environment:
gcloud composer environments storage data export \
--destination=COMPOSER_2_BUCKET/data \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION
Check that the /data folder is successfully imported:
gcloud composer environments storage data list \
--environment=COMPOSER_2_ENV \
--location=COMPOSER_2_LOCATION
Step 6: Transfer connections
This step explains how to transfer connections by creating them manually.
gcloud
To get a list of connections in your Cloud Composer 1 environment, run:
gcloud composer environments run \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
connections list
To create a new connection in your Cloud Composer 2 environment, run the connections Airflow CLI command through gcloud. For example:
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
connections add \
-- --conn-host postgres.example.com \
--conn-port 5432 \
--conn-type postgres \
--conn-login example_user \
--conn-password example_password \
--conn-description "Example connection" \
example_connection
Step 7: Transfer user accounts
This step explains how to transfer users by creating them manually.
Airflow UI
To view a list of users in your Cloud Composer 1 environment:
Open the Airflow web interface for your Cloud Composer 1 environment.
Go to Security > List Users.
To create a user in your Cloud Composer 2 environment:
Open the Airflow web interface for your Cloud Composer 2 environment.
Go to Security > List Users.
Click Add a new record.
gcloud
To view a list of users in your Cloud Composer 1 environment, run the users list Airflow CLI command through gcloud:
gcloud composer environments run \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
users list
Replace:
- COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
- COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
To create a new user account in your Cloud Composer 2 environment, run the users create Airflow CLI command through gcloud. For example:
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
users create \
-- --username example_username \
--firstname Example-Name \
--lastname Example-Surname \
--email example-user@example.com \
--use-random-password \
--role Op
Replace:
- COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
- COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
- All user configuration parameters with their values from your Cloud Composer 1 environment, including the user's role.
Alternative way to transfer user accounts
As an alternative, you can use the users export and users import Airflow CLI commands.
Export user accounts to a file in your environment's bucket /data folder:
gcloud composer environments run \
COMPOSER_1_ENV \
--location COMPOSER_1_LOCATION \
users export -- /home/airflow/gcs/data/users.json
Export this file to your Cloud Composer 2 environment's bucket:
gcloud composer environments storage data export \
--destination=COMPOSER_2_BUCKET/data \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION \
--source=users.json
Import user accounts from this file to your Cloud Composer 2 environment:
gcloud composer environments run \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
users import \
-- /home/airflow/gcs/data/users.json
Delete the JSON files in both environments:
gcloud composer environments storage data delete \
--environment=COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION \
users.json
gcloud composer environments storage data delete \
--environment=COMPOSER_2_ENV \
--location=COMPOSER_2_LOCATION \
users.json
Replace:
- COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
- COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
- COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
- COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
- COMPOSER_2_BUCKET with the URI of your Cloud Composer 2 environment bucket, obtained in Step 4.
Step 8: Transfer DAGs to the Cloud Composer 2 environment
The following potential problems might happen when you transfer DAGs between environments:
If a DAG is enabled (not paused) in both environments, each environment runs its own copy of the DAG, as scheduled. This might lead to duplicate DAG runs for the same data and execution time.
Because of DAG catchup, Airflow schedules extra DAG runs, beginning from the start date specified in your DAGs. This happens because the new Airflow instance does not take into account the history of DAG runs from the Cloud Composer 1 environment. This might lead to a large number of DAG runs scheduled starting from the specified start date.
Prevent duplicate DAG runs
In your Cloud Composer 2 environment, add an Airflow configuration option override for the dags_are_paused_at_creation option. After you make this change, all new DAGs are paused by default.
Section | Key | Value |
---|---|---|
core | dags_are_paused_at_creation | True |
Prevent extra or missing DAG runs
To avoid gaps and overlaps in execution dates, disable catchup in your Cloud Composer 2 environment. This way, after you upload DAGs to your Cloud Composer 2 environment, Airflow does not schedule DAG runs that were already run in the Cloud Composer 1 environment. Add an Airflow configuration option override for the catchup_by_default option:
Section | Key | Value |
---|---|---|
scheduler | catchup_by_default | False |
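The two overrides from this step can also be applied from the command line. The sketch below assumes the --update-airflow-configs flag of gcloud composer environments update, where each key is written as the section and the option name joined by a hyphen. The function name apply_migration_overrides is hypothetical.

```shell
# Hypothetical helper: apply both overrides from this step in one update.
# Usage: apply_migration_overrides ENV_NAME LOCATION
apply_migration_overrides() {
  # Keys use the SECTION-OPTION=VALUE form:
  #   core-dags_are_paused_at_creation=True   pauses all newly added DAGs
  #   scheduler-catchup_by_default=False      disables catchup by default
  gcloud composer environments update "$1" \
    --location "$2" \
    --update-airflow-configs=core-dags_are_paused_at_creation=True,scheduler-catchup_by_default=False
}

# Example call (commented out so that sourcing this file has no side effects):
# apply_migration_overrides COMPOSER_2_ENV COMPOSER_2_LOCATION
```

Applying both overrides in a single update avoids waiting for two separate environment update operations.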
Transfer your DAGs to the Cloud Composer 2 environment
To transfer your DAGs to the Cloud Composer 2 environment:
Upload the DAGs from the Cloud Composer 1 environment to the Cloud Composer 2 environment. Skip the airflow_monitoring.py DAG.
The DAGs are paused in the Cloud Composer 2 environment because of the configuration override, so no DAG runs are scheduled.
In the Airflow web interface, go to DAGs and check for reported DAG syntax errors.
At the time when you plan to transfer the DAG:
Pause the DAGs in your Cloud Composer 1 environment.
Un-pause the DAGs in your Cloud Composer 2 environment.
Check that the new DAG runs are scheduled at the correct time.
Wait for the DAG runs to happen in the Cloud Composer 2 environment and check if they were successful. If a DAG run was successful, do not unpause it in the Cloud Composer 1 environment; if you do so, a DAG run for the same time and date happens in your Cloud Composer 1 environment.
If a specific DAG run fails, attempt to troubleshoot the DAG until it successfully runs in Cloud Composer 2.
If required, you can always fall back to the Cloud Composer 1 version of the DAG and execute DAG runs that failed in Cloud Composer 2 from your Cloud Composer 1 environment:
Pause the DAG in your Cloud Composer 2 environment.
Un-pause the DAG in your Cloud Composer 1 environment. This schedules catchup DAG runs for the time when the DAG was paused in the Cloud Composer 1 environment.
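The upload and the pause/un-pause switch described above can be sketched with two helpers. Both function names are hypothetical. The first helper assumes that you copy the DAG files with gcloud storage rsync and its --exclude flag, which is only one of several ways to upload DAGs, and that you obtained the Cloud Composer 1 bucket URI in the same way as the Cloud Composer 2 bucket URI in Step 4. The second helper runs the dags pause and dags unpause Airflow CLI commands through gcloud.

```shell
# Hypothetical helper: copy DAG files between environment buckets,
# skipping the airflow_monitoring.py DAG.
# Usage: sync_dags SRC_BUCKET_URI DST_BUCKET_URI
sync_dags() {
  # --exclude takes a regular expression matched against object paths.
  gcloud storage rsync "${1%/}/dags" "${2%/}/dags" \
    --recursive \
    --exclude='airflow_monitoring\.py$'
}

# Hypothetical helper: move one DAG from the old environment to the new one.
# Usage: switch_dag DAG_ID OLD_ENV OLD_LOCATION NEW_ENV NEW_LOCATION
switch_dag() {
  local dag_id="$1" old_env="$2" old_loc="$3" new_env="$4" new_loc="$5"
  # Pause in Cloud Composer 1 first, so that both copies never run together.
  gcloud composer environments run "$old_env" \
    --location "$old_loc" \
    dags pause -- "$dag_id" || return 1
  # Then un-pause in Cloud Composer 2.
  gcloud composer environments run "$new_env" \
    --location "$new_loc" \
    dags unpause -- "$dag_id"
}

# Example calls (commented out so that sourcing this file has no side effects):
# sync_dags COMPOSER_1_BUCKET COMPOSER_2_BUCKET
# switch_dag example_dag COMPOSER_1_ENV COMPOSER_1_LOCATION COMPOSER_2_ENV COMPOSER_2_LOCATION
```

To fall back, call switch_dag with the two environments swapped. As noted above, this schedules catchup runs in the Cloud Composer 1 environment.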
Step 9: Monitor your Cloud Composer 2 environment
After you transfer all DAGs and configuration to the Cloud Composer 2 environment, monitor it for potential issues, failed DAG runs, and overall environment health. If the Cloud Composer 2 environment runs without problems for a sufficient period of time, consider deleting the Cloud Composer 1 environment.
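As one small part of monitoring, you can check the state that Cloud Composer reports for the environment. The sketch below reads the state field from the describe output. The function names are hypothetical, and the check does not replace watching DAG runs and environment health over time.

```shell
# Hypothetical helper: print the environment state (for example, RUNNING).
# Usage: composer_env_state ENV_NAME LOCATION
composer_env_state() {
  gcloud composer environments describe "$1" \
    --location "$2" \
    --format="value(state)"
}

# Hypothetical helper: succeed only if the environment reports RUNNING.
composer_env_is_running() {
  [ "$(composer_env_state "$1" "$2")" = "RUNNING" ]
}

# Example call (commented out so that sourcing this file has no side effects):
# composer_env_is_running COMPOSER_2_ENV COMPOSER_2_LOCATION && echo "Environment is running"
```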
What's next
- Troubleshooting DAGs
- Troubleshooting environment creation
- Troubleshooting environment updates
- Using backport packages