This page explains how to transfer DAGs, data and configuration from your existing Airflow 1.10.* environments to environments with Airflow 2 and later Airflow versions.
Other migration guides
From | To | Method | Guide |
---|---|---|---|
Cloud Composer 1, Airflow 2 | Cloud Composer 2, Airflow 2 | Side-by-side, using snapshots | Migration guide (snapshots) |
Cloud Composer 1, Airflow 1 | Cloud Composer 2, Airflow 2 | Side-by-side, using snapshots | Migration guide (snapshots) |
Cloud Composer 1, Airflow 2 | Cloud Composer 2, Airflow 2 | Side-by-side, manual transfer | Manual migration guide |
Cloud Composer 1, Airflow 1 | Cloud Composer 2, Airflow 2 | Side-by-side, manual transfer | Manual migration guide |
Airflow 1 | Airflow 2 | Side-by-side, manual transfer | This guide (manual migration) |
Side-by-side upgrades
Cloud Composer provides the Cloud Composer database transfer script to migrate the metadata database, DAGs, data and plugins from Cloud Composer environments with Airflow 1.10.14 and Airflow 1.10.15 to existing Cloud Composer environments with Airflow 2.0.1 and later Airflow versions.
This is an alternative to the approach described in this guide. Some parts of this guide still apply when you use the provided script. For example, you might want to check your DAGs for compatibility with Airflow 2 before migrating them, or to make sure that concurrent DAG runs don't happen and that there are no extra or missing DAG runs.
Before you begin
Before you start using Cloud Composer environments with Airflow 2, consider changes that Airflow 2 brings to Cloud Composer environments.
Scheduler HA
You can use more than one Airflow scheduler in your environment. You can set the number of schedulers when you create an environment or when you update an existing environment.
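For example, assuming your gcloud version supports the `--scheduler-count` flag for Airflow 2 environments, setting two schedulers on an existing environment could look like this sketch (the environment name and location are placeholders):

```
# Increase the number of Airflow schedulers to two on an existing environment.
gcloud composer environments update AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    --scheduler-count=2
```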
Celery+Kubernetes Executor
The Airflow 2 Celery+Kubernetes Executor is supported in Cloud Composer 3.
Breaking changes
Airflow 2 introduces many major changes, some of which are breaking:
- Existing DAGs from Airflow 1.10.* are not guaranteed to work with Airflow 2. They need to be tested and possibly adjusted.
- Operators, transfers, and hooks are migrated to provider packages. Import statements in DAGs must use the new provider packages. Old import statements might no longer work in Airflow 2.
- Some Airflow 1.10.* configuration options are no longer supported in Airflow 2.
- Some custom PyPI packages might be incompatible with the new version of Airflow or Python.
- Airflow UI with Access Control is the default Airflow 2 UI. Airflow 2 does not support other Airflow UI types.
- The experimental REST API is replaced by the stable Airflow API. The experimental REST API is disabled by default in Airflow 2.
- Other major changes in Airflow 2.0.0
- Other major changes in Airflow 2.0.1
Differences between environments with Airflow 2 and Airflow 1.10.*
Major differences between Cloud Composer environments with Airflow 1.10.* and environments with Airflow 2:
- Environments with Airflow 2 use Python 3.8. This is a newer version than the one used in Airflow 1.10.* environments. Python 2, Python 3.6, and Python 3.7 are not supported.
- Airflow 2 uses a different CLI format. Cloud Composer supports the new format in environments with Airflow 2 through the `gcloud composer environments run` command (see the sketch after this list).
- Preinstalled PyPI packages are different in Airflow 2 environments. For a list of preinstalled PyPI packages, see Cloud Composer version list.
- DAG serialization is always enabled in Airflow 2. As a result, asynchronous DAG loading is no longer needed and is not supported in Airflow 2. Because of this, configuring the `[core]store_serialized_dags` and `[core]store_dag_code` parameters is not supported for Airflow 2, and attempts to set them are reported as errors.
- Airflow web server plugins are not supported. This doesn't impact scheduler or worker plugins, including Airflow operators and sensors.
- In Airflow 2 environments, the default Airflow user role is `Op`. For environments with Airflow 1.10.*, the default role is `Admin`.
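As an illustration of the CLI format difference, the following sketch lists DAGs in both environments; the environment names and locations are placeholders:

```
# Airflow 1.10.* CLI format: underscore-style subcommands.
gcloud composer environments run AIRFLOW_1_ENV \
    --location AIRFLOW_1_LOCATION \
    list_dags

# Airflow 2 CLI format: command groups, such as "dags list".
gcloud composer environments run AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    dags list
```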
Step 1: Check compatibility with Airflow 2
To check for potential conflicts with Airflow 2, consult the Upgrading to Airflow 2.0+ guide, in the section about upgrading DAGs.
One common issue that you might encounter is related to incompatible import paths. For more information about solving this compatibility issue, see the section about backport providers in the Upgrading to Airflow 2.0+ guide.
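For example, import paths like the following typically need to change. This is an illustrative sketch with two common operators, not an exhaustive mapping:

```
# Airflow 1.10.* import paths (deprecated or removed in Airflow 2):
# from airflow.operators.bash_operator import BashOperator
# from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator

# Airflow 2 import paths: core operators moved, and contrib operators
# now live in provider packages.
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator
```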
Step 2: Create an Airflow 2 environment, transfer configuration overrides and environment variables
Create an Airflow 2 environment and transfer configuration overrides and environment variables:
Follow the steps for creating an environment. Before you create the environment, also specify configuration overrides and environment variables, as explained further in this step.
When you select an image, choose an image with Airflow 2.
Manually transfer configuration parameters from your Airflow 1.10.* environment to the new Airflow 2 environment.
Console
When you create an environment, expand the Networking, Airflow config overrides, and additional features section.
Under Airflow configuration overrides, click Add Airflow configuration override.
Copy all configuration overrides from your Airflow 1.10.* environment.
Some configuration options use a different name and section in Airflow 2. For more information, see Configuration changes.
Under Environment variables, click Add environment variable.
Copy all environment variables from your Airflow 1.10.* environment.
Click Create to create an environment.
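If you prefer the gcloud CLI over the console, the same step could look roughly like the following sketch. The environment name, location, image version, and the override and variable values are placeholders, not values from this guide:

```
# Create an Airflow 2 environment with configuration overrides and
# environment variables copied from the Airflow 1.10.* environment.
gcloud composer environments create AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    --image-version IMAGE_VERSION \
    --airflow-configs=core-dags_are_paused_at_creation=True \
    --env-variables=EXAMPLE_NAME=example_value
```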
Step 3: Install PyPI packages to the Airflow 2 environment
After your Airflow 2 environment is created, install PyPI packages to it:
Console
In the Google Cloud console, go to the Environments page.
Select your Airflow 2 environment.
Go to the PyPI packages tab and click Edit.
Copy PyPI package requirements from your Airflow 1.10.* environment. Click Save and wait until the environment updates.
Because Airflow 2 environments use a different set of preinstalled packages and a different Python version, you might encounter PyPI package conflicts that are difficult to resolve. One way to diagnose package dependency issues is to check for PyPI package errors by installing packages in an Airflow worker pod.
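If you prefer the gcloud CLI, one way to install the copied requirements is from a local requirements file; the file name `requirements.txt` is a placeholder:

```
# Install PyPI packages listed in a local requirements file into the
# Airflow 2 environment.
gcloud composer environments update AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    --update-pypi-packages-from-file requirements.txt
```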
Step 4: Transfer variables and pools to Airflow 2
Airflow 1.10.* supports exporting variables and pools to JSON files. You can then import these files to your Airflow 2 environment.
You only need to transfer pools if you have custom pools other than `default_pool`. Otherwise, skip commands that export and import pools.
gcloud
Export variables from your Airflow 1.10.* environment:
```
gcloud composer environments run AIRFLOW_1_ENV \
    --location AIRFLOW_1_LOCATION \
    variables -- -e /home/airflow/gcs/data/variables.json
```
Replace:

- `AIRFLOW_1_ENV` with the name of your Airflow 1.10.* environment.
- `AIRFLOW_1_LOCATION` with the region where the environment is located.
Export pools from your Airflow 1.10.* environment:
```
gcloud composer environments run AIRFLOW_1_ENV \
    --location AIRFLOW_1_LOCATION \
    pool -- -e /home/airflow/gcs/data/pools.json
```
Get your Airflow 2 environment bucket URI.
Run the following command:
```
gcloud composer environments describe AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    --format="value(config.dagGcsPrefix)"
```
Replace:

- `AIRFLOW_2_ENV` with the name of your Airflow 2 environment.
- `AIRFLOW_2_LOCATION` with the region where the environment is located.
In the output, remove the `/dags` folder. The result is the URI of your Airflow 2 environment bucket.

For example, change `gs://us-central1-example-916807e1-bucket/dags` to `gs://us-central1-example-916807e1-bucket`.
Transfer JSON files with variables and pools to your Airflow 2 environment:
```
gcloud composer environments storage data export \
    --destination=AIRFLOW_2_BUCKET/data \
    --environment=AIRFLOW_1_ENV \
    --location=AIRFLOW_1_LOCATION \
    --source=variables.json

gcloud composer environments storage data export \
    --destination=AIRFLOW_2_BUCKET/data \
    --environment=AIRFLOW_1_ENV \
    --location=AIRFLOW_1_LOCATION \
    --source=pools.json
```
Replace `AIRFLOW_2_BUCKET` with the URI of your Airflow 2 environment bucket, obtained in the previous step.

Import variables and pools to Airflow 2:
```
gcloud composer environments run \
    AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    variables import \
    -- /home/airflow/gcs/data/variables.json

gcloud composer environments run \
    AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    pools import \
    -- /home/airflow/gcs/data/pools.json
```
Check that variables and pools are imported:
```
gcloud composer environments run \
    AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    variables list

gcloud composer environments run \
    AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    pools list
```
Remove JSON files from the buckets:
```
gcloud composer environments storage data delete \
    variables.json \
    --environment=AIRFLOW_2_ENV \
    --location=AIRFLOW_2_LOCATION

gcloud composer environments storage data delete \
    pools.json \
    --environment=AIRFLOW_2_ENV \
    --location=AIRFLOW_2_LOCATION

gcloud composer environments storage data delete \
    variables.json \
    --environment=AIRFLOW_1_ENV \
    --location=AIRFLOW_1_LOCATION

gcloud composer environments storage data delete \
    pools.json \
    --environment=AIRFLOW_1_ENV \
    --location=AIRFLOW_1_LOCATION
```
Step 5: Transfer other data from your Airflow 1.10.* environment bucket
gcloud
Transfer plugins to your Airflow 2 environment. To do so, export plugins from your Airflow 1.10.* environment bucket to the `/plugins` folder in your Airflow 2 environment bucket:

```
gcloud composer environments storage plugins export \
    --destination=AIRFLOW_2_BUCKET/plugins \
    --environment=AIRFLOW_1_ENV \
    --location=AIRFLOW_1_LOCATION
```
Check that the `/plugins` folder is successfully imported:

```
gcloud composer environments storage plugins list \
    --environment=AIRFLOW_2_ENV \
    --location=AIRFLOW_2_LOCATION
```
Export the `/data` folder from your Airflow 1.10.* environment to the Airflow 2 environment:

```
gcloud composer environments storage data export \
    --destination=AIRFLOW_2_BUCKET/data \
    --environment=AIRFLOW_1_ENV \
    --location=AIRFLOW_1_LOCATION
```
Check that the `/data` folder is successfully imported:

```
gcloud composer environments storage data list \
    --environment=AIRFLOW_2_ENV \
    --location=AIRFLOW_2_LOCATION
```
Step 6: Transfer connections and users
Airflow 1.10.* does not support exporting users and connections. To transfer users and connections, manually create new user accounts and connections in your Airflow 2 environment.
gcloud
To get a list of connections in your Airflow 1.10.* environment, run:
```
gcloud composer environments run AIRFLOW_1_ENV \
    --location AIRFLOW_1_LOCATION \
    connections -- --list
```
To create a new connection in your Airflow 2 environment, run the `connections add` Airflow CLI command through gcloud. For example:

```
gcloud composer environments run \
    AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    connections add \
    -- --conn-host postgres.example.com \
    --conn-port 5432 \
    --conn-type postgres \
    --conn-login example_user \
    --conn-password example_password \
    --conn-description "Example connection" \
    example_connection
```
To view a list of users in your Airflow 1.10.* environment:
Open the Airflow web interface for your Airflow 1.10.* environment.
Go to Admin > Users.
To create a new user account in your Airflow 2 environment, run the `users create` Airflow CLI command through gcloud. For example:

```
gcloud composer environments run \
    AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    users create \
    -- --username example_username \
    --firstname Example-Name \
    --lastname Example-Surname \
    --email example-user@example.com \
    --use-random-password \
    --role Admin
```
Step 7: Make sure that your DAGs are ready for Airflow 2
Before transferring DAGs to your Airflow 2 environment, make sure that:
Your DAGs run successfully and there are no remaining compatibility issues.
Your DAGs use correct import statements.
For example, the new import statement for `BigQueryCreateDataTransferOperator` can look like this:

```
from airflow.providers.google.cloud.operators.bigquery_dts \
    import BigQueryCreateDataTransferOperator
```
Your DAGs are upgraded for Airflow 2. This change is compatible with Airflow 1.10.14 and later versions.
Step 8: Transfer DAGs to the Airflow 2 environment
The following potential problems might happen when you transfer DAGs between environments:
If a DAG is enabled (not paused) in both environments, each environment runs its own copy of the DAG, as scheduled. This might lead to concurrent DAG runs for the same data and execution time.
Because of DAG catchup, Airflow schedules extra DAG runs, beginning from the start date specified in your DAGs. This happens because the new Airflow instance does not take into account the history of DAG runs from the 1.10.* environment. This might lead to a large number of DAG runs scheduled starting from the specified start date.
Prevent concurrent DAG runs
In your Airflow 2 environment, override the `dags_are_paused_at_creation` Airflow configuration option. After you make this change, all new DAGs are paused by default.
Section | Key | Value |
---|---|---|
core | dags_are_paused_at_creation | True |
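If you prefer to apply this override with the gcloud CLI instead of the console, a minimal sketch looks like the following; the environment name and location are placeholders:

```
# Pause all newly added DAGs by default in the Airflow 2 environment.
gcloud composer environments update AIRFLOW_2_ENV \
    --location AIRFLOW_2_LOCATION \
    --update-airflow-configs=core-dags_are_paused_at_creation=True
```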
Prevent extra or missing DAG runs
Specify a new static start date in DAGs that you transfer to your Airflow 2 environment.
To avoid gaps and overlaps in execution dates, the first DAG run should happen in the Airflow 2 environment at the next occurrence of the schedule interval. To do so, set the new start date in your DAG to be before the date of the last run in the Airflow 1.10.* environment.
As an example, if your DAG runs at 15:00, 17:00 and 21:00 every day in the Airflow 1.10.* environment, the last DAG run happened at 15:00, and you plan to transfer the DAG at 15:15, then the start date for the Airflow 2 environment can be today at 14:45. After you enable the DAG in the Airflow 2 environment, Airflow schedules a DAG run for 17:00.
As another example, if your DAG runs at 00:00 every day in the Airflow 1.10.* environment, the last DAG run happened at 00:00 on 26 April, 2021, and you plan to transfer the DAG at 13:00 on 26 April, 2021, then the start date for the Airflow 2 environment can be 23:45 on 25 April, 2021. After you enable the DAG in the Airflow 2 environment, Airflow schedules a DAG run for 00:00 on 27 April, 2021.
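As a sketch of the second example, a DAG in the Airflow 2 environment could set its static start date as follows; the DAG ID, schedule, and task are hypothetical:

```
import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_daily_dag",
    # Static start date shortly before the last Airflow 1.10.* run
    # (00:00 on 26 April, 2021), so the next run is scheduled for
    # 00:00 on 27 April, 2021.
    start_date=datetime.datetime(2021, 4, 25, 23, 45),
    schedule_interval="0 0 * * *",  # runs at 00:00 every day
) as dag:
    BashOperator(task_id="example_task", bash_command="echo 'example'")
```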
Transfer your DAGs one by one to the Airflow 2 environment
For each DAG, follow this procedure to transfer it:
Make sure that the new start date in the DAG is set as described in the previous section.
Upload the updated DAG to the Airflow 2 environment (for example, with the gcloud sketch shown after this procedure). This DAG is paused in the Airflow 2 environment because of the configuration override, so no DAG runs are scheduled yet.
In the Airflow web interface, go to DAGs and check for reported DAG syntax errors.
At the time when you plan to transfer the DAG:
Pause the DAG in your Airflow 1.10.* environment.
Un-pause the DAG in your Airflow 2 environment.
Check that the new DAG run is scheduled at the correct time.
Wait for the DAG run to happen in the Airflow 2 environment and check if the run is successful.
Depending on whether the DAG run is successful:
If the DAG run is successful, you can proceed and use the DAG from your Airflow 2 environment. Eventually, consider deleting the Airflow 1.10.* version of the DAG.
If the DAG run failed, attempt to troubleshoot the DAG until it successfully runs in Airflow 2.
If required, you can always fall back to the Airflow 1.10.* version of the DAG:
Pause the DAG in your Airflow 2 environment.
Un-pause the DAG in your Airflow 1.10.* environment. This schedules a new DAG run for the same date and time as the failed DAG run.
When you are ready to continue with the Airflow 2 version of the DAG, adjust the start date, upload the new version of the DAG to your Airflow 2 environment, and repeat the procedure.
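The DAG upload mentioned in this procedure can be done with the gcloud CLI; the DAG file name is a placeholder:

```
# Upload a single DAG file to the Airflow 2 environment's /dags folder.
gcloud composer environments storage dags import \
    --environment=AIRFLOW_2_ENV \
    --location=AIRFLOW_2_LOCATION \
    --source=example_dag.py
```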
Step 9: Monitor your Airflow 2 environment
After you transfer all DAGs and configuration to the Airflow 2 environment, monitor it for potential issues, failed DAG runs, and overall environment health. If the Airflow 2 environment runs without problems for a sufficient period of time, you can remove the Airflow 1.10.* environment.
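When you decide to remove the Airflow 1.10.* environment, you can delete it with a command like the following:

```
# Delete the old Airflow 1.10.* environment after the migration is verified.
gcloud composer environments delete AIRFLOW_1_ENV \
    --location AIRFLOW_1_LOCATION
```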
What's next
- Troubleshooting DAGs
- Troubleshooting environment creation
- Troubleshooting environment updates
- Using backport packages