Cloud Composer overview


This page provides a brief introduction to Airflow and DAGs, and describes the features and capabilities of Cloud Composer.

For more information about new features in Cloud Composer releases, see Release notes.

About Cloud Composer

Cloud Composer is a fully managed workflow orchestration service that lets you create, schedule, monitor, and manage workflow pipelines that span clouds and on-premises data centers.

Cloud Composer is built on the popular Apache Airflow open source project and operates using the Python programming language.

By using Cloud Composer instead of a local instance of Apache Airflow, you can benefit from the best of Airflow with no installation or management overhead. Cloud Composer helps you create managed Airflow environments quickly and use Airflow-native tools, such as the powerful Airflow web interface and command-line tools, so you can focus on your workflows and not your infrastructure.

Differences between Cloud Composer versions

For more information about differences between major versions of Cloud Composer, see Cloud Composer versioning overview.

Airflow and Airflow DAGs (workflows)

In data analytics, a workflow represents a series of tasks for ingesting, transforming, analyzing, or utilizing data. In Airflow, workflows are created using DAGs, or "Directed Acyclic Graphs".

Figure 1. Relationship between DAGs and tasks

A DAG is a collection of tasks that you want to schedule and run, organized in a way that reflects their relationships and dependencies. DAGs are created in Python files, which define the DAG structure using code. The DAG's purpose is to ensure that each task is executed at the right time and in the right order.
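
The ordering guarantee described above can be illustrated without Airflow. The following is a minimal sketch that uses only the Python standard library (`graphlib`), not the Airflow API. The task names and the `execution_order` helper are hypothetical and exist only for this illustration. It shows how a set of task dependencies determines a valid run order, and why the graph must be acyclic.

```python
import graphlib

# Hypothetical workflow: each key is a task, and each value is the set of
# tasks that must finish before that task can start.
dependencies = {
    "prepare_data": set(),
    "run_pipeline": {"prepare_data"},
    "check_api": {"prepare_data"},
    "send_email": {"run_pipeline", "check_api"},
}


def execution_order(deps):
    """Return one valid run order for the tasks in deps.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is why a workflow graph has to be acyclic.
    """
    return list(graphlib.TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In this sketch, `prepare_data` always comes first and `send_email` always comes last. The two middle tasks do not depend on each other, so either may come before the other.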

Each task in a DAG can represent almost anything. For example, a task might perform any of the following functions:

  • Preparing data for ingestion
  • Monitoring an API
  • Sending an email
  • Running a pipeline

In addition to running a DAG on a schedule, you can trigger DAGs manually or in response to events, such as changes in a Cloud Storage bucket. For more information, see Triggering DAGs.

For more information about DAGs and tasks, see the Apache Airflow documentation.

Cloud Composer environments

Cloud Composer environments are self-contained Airflow deployments based on Google Kubernetes Engine. They work with other Google Cloud services using connectors built into Airflow. You can create one or more environments in a single Google Cloud project, in any supported region.

Cloud Composer provisions Google Cloud services that run your workflows and all Airflow components. The main components of an environment are:

  • GKE cluster: Airflow components such as Airflow schedulers, triggerers, and workers run as GKE workloads in a single cluster created for your environment, and are responsible for processing and executing DAGs.

    The cluster also hosts other Cloud Composer components like Composer Agent and Airflow Monitoring, which help manage the Cloud Composer environment, gather logs to store in Cloud Logging, and gather metrics to upload to Cloud Monitoring.

  • Airflow web server: The web server runs the Apache Airflow UI.

  • Airflow database: The database holds the Apache Airflow metadata.

  • Cloud Storage bucket: Cloud Composer associates a Cloud Storage bucket with your environment. This bucket, also called the environment's bucket, stores the DAGs, logs, custom plugins, and data for the environment. For more information about the environment's bucket, see Data stored in Cloud Storage.

For an in-depth look at the components of an environment, see Environment architecture.
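
As an illustration of how content is organized in the environment's bucket, here is a minimal sketch that builds Cloud Storage URIs for the four kinds of content listed above. It assumes the top-level folders are named `dags`, `plugins`, `data`, and `logs`. The bucket name `example-bucket` and the helper `bucket_uri` are hypothetical. The code only formats strings and does not call Cloud Storage.

```python
import posixpath

# Assumed top-level folders in the environment's bucket, one per kind of
# content named in the text: DAGs, custom plugins, data, and logs.
ENVIRONMENT_FOLDERS = ("dags", "plugins", "data", "logs")


def bucket_uri(bucket, folder, *parts):
    """Build a gs:// URI for an object in one of the environment folders."""
    if folder not in ENVIRONMENT_FOLDERS:
        raise ValueError(f"unknown environment folder: {folder!r}")
    return "gs://" + posixpath.join(bucket, folder, *parts)


# "example-bucket" is a placeholder, not a real bucket name.
print(bucket_uri("example-bucket", "dags", "example_dag.py"))
```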

Cloud Composer interfaces

Cloud Composer provides interfaces for managing environments, Airflow instances that run within environments, and individual DAGs.

For example, you can create and configure Cloud Composer environments in the Google Cloud console, with the Google Cloud CLI or the Cloud Composer API, or with Terraform.

As another example, you can manage DAGs from the Google Cloud console or the native Airflow UI, or by running Google Cloud CLI and Airflow CLI commands.

Airflow features in Cloud Composer

When using Cloud Composer, you can manage and use Airflow features such as:

Access control in Cloud Composer

You manage security at the Google Cloud project level and can assign IAM roles that allow individual users to modify or create environments. If someone does not have access to your project or does not have an appropriate Cloud Composer IAM role, that person cannot access any of your environments.

In addition to IAM, you can use Airflow UI access control, which is based on the Apache Airflow Access Control model.

For more information about security features in Cloud Composer, see Cloud Composer security overview.

Environment networking

Cloud Composer supports several networking configurations for environments, with many configuration options. For example, in a Private IP environment, DAGs and Airflow components are fully isolated from the public internet.

For more information about networking in Cloud Composer, see pages for individual networking features:

Other features of Cloud Composer

Other Cloud Composer features include:

Frequently Asked Questions

What version of Apache Airflow does Cloud Composer use?

Cloud Composer environments are based on Cloud Composer images. When you create an environment, you can select an image with a specific Airflow version:

  • Cloud Composer 3 supports Airflow 2.
  • Cloud Composer 2 supports Airflow 2.
  • Cloud Composer 1 supports Airflow 1 and Airflow 2.

You have control over the Apache Airflow version of your environment. You can upgrade your environment to a later Cloud Composer image version. Each Cloud Composer release supports several Apache Airflow versions.

Can I use native Airflow UI and CLI?

You can access the Apache Airflow web interface of your environment. Each of your environments has its own Airflow UI. For more information about accessing the Airflow UI, see Airflow web interface.

To run Airflow CLI commands in your environments, use gcloud commands. For more information about running Airflow CLI commands in Cloud Composer environments, see Airflow command-line interface.
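
As a rough illustration, the following sketch assembles a `gcloud composer environments run` command line that passes an Airflow CLI subcommand to an environment. It assumes Airflow 2 CLI syntax, where arguments for the subcommand follow a `--` separator. The environment name, location, and DAG ID are placeholders, and the helper `airflow_cli_command` is hypothetical. The code only builds and prints the command and does not execute it.

```python
import shlex


def airflow_cli_command(environment, location, subcommand, arguments=()):
    """Build the argv list for running an Airflow CLI command through gcloud.

    subcommand is a list such as ["dags", "trigger"]. arguments are passed
    to the Airflow subcommand after the "--" separator.
    """
    argv = [
        "gcloud", "composer", "environments", "run", environment,
        "--location", location,
        *subcommand,
    ]
    if arguments:
        argv += ["--", *arguments]
    return argv


# Placeholder values: replace them with your own environment, region, and DAG ID.
command = airflow_cli_command(
    "example-environment", "us-central1", ["dags", "trigger"], ["example_dag"]
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so values that contain spaces are quoted correctly if the list is later joined with `shlex.join` or passed to `subprocess.run`.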

Can I use my own database as the Airflow database?

Cloud Composer uses a managed database service for the Airflow database. It is not possible to use a user-provided database as the Airflow database.

Can I use my own cluster as a Cloud Composer cluster?

Cloud Composer uses the Google Kubernetes Engine service to create, manage, and delete the environment clusters where Airflow components run. These clusters are fully managed by Cloud Composer.

It is not possible to build a Cloud Composer environment based on a self-managed Google Kubernetes Engine cluster.

Can I use my own container registry?

Cloud Composer uses the Artifact Registry service to manage the container image repositories used by Cloud Composer environments. It is not possible to replace it with a user-provided container registry.

Are Cloud Composer environments zonal or regional?

Cloud Composer 3 and Cloud Composer 2 environments have a zonal Airflow database and a regional Airflow scheduling and execution layer. Airflow schedulers, workers and web servers run in the Airflow execution layer.

Cloud Composer 1 environments are zonal.

What's next