Get started with Dataflow

The Dataflow service runs pipelines that are defined by using the Apache Beam SDK. However, for many use cases you don't need to write code with the SDK, because Dataflow provides several no-code and low-code options.

  • Templates. Dataflow provides prebuilt templates for moving data from one product to another. For example, you can use a template to move data from Pub/Sub to BigQuery; an example launch command appears after this list.

  • Job builder. The job builder is a visual UI for building Dataflow pipelines in the Google Cloud console. It supports a subset of Apache Beam sources and sinks, as well as transforms such as joins, Python functions, and SQL queries. We recommend the job builder for simple use cases such as data movement.

  • Turnkey transforms for ML. For machine learning (ML) pipelines, Dataflow provides turnkey transforms that require minimal code to configure (see the sketch after this list). As a starting point, run an example ML notebook in Google Colab. To learn more, see the Dataflow ML overview.

  • Apache Beam SDK. To get the full power of Apache Beam, use the SDK to write a custom pipeline in Python, Java, or Go; a minimal Python sketch appears after this list.
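
As an example of the template option, the following command launches the prebuilt Pub/Sub to BigQuery template with the gcloud CLI. This is a minimal sketch: the job name, region, project, topic, dataset, and table values are placeholders to replace with your own.

```shell
# Launch the prebuilt Pub/Sub to BigQuery template as a Dataflow job.
# All identifiers below (job name, region, project, topic, dataset, table)
# are placeholders.
gcloud dataflow jobs run pubsub-to-bq-example \
    --region=us-central1 \
    --gcs-location=gs://dataflow-templates-us-central1/latest/PubSub_to_BigQuery \
    --parameters=inputTopic=projects/PROJECT_ID/topics/TOPIC_ID,outputTableSpec=PROJECT_ID:DATASET.TABLE
```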
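
For the turnkey ML transforms, the following sketch applies Beam's RunInference transform with a scikit-learn model handler. The model path and the toy input data are assumptions for illustration; other frameworks, such as PyTorch and TensorFlow, have analogous handlers.

```python
# A minimal RunInference sketch using a scikit-learn model handler.
import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

# Placeholder path to a pickled scikit-learn model.
model_handler = SklearnModelHandlerNumpy(model_uri="gs://MY_BUCKET/model.pkl")

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create([np.array([1.0, 2.0]), np.array([3.0, 4.0])])
        | "Infer" >> RunInference(model_handler)  # emits PredictionResult values
        | "Print" >> beam.Map(print)
    )
```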
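
And for the Apache Beam SDK, the following is a minimal word-count pipeline written in Python. The project, region, and bucket values are placeholders; omit the options to run locally on the DirectRunner instead of Dataflow.

```python
# A minimal custom pipeline: read text, count words, write results.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",              # run on the Dataflow service
    project="PROJECT_ID",                 # placeholder project ID
    region="us-central1",                 # placeholder region
    temp_location="gs://MY_BUCKET/temp",  # placeholder staging location
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://MY_BUCKET/input.txt")
        | "Split" >> beam.FlatMap(str.split)
        | "Pair" >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Format" >> beam.MapTuple(lambda word, n: f"{word}: {n}")
        | "Write" >> beam.io.WriteToText("gs://MY_BUCKET/counts")
    )
```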

To help you decide, the following table lists some common examples.

| I want to... | Recommended approach |
| --- | --- |
| Move data from a source to a sink, with no custom logic. | Job builder or template. We recommend starting with the job builder. If the job builder doesn't support your use case, see whether there is a template for it. |
| Move data from a source to a sink, and apply custom logic using Python functions or SQL. | Job builder |
| Use an ML model in my pipeline, or prepare my data for training or inference. | Dataflow ML turnkey transforms |
| Write a pipeline that requires more advanced Apache Beam features. | Apache Beam SDK for Java, Python, or Go |

What's next