The Dataflow service runs pipelines that are defined by using the Apache Beam SDK. However, for many use cases you don't need to write code with the SDK, because Dataflow provides several no-code and low-code options.
For moving data from one product to another, Dataflow provides prebuilt templates that you can run from the Google Cloud console or the command line. For example, you can use a template to move data from Pub/Sub to BigQuery.
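As a rough sketch of the command-line path, the following launches the Google-provided Pub/Sub Subscription to BigQuery template with `gcloud`; the job name, region, subscription, and table values are placeholders, and the exact parameter names can vary by template version.

```sh
# Launch the Pub/Sub Subscription to BigQuery template (illustrative values).
# Replace PROJECT_ID, SUBSCRIPTION, DATASET, and TABLE with your own resources.
gcloud dataflow jobs run pubsub-to-bq-example \
    --gcs-location=gs://dataflow-templates-us-central1/latest/PubSub_Subscription_to_BigQuery \
    --region=us-central1 \
    --parameters=inputSubscription=projects/PROJECT_ID/subscriptions/SUBSCRIPTION,outputTableSpec=PROJECT_ID:DATASET.TABLE
```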
If there isn't a template for your use case, or if you want to make small customizations, we recommend the job builder. The job builder is a visual UI for building Dataflow pipelines in the Google Cloud console. You select a source and a sink, and then add transformations such as joins, Python functions, and SQL transformations. Note that the job builder supports only a subset of sources and sinks. If your source or sink isn't supported, use the Apache Beam SDK.
For machine learning (ML) pipelines, Dataflow provides turnkey transforms that require minimal code to configure. As a starting point, run an example ML notebook in Google Colab. To learn more, see the Dataflow ML overview.
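To give a sense of what a turnkey transform looks like, the following sketch uses Beam's `RunInference` transform with a scikit-learn model handler; the model path and input values are hypothetical placeholders, not part of the Dataflow documentation.

```python
import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

# Hypothetical model location; point this at your own pickled scikit-learn model.
model_handler = SklearnModelHandlerNumpy(model_uri="gs://my-bucket/models/model.pkl")

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "CreateExamples" >> beam.Create([np.array([1.0, 2.0]), np.array([3.0, 4.0])])
        | "RunInference" >> RunInference(model_handler)  # turnkey ML transform
        | "Print" >> beam.Map(print)
    )
```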
To get the full power of Apache Beam, use the SDK to write a custom pipeline in Python, Java, or Go.
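For illustration, here is a minimal custom pipeline written with the Beam Python SDK; the project, region, and bucket paths are placeholders, and the same pipeline runs locally if you omit the Dataflow-specific options.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder options; replace the project, region, and bucket with your own.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")
        | "ExtractWords" >> beam.FlatMap(lambda line: line.split())
        | "CountWords" >> beam.combiners.Count.PerElement()
        | "Format" >> beam.MapTuple(lambda word, count: f"{word}: {count}")
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/counts")
    )
```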
To help you decide, the following table lists some common examples.
| I want to ... | Recommended approach |
|---|---|
| Move data from a source to a sink, and apply custom logic using Python functions or SQL. We also recommend this option if you don't need custom logic now but might in the future. | Job builder |
| Move data from a source to a sink, with no custom logic. | Template |
| Use an ML model in my pipeline, or prepare my data for training or inference. | Dataflow ML turnkey transforms |
| Write a pipeline in Java, Python, or Go that requires more advanced Apache Beam features. | Apache Beam SDK |
What's next
- Get started with a specific Dataflow use case and approach.
- See more Dataflow use cases.