# Overview of Dataproc Workflow Templates
The Dataproc [WorkflowTemplates API](/dataproc/docs/reference/rest/v1/projects.regions.workflowTemplates) provides a
flexible and easy-to-use mechanism for managing and executing workflows. A
Workflow Template is a reusable workflow configuration. It defines a graph of
jobs with information on where to run those jobs.
**Key Points:**

- [Instantiating a Workflow Template](/dataproc/docs/concepts/workflows/using-workflows#running_a_workflow)
  launches a Workflow. A Workflow is an operation that runs a
  [Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Directed_acyclic_graph)
  of jobs on a cluster.
  - If the workflow uses a [managed cluster](#managed_cluster), it creates the
    cluster, runs the jobs, and then deletes the cluster when the jobs are
    finished.
  - If the workflow uses a [cluster selector](#cluster_selector), it runs jobs
    on a selected existing cluster.
- Workflows are ideal for complex job flows. You can create job dependencies
  so that a job starts only after its dependencies complete successfully (see
  the sketch following these points).
- When you [create a workflow template](/dataproc/docs/concepts/workflows/using-workflows#creating_a_template),
  Dataproc does not create a cluster or submit jobs to a cluster. Dataproc
  creates or selects a cluster and runs workflow jobs on the cluster when the
  workflow template is **instantiated**.
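For example, here is a minimal sketch of two dependent jobs, assuming a
template named `my-template` already exists; the region, step ids, and script
paths are illustrative, not values from this page:

```bash
# First job: no dependencies.
gcloud dataproc workflow-templates add-job pyspark gs://my-bucket/prepare.py \
    --workflow-template=my-template \
    --region=us-central1 \
    --step-id=prepare

# Second job: starts only after "prepare" completes successfully.
gcloud dataproc workflow-templates add-job pyspark gs://my-bucket/analyze.py \
    --workflow-template=my-template \
    --region=us-central1 \
    --step-id=analyze \
    --start-after=prepare
```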
Kinds of Workflow Templates
---------------------------
### Managed cluster
A workflow template can specify a managed cluster. The workflow will create an
"ephemeral" cluster to run workflow jobs, and then delete the cluster when the
workflow is finished.
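For illustration, a minimal sketch of creating a template with a managed
cluster; the template name, region, and worker count are assumptions:

```bash
# Create an empty workflow template.
gcloud dataproc workflow-templates create my-template \
    --region=us-central1

# Attach a managed cluster: it is created when the template is
# instantiated and deleted when the workflow finishes.
gcloud dataproc workflow-templates set-managed-cluster my-template \
    --region=us-central1 \
    --cluster-name=ephemeral-cluster \
    --num-workers=2
```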
### Cluster selector

A workflow template can specify an existing cluster on which to run workflow
jobs by listing one or more [user labels](/dataproc/docs/concepts/labels)
previously attached to the cluster. The workflow will run on a
cluster that matches all of the labels. If multiple clusters match
all of the labels, Dataproc selects the cluster with the most
available YARN memory to run all workflow jobs. At the end of the workflow,
Dataproc does not delete the selected cluster. See
[Use cluster selectors with workflows](/dataproc/docs/concepts/workflows/cluster-selectors)
for more information.

Note: A workflow can select a specific cluster by matching the
`goog-dataproc-cluster-name` label (see
[Using Automatically Applied Labels](/dataproc/docs/concepts/workflows/cluster-selectors#using_automatically_applied_labels)).
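A minimal sketch, assuming a template named `my-template` and existing
clusters that carry an `env=prod` label (both illustrative):

```bash
# Route the workflow's jobs to an existing cluster whose labels match.
gcloud dataproc workflow-templates set-cluster-selector my-template \
    --region=us-central1 \
    --cluster-labels=env=prod
```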
### Parameterized

If you will run a workflow template multiple times with different values, use
parameters to avoid editing the workflow template for each run:

1. define parameters in the template, then
2. pass different values for the parameters for each run.

See
[Parameterization of Workflow Templates](/dataproc/docs/concepts/workflows/workflow-parameters)
for more information.
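As an illustrative sketch (the template name, step id, parameter name, and
bucket paths are assumptions, not values from this page), you can export a
template, declare a parameter over a job argument, re-import it, and pass a
value at instantiation:

```bash
# Export the existing template, add a parameter, and re-import it.
gcloud dataproc workflow-templates export my-template \
    --region=us-central1 --destination=template.yaml

# OUTPUT_DIR is an illustrative parameter bound to the first argument
# of the job whose step id is "analyze".
cat >> template.yaml <<'EOF'
parameters:
- name: OUTPUT_DIR
  fields:
  - jobs['analyze'].pysparkJob.args[0]
EOF

gcloud dataproc workflow-templates import my-template \
    --region=us-central1 --source=template.yaml

# Pass a different value on each run.
gcloud dataproc workflow-templates instantiate my-template \
    --region=us-central1 \
    --parameters=OUTPUT_DIR=gs://my-bucket/run-42/
```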
### Inline

Workflows can be instantiated inline using the `gcloud` command with
[workflow template YAML files](/dataproc/docs/concepts/workflows/using-yamls#instantiate_a_workflow_using_a_yaml_file) or by calling the Dataproc
[InstantiateInline](/dataproc/docs/reference/rest/v1/projects.regions.workflowTemplates/instantiateInline)
API (see [Using inline Dataproc workflows](/dataproc/docs/concepts/workflows/inline-workflows)).
Inline workflows do not create or modify workflow template resources, which
makes them useful for rapid prototyping or automation.

Workflow Template use cases
---------------------------

- **Automation of repetitive tasks.** Workflows encapsulate frequently used
  cluster configurations and jobs.
- **Transactional fire-and-forget API interaction model.** Workflow Templates
  replace the steps involved in a typical flow, which include:

  1. creating the cluster
  2. submitting jobs
  3. polling
  4. deleting the cluster

  Workflow Templates use a single token to track progress from cluster creation
  to deletion, and automate error handling and recovery. They also simplify the
  integration of Dataproc with other tools, such as Cloud Run functions
  and Cloud Composer.
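As a sketch, the whole flow above collapses into one call; the template name
and region are illustrative:

```bash
# One call replaces create/submit/poll/delete. By default the command
# waits for the workflow to finish; --async instead returns an
# operation ID (the single tracking token) that you can poll.
gcloud dataproc workflow-templates instantiate my-template \
    --region=us-central1 \
    --async
```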
- **Support for ephemeral and long-lived clusters.** A common complexity
  associated with running Apache Hadoop is tuning and right-sizing clusters.
  Ephemeral (managed) clusters are easier to configure since they run a
  single workload. Cluster selectors can be used with
  longer-lived clusters to repeatedly execute the same workload
  without incurring the amortized cost of creating and deleting clusters.
- **Granular IAM security.** Creating Dataproc clusters and
  submitting jobs require all-or-nothing IAM permissions.
  Workflow Templates use a per-template
  [workflowTemplates.instantiate](/dataproc/docs/concepts/iam/iam#workflow_templates_methods_required_permissions)
  permission, and do not depend on cluster or job permissions.
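As an illustrative sketch (the project and service account are assumptions),
granting a caller `roles/dataproc.editor`, which includes
`dataproc.workflowTemplates.instantiate`, lets them run the workflow without
directly administering clusters or jobs:

```bash
# Illustrative project and service-account names.
gcloud projects add-iam-policy-binding my-project \
    --member=serviceAccount:workflow-runner@my-project.iam.gserviceaccount.com \
    --role=roles/dataproc.editor
```

A narrower, per-template grant is also possible through the workflow
templates `setIamPolicy` method.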
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-25 UTC."],[[["\u003cp\u003eWorkflow Templates offer a reusable configuration for defining a series of jobs in a Directed Acyclic Graph (DAG), streamlining the management and execution of workflows.\u003c/p\u003e\n"],["\u003cp\u003eInstantiating a Workflow Template initiates a Workflow, which either creates an ephemeral cluster, runs the jobs, and then deletes the cluster, or utilizes a pre-existing cluster selected via labels.\u003c/p\u003e\n"],["\u003cp\u003eWorkflows are ideal for complex job sequences, allowing you to set job dependencies so that one job will only execute once the previous one has been completed successfully.\u003c/p\u003e\n"],["\u003cp\u003eWorkflow Templates can be parameterized to execute with varying values without the need to edit the template for each run, enhancing flexibility.\u003c/p\u003e\n"],["\u003cp\u003eWorkflow Templates simplify task automation and the integration of Dataproc with external tools by replacing manual cluster management steps with a single-token tracking process.\u003c/p\u003e\n"]]],[],null,["# Overview of Dataproc Workflow Templates\n\nThe Dataproc [WorkflowTemplates API](/dataproc/docs/reference/rest/v1/projects.regions.workflowTemplates) provides a\nflexible and easy-to-use mechanism for managing and executing workflows. A\nWorkflow Template is a reusable workflow configuration. It defines a graph of\njobs with information on where to run those jobs.\n\n**Key Points:**\n\n- [Instantiating a Workflow Template](/dataproc/docs/concepts/workflows/using-workflows#running_a_workflow) launches a Workflow. A Workflow is an operation that runs a [Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of jobs on a cluster.\n - If the workflow uses a [managed cluster](#managed_cluster), it creates the cluster, runs the jobs, and then deletes the cluster when the jobs are finished.\n - If the workflow uses a [cluster selector](#cluster_selector), it runs jobs on a selected existing cluster.\n- Workflows are ideal for complex job flows. You can create job dependencies so that a job starts only after its dependencies complete successfully.\n- When you [create a workflow template](/dataproc/docs/concepts/workflows/using-workflows#creating_a_template) Dataproc does not create a cluster or submit jobs to a cluster. Dataproc creates or selects a cluster and runs workflow jobs on the cluster when a workflow template is **instantiated**.\n\nKinds of Workflow Templates\n---------------------------\n\n### Managed cluster\n\nA workflow template can specify a managed cluster. The workflow will create an\n\"ephemeral\" cluster to run workflow jobs, and then delete the cluster when the\nworkflow is finished.\n\n### Cluster selector\n\nA workflow template can specify an existing cluster on which to run workflow\njobs by specifying one or more [user labels](/dataproc/docs/concepts/labels)\npreviously attached to the cluster. The workflow will run on a\ncluster that matches all of the labels. 
If multiple clusters match\nall labels, Dataproc selects the cluster with the most\nYARN available memory to run all workflow jobs. At the end of workflow,\nDataproc does not delete the selected cluster. See\n[Use cluster selectors with workflows](/dataproc/docs/concepts/workflows/cluster-selectors)\nfor more information.\n| A workflow can select a specific cluster by matching the `goog-dataproc-cluster-name` label (see [Using Automatically Applied Labels](/dataproc/docs/concepts/workflows/cluster-selectors#using_automatically_applied_labels)).\n\n### Parameterized\n\nIf you will run a workflow template multiple times with different values, use\nparameters to avoid editing the workflow template for each run:\n\n1. define parameters in the template, then\n\n2. pass different values for the parameters for each run.\n\nSee\n[Parameterization of Workflow Templates](/dataproc/docs/concepts/workflows/workflow-parameters)\nfor more information.\n\n### Inline\n\nWorkflows can be instantiated inline using the `gcloud` command with\n[workflow template YAML files](/dataproc/docs/concepts/workflows/using-yamls#instantiate_a_workflow_using_a_yaml_file) or by calling the Dataproc\n[InstantiateInline](/dataproc/docs/reference/rest/v1/projects.regions.workflowTemplates/instantiateInline)\nAPI (see [Using inline Dataproc workflows](/dataproc/docs/concepts/workflows/inline-workflows)).\nInline workflows do not create or modify workflow template resources.\n| Inline workflows can be useful for rapid prototyping or automation.\n\nWorkflow Template use cases\n---------------------------\n\n- **Automation of repetitive tasks.** Workflows encapsulate frequently used\n cluster configurations and jobs.\n\n- **Transactional fire-and-forget API interaction model.** Workflow Templates\n replace the steps involved in a typical flow, which include:\n\n 1. creating the cluster\n 2. submitting jobs\n 3. polling\n 4. deleting the cluster\n\n Workflow Templates use a single token to track progress from cluster creation\n to deletion, and automate error handling and recovery. They also simplify the\n integration of Dataproc with other tools, such as Cloud Run functions\n and Cloud Composer.\n- **Support for ephemeral and long-lived clusters.** A common complexity\n associated with running Apache Hadoop is tuning and right-sizing clusters.\n Ephemeral (managed) clusters are easier to configure since they run a\n single workload. Cluster selectors can be used with\n longer-lived clusters to repeatedly execute the same workload\n without incurring the amortized cost of creating and deleting clusters.\n\n- **Granular IAM security.** Creating Dataproc clusters and\n submitting jobs require all-or-nothing IAM permissions.\n Workflow Templates use a per-template\n [workflowTemplates.instantiate](/dataproc/docs/concepts/iam/iam#workflow_templates_methods_required_permissions)\n permission, and do not depend on cluster or job permissions."]]