# Use YAML files with workflows

Last updated 2025-08-25 UTC.

You can define a workflow template in a YAML file, then instantiate the template to run the workflow. You can also import and export a workflow template YAML file to create and update a Dataproc workflow template resource.

> Also see [Using inline Dataproc workflows](/dataproc/docs/concepts/workflows/inline-workflows) for other ways to run a workflow without creating a workflow template resource.

### Run a workflow using a YAML file

To run a workflow without first creating a workflow template resource, use the [gcloud dataproc workflow-templates instantiate-from-file](/sdk/gcloud/reference/dataproc/workflow-templates/instantiate-from-file) command.

1. Define your workflow template in a YAML file. The YAML file must include all required [WorkflowTemplate](/dataproc/docs/reference/rest/v1/projects.regions.workflowTemplates) fields except the `id` field, and it must omit the `version` field and all output-only fields. In the following example, the `prerequisiteStepIds` list in the `terasort` step ensures that the `terasort` step begins only after the `teragen` step completes successfully.

   ```
   jobs:
   - hadoopJob:
       args:
       - teragen
       - '1000'
       - hdfs:///gen/
       mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
     stepId: teragen
   - hadoopJob:
       args:
       - terasort
       - hdfs:///gen/
       - hdfs:///sort/
       mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
     stepId: terasort
     prerequisiteStepIds:
       - teragen
   placement:
     managedCluster:
       clusterName: my-managed-cluster
       config:
         gceClusterConfig:
           zoneUri: us-central1-a
   ```
2. Run the workflow:

   ```
   gcloud dataproc workflow-templates instantiate-from-file \
       --file=TEMPLATE_YAML \
       --region=REGION
   ```

### Instantiate a workflow using a YAML file with Dataproc Auto Zone Placement

1. Define your workflow template in a YAML file. This YAML file is the same as the previous YAML file, except that the `zoneUri` field is set to the empty string (`''`) to allow Dataproc [Auto Zone Placement](/dataproc/docs/concepts/configuring-clusters/auto-zone) to select the zone for the cluster.

   ```
   jobs:
   - hadoopJob:
       args:
       - teragen
       - '1000'
       - hdfs:///gen/
       mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
     stepId: teragen
   - hadoopJob:
       args:
       - terasort
       - hdfs:///gen/
       - hdfs:///sort/
       mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
     stepId: terasort
     prerequisiteStepIds:
       - teragen
   placement:
     managedCluster:
       clusterName: my-managed-cluster
       config:
         gceClusterConfig:
           zoneUri: ''
   ```
2. Run the workflow. When using Auto Zone Placement, you must pass a [region](/dataproc/docs/concepts/regional-endpoints) to the `gcloud` command.

   ```
   gcloud dataproc workflow-templates instantiate-from-file \
       --file=TEMPLATE_YAML \
       --region=REGION
   ```

### Import and export a workflow template YAML file

You can import and export workflow template YAML files. Typically, you export a workflow template as a YAML file, edit the YAML, and then import the edited file to update the template.

1. [Export the workflow template](/sdk/gcloud/reference/dataproc/workflow-templates/export) to a YAML file. During the export operation, the `id` and `version` fields and all output-only fields are filtered from the output and do not appear in the exported YAML file.

   ```
   gcloud dataproc workflow-templates export TEMPLATE_ID or TEMPLATE_NAME \
       --destination=TEMPLATE_YAML \
       --region=REGION
   ```

   You can pass either the [WorkflowTemplate](/dataproc/docs/reference/rest/v1/projects.regions.workflowTemplates#resource-workflowtemplate) `id` or the fully qualified template resource `name` (`projects/PROJECT_ID/regions/REGION/workflowTemplates/TEMPLATE_ID`) to the command.

   If you omit the `--destination` flag, the output is directed to `stdout`, so the following command also exports the template to a YAML file:

   ```
   gcloud dataproc workflow-templates export TEMPLATE_ID or TEMPLATE_NAME \
       --region=REGION > TEMPLATE_YAML
   ```

2. Edit the YAML file locally. The `id`, `version`, and output-only fields, which were filtered from the YAML file when the template was exported, are not allowed in the imported YAML file.

3. [Import the updated workflow template](/sdk/gcloud/reference/dataproc/workflow-templates/import) YAML file:

   ```
   gcloud dataproc workflow-templates import TEMPLATE_ID or TEMPLATE_NAME \
       --source=TEMPLATE_YAML \
       --region=REGION
   ```

   You can pass either the [WorkflowTemplate](/dataproc/docs/reference/rest/v1/projects.regions.workflowTemplates#resource-workflowtemplate) `id` or the fully qualified template resource `name` (`projects/PROJECT_ID/regions/REGION/workflowTemplates/TEMPLATE_ID`) to the command. If a template with the same name already exists, it is overwritten (updated) and its version number is incremented. If no template with that name exists, it is created.
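Before instantiating or importing an edited template, it can help to check locally for the two constraints this page describes: the `id` and `version` fields must not be present, and each `prerequisiteStepIds` entry should name a step that exists. The following Python sketch is one possible pre-flight check, not part of the Dataproc tooling. It assumes the template has already been loaded into a plain dictionary (for example with a third-party YAML parser, which is not shown here). The names `check_template` and `DISALLOWED_FIELDS` are hypothetical, output-only fields are not enumerated, and the computed order only illustrates the dependency constraint; it does not claim to mirror how Dataproc schedules steps internally.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Top-level fields this page says must not appear in the YAML file.
# Assumption: output-only fields are not listed here; extend as needed.
DISALLOWED_FIELDS = ("id", "version")


def check_template(template: dict) -> list[str]:
    """Return step IDs in an order that respects prerequisiteStepIds.

    Raises ValueError if the template contains a disallowed field,
    references an unknown step, or has circular prerequisites.
    """
    present = [f for f in DISALLOWED_FIELDS if f in template]
    if present:
        raise ValueError(f"disallowed top-level fields: {present}")

    jobs = template.get("jobs", [])
    step_ids = {job["stepId"] for job in jobs}

    # Map each step to the set of steps that must finish before it starts.
    graph = {}
    for job in jobs:
        prereqs = job.get("prerequisiteStepIds", [])
        unknown = [p for p in prereqs if p not in step_ids]
        if unknown:
            raise ValueError(
                f"step {job['stepId']!r} has unknown prerequisites: {unknown}"
            )
        graph[job["stepId"]] = set(prereqs)

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"circular prerequisites: {err.args[1]}") from err


# Dictionary form of the teragen/terasort example from this page.
template = {
    "jobs": [
        {
            "hadoopJob": {"args": ["teragen", "1000", "hdfs:///gen/"]},
            "stepId": "teragen",
        },
        {
            "hadoopJob": {"args": ["terasort", "hdfs:///gen/", "hdfs:///sort/"]},
            "stepId": "terasort",
            "prerequisiteStepIds": ["teragen"],
        },
    ],
    "placement": {"managedCluster": {"clusterName": "my-managed-cluster"}},
}

print(check_template(template))  # ['teragen', 'terasort']
```

A check like this only catches local mistakes early. The `gcloud` commands remain the authority on whether a template is valid.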