[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-19。"],[[["\u003cp\u003eDataplex utilizes Dataflow-powered templates to facilitate common data processing tasks, such as data ingestion, processing, and lifecycle management.\u003c/p\u003e\n"],["\u003cp\u003eThe Dataplex file format conversion template allows users to convert data stored in CSV or JSON formats within a Dataplex asset to Parquet or Avro format in another asset, with partition layout preservation and file compression support.\u003c/p\u003e\n"],["\u003cp\u003eThe Dataplex BigQuery to Cloud Storage template enables the transfer of data from a BigQuery asset to a Cloud Storage asset, offering options for specifying tables, filtering by modification date, choosing file format, and handling existing files.\u003c/p\u003e\n"],["\u003cp\u003eDataplex enables the scheduling and monitoring of both Google Cloud-provided and custom Dataflow templates via its console, providing a centralized location for managing data pipelines.\u003c/p\u003e\n"],["\u003cp\u003eDataplex templates use Data pipelines to schedule tasks, and these tasks are visible within the Google Cloud console on the Dataplex page.\u003c/p\u003e\n"]]],[],null,["# Process data using templates\n\nDataplex Universal Catalog provides templates, powered by Dataflow,\nto perform common data processing tasks like data ingestion, processing, and\nmanaging the data lifecycle. This guide describes how to configure and run data\nprocessing templates.\n\nBefore you begin\n----------------\n\nDataplex Universal Catalog templates are powered by Dataflow.\nBefore you use templates, enable the Dataflow APIs.\n\n[Enable the Dataflow APIs](https://console.cloud.google.com/apis/api/dataflow.googleapis.com/overview)\n\nNote the following:\n\n- All templates support common\n [Dataflow pipeline options](/dataflow/docs/reference/pipeline-options).\n\n- Dataplex Universal Catalog uses [data pipelines](/dataflow/docs/guides/data-pipelines)\n to schedule the tasks defined by the templates.\n\n- You can only see tasks that you schedule through Dataplex Universal Catalog in\n the Google Cloud console on the **Dataplex Universal Catalog** page.\n\nTemplate: Convert raw data to curated data\n------------------------------------------\n\nThe Dataplex Universal Catalog file format conversion template converts data in a\nDataplex Universal Catalog Cloud Storage asset, or a list of\nDataplex Universal Catalog entities stored in CSV or JSON formats, to Parquet or\nAvro format-data in another Dataplex Universal Catalog asset. The partition layout\nis preserved in the conversion. It also supports compression of the output files.\n\n### Template parameters\n\n### Run the template\n\n### Console\n\n1. In the Google Cloud console, go to the **Dataplex Universal Catalog** page.\n\n [Go to Dataplex Universal Catalog](https://console.cloud.google.com/dataplex/lakes)\n2. Navigate to the **Process** view.\n\n3. Click **Create task**.\n\n4. Under **Convert to Curated Formats** , click **Create task**.\n\n5. Choose a Dataplex Universal Catalog lake.\n\n6. Provide a task name.\n\n7. Choose a region for task execution.\n\n8. Fill in the required parameters.\n\n9. 
Note the following:

- All templates support common
  [Dataflow pipeline options](/dataflow/docs/reference/pipeline-options).

- Dataplex Universal Catalog uses [data pipelines](/dataflow/docs/guides/data-pipelines)
  to schedule the tasks defined by the templates.

- You can only see tasks that you schedule through Dataplex Universal Catalog in
  the Google Cloud console on the **Dataplex Universal Catalog** page.

Template: Convert raw data to curated data
------------------------------------------

The Dataplex Universal Catalog file format conversion template converts data
stored in CSV or JSON format in a Dataplex Universal Catalog Cloud Storage
asset, or in a list of Dataplex Universal Catalog entities, to Parquet or Avro
format data in another Dataplex Universal Catalog asset. The partition layout
is preserved in the conversion, and the template also supports compressing the
output files.

### Template parameters

### Run the template

### Console

1. In the Google Cloud console, go to the **Dataplex Universal Catalog** page.

   [Go to Dataplex Universal Catalog](https://console.cloud.google.com/dataplex/lakes)

2. Navigate to the **Process** view.

3. Click **Create task**.

4. Under **Convert to Curated Formats**, click **Create task**.

5. Choose a Dataplex Universal Catalog lake.

6. Provide a task name.

7. Choose a region for task execution.

8. Fill in the required parameters.

9. Click **Continue**.

### gcloud

In your shell or terminal, run the template:

```
gcloud beta dataflow flex-template run JOB_NAME \
--project=PROJECT_ID \
--region=REGION_NAME \
--template-file-gcs-location=gs://dataflow-templates-REGION_NAME/latest/flex/Dataplex_File_Format_Conversion_Preview \
--parameters \
inputAssetOrEntitiesList=INPUT_ASSET_OR_ENTITIES_LIST,\
outputFileFormat=OUTPUT_FILE_FORMAT,\
outputAsset=OUTPUT_ASSET
```

Replace the following:

```
JOB_NAME: a job name of your choice
PROJECT_ID: your template project ID
REGION_NAME: region in which to run the job
INPUT_ASSET_OR_ENTITIES_LIST: your Dataplex Universal Catalog input asset,
or a list of Dataplex Universal Catalog input entities
OUTPUT_FILE_FORMAT: your output file format in Cloud Storage
OUTPUT_ASSET: your Dataplex Universal Catalog output asset ID
```

### REST

Submit an HTTP POST request:

```
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/REGION_NAME/flexTemplates:launch
{
  "launch_parameter": {
    "jobName": "JOB_NAME",
    "parameters": {
      "inputAssetOrEntitiesList": "INPUT_ASSET_OR_ENTITIES_LIST",
      "outputFileFormat": "OUTPUT_FILE_FORMAT",
      "outputAsset": "OUTPUT_ASSET"
    },
    "containerSpecGcsPath": "gs://dataflow-templates-REGION_NAME/latest/flex/Dataplex_File_Format_Conversion_Preview"
  }
}
```

Replace the following:

```
PROJECT_ID: your template project ID
REGION_NAME: region in which to run the job
JOB_NAME: a job name of your choice
INPUT_ASSET_OR_ENTITIES_LIST: your Dataplex Universal Catalog input asset,
or a list of Dataplex Universal Catalog input entities
OUTPUT_FILE_FORMAT: your output file format in Cloud Storage
OUTPUT_ASSET: your Dataplex Universal Catalog output asset ID
```
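The REST snippet above shows only the endpoint and the request body. One way to submit it, assuming you authenticate with your gcloud credentials, is a curl call along the following lines; the placeholder values are the same ones described above:

```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/REGION_NAME/flexTemplates:launch" \
  -d '{
    "launch_parameter": {
      "jobName": "JOB_NAME",
      "parameters": {
        "inputAssetOrEntitiesList": "INPUT_ASSET_OR_ENTITIES_LIST",
        "outputFileFormat": "OUTPUT_FILE_FORMAT",
        "outputAsset": "OUTPUT_ASSET"
      },
      "containerSpecGcsPath": "gs://dataflow-templates-REGION_NAME/latest/flex/Dataplex_File_Format_Conversion_Preview"
    }
  }'
```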
Template: Tier data from a BigQuery asset to a Cloud Storage asset
-------------------------------------------------------------------

The Dataplex Universal Catalog BigQuery to Cloud Storage template copies data
from a Dataplex Universal Catalog BigQuery asset to a Dataplex Universal
Catalog Cloud Storage asset in a Dataplex Universal Catalog-compatible layout
and format. You can specify a BigQuery dataset or a list of BigQuery tables to
be copied. For additional flexibility, the template lets you copy data older
than a specified modification date, and optionally delete data from
BigQuery after a successful copy.

When copying partitioned tables from BigQuery to Cloud Storage:

- The template creates Hive-style partitions on the Cloud Storage bucket.
  BigQuery doesn't allow the Hive-style partition key to be the same as an
  existing column. You can use the `enforceSamePartitionKey` option to either
  create a new partition key, or to keep the same partition key and rename the
  existing column.
- Dataplex Universal Catalog Discovery registers the partition type as `string`
  when creating a BigQuery table (and a table in Dataproc Metastore). This may
  affect your existing partition filters.

There is a limit on the number of tables and partitions that can be transformed
in a single template run, which is approximately 300. The exact number depends
on the length of the table names and other factors.

### Template parameters

### Run the template

### Console

1. In the Google Cloud console, go to the **Dataplex Universal Catalog** page.

   [Go to Dataplex Universal Catalog](https://console.cloud.google.com/dataplex/lakes)

2. Navigate to the **Process** view.

3. Click **Create Task**.

4. Under **Tier from BQ to GCS Assets**, click **Create task**.

5. Choose a Dataplex Universal Catalog lake.

6. Provide a task name.

7. Choose a region for task execution.

8. Fill in the required parameters.

9. Click **Continue**.

### gcloud

In your shell or terminal, run the template:

```
gcloud beta dataflow flex-template run JOB_NAME \
--project=PROJECT_ID \
--region=REGION_NAME \
--template-file-gcs-location=gs://dataflow-templates-REGION_NAME/latest/flex/Dataplex_BigQuery_to_GCS_Preview \
--parameters \
sourceBigQueryDataset=SOURCE_ASSET_NAME_OR_DATASET_ID,\
destinationStorageBucketAssetName=DESTINATION_ASSET_NAME
```

Replace the following:

```
JOB_NAME: a job name of your choice
PROJECT_ID: your template project ID
REGION_NAME: region in which to run the job
SOURCE_ASSET_NAME_OR_DATASET_ID: your Dataplex Universal Catalog asset
name for the source BigQuery dataset, or the dataset ID
DESTINATION_ASSET_NAME: your Dataplex Universal Catalog asset name for
the destination Cloud Storage bucket
```

### REST

Submit an HTTP POST request:

```
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/REGION_NAME/flexTemplates:launch
{
  "launch_parameter": {
    "jobName": "JOB_NAME",
    "parameters": {
      "sourceBigQueryDataset": "SOURCE_ASSET_NAME_OR_DATASET_ID",
      "destinationStorageBucketAssetName": "DESTINATION_ASSET_NAME"
    },
    "containerSpecGcsPath": "gs://dataflow-templates-REGION_NAME/latest/flex/Dataplex_BigQuery_to_GCS_Preview"
  }
}
```

Replace the following:

```
PROJECT_ID: your template project ID
REGION_NAME: region in which to run the job
JOB_NAME: a job name of your choice
SOURCE_ASSET_NAME_OR_DATASET_ID: your Dataplex Universal Catalog asset
name for the source BigQuery dataset, or the dataset ID
DESTINATION_ASSET_NAME: your Dataplex Universal Catalog asset name for
the destination Cloud Storage bucket
```

Schedule other Google Cloud-provided or custom Dataflow templates
------------------------------------------------------------------

Dataplex Universal Catalog lets you schedule and monitor any of the
Google Cloud-provided Dataflow templates, or your own custom Dataflow template,
in the console.

### Schedule

### Console

1. In the Google Cloud console, go to the **Dataplex Universal Catalog** page.

   [Go to Dataplex Universal Catalog](https://console.cloud.google.com/dataplex/lakes)

2. Navigate to the **Process** view.

3. Click **Create Task**.

4. Under **Author a Dataflow pipeline**, click **Create Dataflow pipeline**.

5. Choose a Dataplex Universal Catalog lake.

6. Provide a task name.

7. Choose a region in which to run the task.

8. Choose a Dataflow template.

9. Fill in the required parameters.

10. Click **Continue**.

### Monitor

### Console

1. In the Google Cloud console, go to the **Dataplex Universal Catalog** page.

   [Go to Dataplex Universal Catalog](https://console.cloud.google.com/dataplex/lakes)

2. Navigate to the **Process** view.

3. Click **Dataflow pipelines**.

4. Filter by lake or pipeline name.
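The console is the primary surface for monitoring these pipelines. If you also want to inspect the underlying Dataflow jobs from a shell, a listing like the following can help; the project and region values here are placeholders for illustration:

```
# List currently running Dataflow jobs in the region where the tasks execute
gcloud dataflow jobs list \
  --project=my-project \
  --region=us-central1 \
  --status=active
```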