# Use data lineage in Dataflow

Last updated (UTC): 2025-08-18.

Data lineage is a Dataflow feature that lets you track
how data moves through your systems: where it comes from, where it is passed to,
and what transformations are applied to it.

Each pipeline that you run by using Dataflow has several associated
data assets. The lineage of a data asset includes its origin, what happens to
it, and where it moves over time.
With data lineage, you can track the end-to-end movement of your data assets, from origin to eventual destination.

When you enable data lineage for your
Dataflow jobs, Dataflow
captures lineage events and publishes them to the Dataplex Universal Catalog
[Data Lineage API](/dataplex/docs/reference/data-lineage/rest).

To access lineage information through Dataplex Universal Catalog, see
[Use data lineage with Google Cloud systems](/dataplex/docs/use-lineage).

Before you begin
----------------

Set up your project:

- Sign in to your Google Cloud account. If you're new to Google Cloud, [create an account](https://console.cloud.google.com/freetrial) to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- [Verify that billing is enabled for your Google Cloud project](/billing/docs/how-to/verify-billing-enabled#confirm_billing_is_enabled_on_a_project).
- Enable the Dataplex, BigQuery, and Data Lineage APIs.

  [Enable the APIs](https://console.cloud.google.com/flows/enableapi?apiid=dataplex.googleapis.com,bigquery.googleapis.com,datalineage.googleapis.com)

| **Caution:** Data lineage is enabled on a per-project basis, not a per-service basis. After you enable the Data Lineage API, lineage information is automatically reported for multiple Google Cloud services in the project, depending on their product-level lineage control.
| For more details, see [Data lineage considerations](/dataplex/docs/lineage-considerations).

In Dataflow, you also need to enable lineage at the job level.
For details, see [Enable data lineage in Dataflow](#enable-data-lineage) in
this document.

### Required roles

To get the permissions that you need to view lineage visualization graphs,
ask your administrator to grant you the following IAM roles:

- [Dataplex Catalog Viewer](/iam/docs/roles-permissions/dataplex#dataplex.catalogViewer) (`roles/dataplex.catalogViewer`) on the Dataplex Universal Catalog resource project
- [Data Lineage Viewer](/iam/docs/roles-permissions/datalineage#datalineage.viewer) (`roles/datalineage.viewer`) on the project where you use Dataflow
- [Dataflow Viewer](/iam/docs/roles-permissions/dataflow#dataflow.viewer) (`roles/dataflow.viewer`) on the project where you use Dataflow

For more information about granting roles, see [Manage access to projects, folders, and organizations](/iam/docs/granting-changing-revoking-access).

You might also be able to get the required permissions through [custom roles](/iam/docs/creating-custom-roles) or other [predefined roles](/iam/docs/roles-overview#predefined).

For more information about data lineage roles, see
[Predefined roles for data lineage](/dataplex/docs/iam-roles#lineage-roles).

Support and limitations
-----------------------

Data lineage in Dataflow has the following limitations:

- Data lineage is supported in Apache Beam SDK versions 2.63.0 and later.
- You must enable data lineage on a per-job basis.
- Data capture isn't instantaneous.
  It can take a few minutes for Dataflow job lineage data to appear in Dataplex Universal Catalog.
- The following sources and sinks are supported:

  - Apache Kafka
  - BigQuery
  - Bigtable
  - Cloud Storage
  - JDBC (Java Database Connectivity)
  - Pub/Sub
  - Spanner

  [Dataflow templates](/dataflow/docs/guides/templates/provided-templates)
  that use these sources and sinks also automatically capture and publish
  lineage events.

Enable data lineage in Dataflow
-------------------------------

You need to enable lineage at the job level. To enable data lineage,
use the `enable_lineage`
[Dataflow service option](/dataflow/docs/reference/service-options)
as follows:

### Java

    --dataflowServiceOptions=enable_lineage=true

### Python

    --dataflow_service_options=enable_lineage=true

### Go

    --dataflow_service_options=enable_lineage=true

### gcloud

Use the
[`gcloud dataflow jobs run`](/sdk/gcloud/reference/dataflow/jobs/run) command
with the `--additional-experiments` flag. If you're using Flex Templates, use
the
[`gcloud dataflow flex-template run`](/sdk/gcloud/reference/dataflow/flex-template/run)
command with the same flag.

    --additional-experiments=enable_lineage=true

Optionally, you can specify one or both of the following parameters with the
service option:

- `process_id`: A unique identifier that Dataplex Universal Catalog uses to group job runs. If not specified, the job name is used.
- `process_name`: A human-readable name for the data lineage process.
  If not specified, the job name prefixed with `"Dataflow "` is used.

Specify these options as follows:

### Java

    --dataflowServiceOptions=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME

### Python

    --dataflow_service_options=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME

### Go

    --dataflow_service_options=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME

### gcloud

    --additional-experiments=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME

Replace `PROCESS_ID` with the process identifier and `DISPLAY_NAME` with a human-readable name for the process. If you pass the option directly on a shell command line, quote it, because the shell otherwise treats `;` as the end of the command.

View lineage in Dataplex Universal Catalog
------------------------------------------

Data lineage provides information about the relationships between your project
resources and the processes that created them. You can view data lineage
information in the Google Cloud console as a graph or as a
single table. You can also retrieve data lineage information from the
Data Lineage API as JSON data.

For more information, see
[Use data lineage with Google Cloud systems](/dataplex/docs/use-lineage).

Disable data lineage in Dataflow
--------------------------------

If data lineage is enabled for a specific job and you want to disable
it, cancel the existing job and run a new version of the job without the
`enable_lineage` service option.

Billing
-------

Using data lineage in Dataflow doesn't affect your
Dataflow bill, but it might incur additional charges on your
Dataplex Universal Catalog bill.
For more information, see
[Data lineage considerations](/dataplex/docs/lineage-considerations)
and [Dataplex Universal Catalog pricing](/dataplex/pricing).

What's next
-----------

- Learn more about [data lineage](/dataplex/docs/about-data-lineage).
- Learn how to [use data lineage](/dataplex/docs/use-lineage).
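You can also assemble the `enable_lineage` syntax from [Enable data lineage in Dataflow](#enable-data-lineage) in a script that launches jobs. The following Python sketch shows one way to do this under stated assumptions.

- **Hypothetical names.** `lineage_option_value`, `lineage_flag`, and `effective_lineage_process` are illustrative helpers. They are not part of Apache Beam or any Google Cloud library.
- **Single-parameter form.** The sketch assumes that a single optional parameter uses the same `enable_lineage=NAME=VALUE` form without the `;` separator.
- **Shell quoting.** It quotes the finished flag with `shlex.quote` for a POSIX shell. Adjust the quoting if you pass arguments through another launcher, such as Maven's `-Dexec.args`.
- **Defaults only.** `effective_lineage_process` mirrors the documented defaults and doesn't query Dataplex Universal Catalog.

```python
"""Hypothetical helpers for building the Dataflow enable_lineage option.

These names are illustrative only; they are not part of Apache Beam or the
Google Cloud SDKs.
"""
import shlex
from typing import Optional, Tuple

# Flag that carries the service option for each SDK or tool, as documented in
# "Enable data lineage in Dataflow".
LINEAGE_FLAGS = {
    "java": "--dataflowServiceOptions",
    "python": "--dataflow_service_options",
    "go": "--dataflow_service_options",
    "gcloud": "--additional-experiments",
}


def lineage_option_value(process_id: Optional[str] = None,
                         process_name: Optional[str] = None) -> str:
    """Return the enable_lineage value, with optional process parameters."""
    params = []
    if process_id is not None:
        params.append(f"process_id={process_id}")
    if process_name is not None:
        params.append(f"process_name={process_name}")
    if not params:
        # No optional parameters: the plain on/off form.
        return "enable_lineage=true"
    # The documented form joins both parameters with ";". A single parameter
    # is assumed to need no separator.
    return "enable_lineage=" + ";".join(params)


def lineage_flag(tool: str,
                 process_id: Optional[str] = None,
                 process_name: Optional[str] = None) -> str:
    """Return the full flag for `tool`, quoted for a POSIX shell."""
    flag = f"{LINEAGE_FLAGS[tool]}={lineage_option_value(process_id, process_name)}"
    # shlex.quote leaves safe strings unchanged and single-quotes the rest,
    # so a ";" between parameters does not end the shell command.
    return shlex.quote(flag)


def effective_lineage_process(job_name: str,
                              process_id: Optional[str] = None,
                              process_name: Optional[str] = None) -> Tuple[str, str]:
    """Mirror the documented defaults for process_id and process_name."""
    return (
        process_id if process_id is not None else job_name,
        process_name if process_name is not None else f"Dataflow {job_name}",
    )


if __name__ == "__main__":
    print(lineage_flag("python"))
    # --dataflow_service_options=enable_lineage=true
    print(lineage_flag("java", process_id="orders-etl", process_name="OrdersETL"))
    # '--dataflowServiceOptions=enable_lineage=process_id=orders-etl;process_name=OrdersETL'
    print(effective_lineage_process("orders-etl-job"))
    # ('orders-etl-job', 'Dataflow orders-etl-job')
```

The flag names are kept in one table because only the flag differs between the SDKs and gcloud. The value syntax is the same for all of them.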