# Integrate with OpenLineage

[OpenLineage](https://openlineage.io/) is an open platform for collecting and analyzing data lineage information. Using an open standard for lineage data, OpenLineage captures lineage events from data pipeline components that use an OpenLineage API to report on runs, jobs, and datasets.

Through the Data Lineage API, you can import OpenLineage events and display them in the Dataplex Universal Catalog web interface alongside lineage information from Google Cloud services, such as BigQuery, Cloud Composer, Cloud Data Fusion, and Dataproc.

To import OpenLineage events that use the [OpenLineage specification](https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json), use the [`ProcessOpenLineageRunEvent`](/dataplex/docs/reference/data-lineage/rest/v1/projects.locations/processOpenLineageRunEvent) REST API method, and map OpenLineage facets to Data Lineage API attributes.
Limitations
-----------

- The Data Lineage API supports OpenLineage major versions 1 and 2.

- The Data Lineage API doesn't support the following:

  - Any subsequent OpenLineage release with message format changes
  - `DatasetEvent`
  - `JobEvent`

- The maximum size of a single message is 5 MB.

- The length of each [Fully Qualified Name](/dataplex/docs/fully-qualified-names) in inputs and outputs is limited to 4,000 characters.

- [Links](/dataplex/docs/reference/data-lineage/rest/v1/projects.locations.processes.runs.lineageEvents#EventLink) are grouped by events, with up to 100 links per event. The maximum aggregate number of links is 1,000.

- Dataplex Universal Catalog displays a lineage graph for each job run, showing the inputs and outputs of lineage events. It doesn't support lower-level processes, such as Spark stages.
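The message-size and name-length limits above can be checked client-side before an event is sent. The following is a minimal sketch, not part of the API: the helper name is illustrative, and forming the checked string by joining a dataset's `namespace` and `name` is an assumption about how a fully qualified name is built.

```python
import json

MAX_MESSAGE_BYTES = 5 * 1024 * 1024  # single message limited to 5 MB
MAX_FQN_LENGTH = 4000                # each fully qualified name limited to 4,000 characters

def check_event_limits(event: dict) -> list[str]:
    """Return a list of limit violations for an OpenLineage run event dict."""
    problems = []
    if len(json.dumps(event).encode("utf-8")) > MAX_MESSAGE_BYTES:
        problems.append("message exceeds 5 MB")
    for field in ("inputs", "outputs"):
        for dataset in event.get(field, []):
            # Assumption: the FQN is derived from the dataset namespace and name.
            fqn = f"{dataset.get('namespace', '')}:{dataset.get('name', '')}"
            if len(fqn) > MAX_FQN_LENGTH:
                problems.append(f"{field} name exceeds {MAX_FQN_LENGTH} characters")
    return problems

event = {
    "eventTime": "2023-04-04T13:21:16.098Z",
    "eventType": "COMPLETE",
    "inputs": [{"name": "somename", "namespace": "somenamespace"}],
    "outputs": [{"name": "somename", "namespace": "somenamespace"}],
}
print(check_event_limits(event))  # [] when the event is within limits
```

An empty result means the event fits within the documented limits; a non-empty result names each limit the event would exceed.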
OpenLineage mapping
-------------------

The REST API method [`ProcessOpenLineageRunEvent`](/dataplex/docs/reference/data-lineage/rest/v1/projects.locations/processOpenLineageRunEvent) maps OpenLineage attributes to Data Lineage API attributes. The imported event is stored under a resource name of the form:

    projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID/lineageEvents/HASH_OF_JOB_RUN_INPUT_OUTPUTS_OF_EVENT

For example, `projects/11111111/locations/us/processes/1234/runs/4321/lineageEvents/111-222-333`.
Import an OpenLineage event
---------------------------

If you haven't yet set up OpenLineage, see [Getting started](https://openlineage.io/getting-started/).

To import an OpenLineage event into Dataplex Universal Catalog, call the REST API method [`ProcessOpenLineageRunEvent`](/dataplex/docs/reference/data-lineage/rest/v1/projects.locations/processOpenLineageRunEvent):

    POST https://datalineage.googleapis.com/v1/projects/{project}/locations/{location}:processOpenLineageRunEvent \
    --data '{
      "eventTime": "2023-04-04T13:21:16.098Z",
      "eventType": "COMPLETE",
      "inputs": [{"name": "somename", "namespace": "somenamespace"}],
      "job": {"name": "somename", "namespace": "somenamespace"},
      "outputs": [{"name": "somename", "namespace": "somenamespace"}],
      "producer": "someproducer",
      "run": {"runId": "somerunid"},
      "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent"
    }'
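The request body can also be assembled programmatically before posting it to the endpoint. A minimal Python sketch follows; the helper function is illustrative, and the field values are the same placeholders used in the example request above.

```python
import json

def build_run_event(event_type: str, event_time: str, run_id: str,
                    job_namespace: str, job_name: str,
                    inputs: list[dict], outputs: list[dict],
                    producer: str) -> dict:
    """Assemble an OpenLineage RunEvent body for processOpenLineageRunEvent."""
    return {
        "eventTime": event_time,
        "eventType": event_type,  # for example, COMPLETE
        "inputs": inputs,
        "job": {"name": job_name, "namespace": job_namespace},
        "outputs": outputs,
        "producer": producer,
        "run": {"runId": run_id},
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent",
    }

event = build_run_event(
    event_type="COMPLETE",
    event_time="2023-04-04T13:21:16.098Z",
    run_id="somerunid",
    job_namespace="somenamespace",
    job_name="somename",
    inputs=[{"name": "somename", "namespace": "somenamespace"}],
    outputs=[{"name": "somename", "namespace": "somenamespace"}],
    producer="someproducer",
)
print(json.dumps(event, indent=2))
```

The resulting JSON can then be sent as the request body of the `processOpenLineageRunEvent` call, authenticated with your usual Google Cloud credentials (for example, an OAuth 2.0 access token in the `Authorization` header).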
Analyze information from OpenLineage
------------------------------------

To analyze the imported OpenLineage events, see [View lineage graphs in Dataplex Universal Catalog UI](/dataplex/docs/use-lineage#view-lineage).

Stored data
-----------

The Data Lineage API doesn't store all facets data from the OpenLineage messages. The Data Lineage API stores the following facet fields:

- `spark_version`
  - `openlineage-spark-version`
  - `spark-version`
- all `spark.logicalPlan.*`
- `environment-properties` (custom Google Cloud lineage facet)
  - `origin.sourcetype` and `origin.name`
  - `spark.app.id`
  - `spark.app.name`
  - `spark.batch.id`
  - `spark.batch.uuid`
  - `spark.cluster.name`
  - `spark.cluster.region`
  - `spark.job.id`
  - `spark.job.uuid`
  - `spark.project.id`
  - `spark.query.node.name`
  - `spark.session.id`
  - `spark.session.uuid`

The Data Lineage API stores the following information:

- `eventTime`
- `run.runId`
- `job.namespace`
- `job.name`

Last updated 2025-08-29 UTC.