Integrate with OpenLineage
OpenLineage (https://openlineage.io/) is an open platform for collecting and analyzing data lineage information. Using an open standard for lineage data, OpenLineage captures lineage events from data pipeline components that use an OpenLineage API to report on runs, jobs, and datasets.
Through the Data Lineage API, you can import OpenLineage events to display in the Dataplex Universal Catalog web interface alongside lineage information from Google Cloud services, such as BigQuery, Cloud Composer, Cloud Data Fusion, and Dataproc.
To import OpenLineage events that use the OpenLineage specification, use the ProcessOpenLineageRunEvent REST API method, and map OpenLineage facets to Data Lineage API attributes.
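Limitations
- The Data Lineage API supports OpenLineage major versions 1 and 2.
- The Data Lineage API doesn't support the following:
  - Any subsequent OpenLineage release with message format changes
  - DatasetEvent
  - JobEvent
- The maximum size of a single message is 5 MB.
- The length of each Fully Qualified Name in inputs and outputs is limited to 4000 characters.
- Links are grouped by events with 100 links. The maximum aggregate number of links is 1000.
- Dataplex Universal Catalog displays a lineage graph for each job run, showing the inputs and outputs of lineage events. It doesn't support lower-level processes like Spark stages.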
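
Several of these limits can be checked on the client before an event is sent. The following is a minimal sketch, not part of the Data Lineage API: the function name `check_event` is hypothetical, "5 MB" is read as 5 MiB, and the 4000-character limit on Fully Qualified Names is applied to the raw OpenLineage namespace and name as a rough proxy, because the exact name mapping is not described here.

```python
import json
import re

# Assumption: "5 MB" is interpreted as 5 MiB; the service may count differently.
MAX_MESSAGE_BYTES = 5 * 1024 * 1024
# Documented limit for Fully Qualified Names, used here on raw names as a proxy.
MAX_NAME_CHARS = 4000
SUPPORTED_MAJOR_VERSIONS = {1, 2}


def check_event(event: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # A RunEvent carries both "run" and "job"; DatasetEvent and JobEvent do not.
    if "run" not in event or "job" not in event:
        problems.append("not a RunEvent (missing 'run' or 'job')")

    # schemaURL embeds the spec version, for example .../spec/1-0-5/OpenLineage.json
    match = re.search(r"/spec/(\d+)-\d+-\d+/", event.get("schemaURL", ""))
    if match is None:
        problems.append("cannot read spec version from schemaURL")
    elif int(match.group(1)) not in SUPPORTED_MAJOR_VERSIONS:
        problems.append(f"unsupported OpenLineage major version {match.group(1)}")

    # Size of the serialized message in bytes.
    size = len(json.dumps(event).encode("utf-8"))
    if size > MAX_MESSAGE_BYTES:
        problems.append(f"message is {size} bytes, above {MAX_MESSAGE_BYTES}")

    # Length proxy for each input and output dataset name.
    for side in ("inputs", "outputs"):
        for dataset in event.get(side, []):
            combined = f"{dataset.get('namespace', '')}:{dataset.get('name', '')}"
            if len(combined) > MAX_NAME_CHARS:
                problems.append(f"{side} dataset name exceeds {MAX_NAME_CHARS} characters")

    return problems
```

A non-empty result only signals that the event is likely to be rejected or truncated; the service remains the authority on what it accepts.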
OpenLineage mapping
The REST API method ProcessOpenLineageRunEvent maps OpenLineage attributes to Data Lineage API attributes as follows:
Import an OpenLineage event
If you haven't yet set up OpenLineage, see Getting started (https://openlineage.io/getting-started/).
To import an OpenLineage event into Dataplex Universal Catalog, call the REST API method ProcessOpenLineageRunEvent:
curl -X POST "https://datalineage.googleapis.com/v1/projects/{project}/locations/{location}:processOpenLineageRunEvent" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
--data '{"eventTime":"2023-04-04T13:21:16.098Z","eventType":"COMPLETE","inputs":[{"name":"somename","namespace":"somenamespace"}],"job":{"name":"somename","namespace":"somenamespace"},"outputs":[{"name":"somename","namespace":"somenamespace"}],"producer":"someproducer","run":{"runId":"somerunid"},"schemaURL":"https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent"}'
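
The same request can be assembled programmatically. The sketch below uses only the Python standard library and mirrors the payload shown above. The helper names `endpoint`, `build_run_event`, and `send_run_event` are hypothetical, and the caller is assumed to supply a valid OAuth access token obtained elsewhere.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone

API_ROOT = "https://datalineage.googleapis.com/v1"
SCHEMA_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent"


def endpoint(project: str, location: str) -> str:
    """URL of the ProcessOpenLineageRunEvent method for a project and location."""
    return f"{API_ROOT}/projects/{project}/locations/{location}:processOpenLineageRunEvent"


def build_run_event(job_namespace, job_name, inputs, outputs, producer,
                    event_type="COMPLETE", run_id=None, event_time=None):
    """Build a RunEvent dict; inputs and outputs are (namespace, name) pairs."""
    if event_time is None:
        # UTC timestamp with millisecond precision, for example 2023-04-04T13:21:16.098Z
        now = datetime.now(timezone.utc)
        event_time = now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {
        "eventTime": event_time,
        "eventType": event_type,
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "job": {"namespace": job_namespace, "name": job_name},
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": producer,
        "run": {"runId": run_id or str(uuid.uuid4())},
        "schemaURL": SCHEMA_URL,
    }


def send_run_event(event: dict, project: str, location: str, access_token: str) -> dict:
    """POST the event and return the parsed JSON response (performs a network call)."""
    request = urllib.request.Request(
        endpoint(project, location),
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping payload construction separate from the network call makes the event easy to inspect or validate before it is sent.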
Analyze information from OpenLineage
To analyze the imported OpenLineage events, see View lineage graphs in Dataplex Universal Catalog UI.
Stored data
The Data Lineage API doesn't store all facet data from the OpenLineage messages. The Data Lineage API stores the following facet fields:
- spark_version
  - openlineage-spark-version
  - spark-version
- all spark.logicalPlan.*
- environment-properties (custom Google Cloud lineage facet)
  - origin.sourcetype and origin.name
  - spark.app.id
  - spark.app.name
  - spark.batch.id
  - spark.batch.uuid
  - spark.cluster.name
  - spark.cluster.region
  - spark.job.id
  - spark.job.uuid
  - spark.project.id
  - spark.query.node.name
  - spark.session.id
  - spark.session.uuid
The Data Lineage API stores the following information:
- eventTime
- run.runId
- job.namespace
- job.name
Last updated 2025-08-19 UTC.