Integrate with OpenLineage
OpenLineage is an open platform for collecting and analyzing data lineage information. Using an open standard for lineage data, OpenLineage captures lineage events from data pipeline components that use an OpenLineage API to report on runs, jobs, and datasets.
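Through the Data Lineage API, you can import OpenLineage events and display them in the Dataplex Universal Catalog web interface alongside lineage information from Google Cloud services such as BigQuery, Cloud Composer, Cloud Data Fusion, and Dataproc.
To import OpenLineage events that use the OpenLineage specification (https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json), use the ProcessOpenLineageRunEvent REST API method, which maps OpenLineage facets to Data Lineage API attributes.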
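Limitations
- The Data Lineage API supports OpenLineage major versions 1 and 2.
- The Data Lineage API doesn't support the following:
  - Any subsequent OpenLineage release with message format changes
  - DatasetEvent
  - JobEvent
- The maximum size of a single message is 5 MB.
- Each fully qualified name in inputs and outputs is limited to 4,000 characters.
- Links are grouped into events of 100 links each. The maximum aggregate number of links is 1,000.
- Dataplex Universal Catalog displays a lineage graph for each job run, showing the inputs and outputs of lineage events. It doesn't support lower-level processes such as Spark stages.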
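Before calling the API, it can help to flag events that fall outside the documented limits. The following is a minimal sketch and is not part of any Google library. The hypothetical helper preflight_problems checks three limits: messages of at most 5 MB, OpenLineage major version 1 or 2, and run events only (no DatasetEvent or JobEvent). It assumes that schemaURL follows the pattern https://openlineage.io/spec/X-Y-Z/OpenLineage.json#/$defs/EventType. It reads "5 MB" conservatively as 5,000,000 bytes.

```python
import json
import re

# Hypothetical pre-flight helper; limits mirror the documented Limitations.
MAX_MESSAGE_BYTES = 5_000_000        # "5 MB", read conservatively (assumption)
SUPPORTED_MAJOR_VERSIONS = {1, 2}    # OpenLineage major versions 1 and 2

# Assumed schemaURL shape: .../spec/1-0-5/OpenLineage.json#/$defs/RunEvent
_SPEC_RE = re.compile(r"/spec/(\d+)-\d+-\d+/OpenLineage\.json#/\$defs/(\w+)$")


def preflight_problems(event: dict) -> list[str]:
    """Return reasons the event breaks a documented limit (empty if none found)."""
    problems = []

    # Size of the serialized message in bytes.
    size = len(json.dumps(event).encode("utf-8"))
    if size > MAX_MESSAGE_BYTES:
        problems.append(f"message is {size} bytes, over the {MAX_MESSAGE_BYTES}-byte limit")

    # Spec version and event kind, both read from schemaURL.
    match = _SPEC_RE.search(event.get("schemaURL", ""))
    if match is None:
        problems.append("schemaURL does not match the expected OpenLineage pattern")
    else:
        major, event_kind = int(match.group(1)), match.group(2)
        if major not in SUPPORTED_MAJOR_VERSIONS:
            problems.append(f"OpenLineage major version {major} is not supported")
        if event_kind != "RunEvent":
            problems.append(f"{event_kind} is not supported; only run events are imported")

    # Run events identify their run through run.runId.
    if "runId" not in event.get("run", {}):
        problems.append("missing run.runId")
    return problems
```

An empty list only means that these particular checks passed. The API can still reject an event for other reasons, such as the 4,000-character limit on fully qualified names, which this sketch doesn't check because the page doesn't say how those names are derived.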
OpenLineage mapping
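The ProcessOpenLineageRunEvent REST API method maps OpenLineage attributes to Data Lineage API attributes as follows. The process ID is derived from a hash of the job namespace and name. The run ID is derived from a hash of the OpenLineage run ID. The lineage event ID is derived from a hash of the event's job, run, inputs, and outputs. The resulting resource name has this form:
projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID/lineageEvents/HASH_OF_JOB_RUN_INPUT_OUTPUTS_OF_EVENT
For example: projects/11111111/locations/us/processes/1234/runs/4321/lineageEvents/111-222-333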
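The resource name nests a lineage event under a run and a run under a process. A client can therefore recover the parent resources from any lineage event name. The following sketch splits a name of the form shown above into its parts, using only that documented pattern. The names LineageEventName and parse_lineage_event_name are hypothetical and do not come from a Google client library.

```python
import re
from typing import NamedTuple


class LineageEventName(NamedTuple):
    """Components of a lineage event resource name (hypothetical helper type)."""
    project: str
    location: str
    process: str
    run: str
    lineage_event: str

    @property
    def process_name(self) -> str:
        # Parent process resource name.
        return f"projects/{self.project}/locations/{self.location}/processes/{self.process}"

    @property
    def run_name(self) -> str:
        # Parent run resource name, nested under the process.
        return f"{self.process_name}/runs/{self.run}"


_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/processes/(?P<process>[^/]+)/runs/(?P<run>[^/]+)"
    r"/lineageEvents/(?P<lineage_event>[^/]+)$"
)


def parse_lineage_event_name(name: str) -> LineageEventName:
    """Split a lineage event resource name into its components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a lineage event resource name: {name!r}")
    return LineageEventName(**match.groupdict())


parts = parse_lineage_event_name(
    "projects/11111111/locations/us/processes/1234/runs/4321/lineageEvents/111-222-333"
)
print(parts.run_name)  # → projects/11111111/locations/us/processes/1234/runs/4321
```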
Import an OpenLineage event
If you haven't set up OpenLineage yet, see Getting started (https://openlineage.io/getting-started/).
To import an OpenLineage event into Dataplex Universal Catalog, call the ProcessOpenLineageRunEvent REST API method:
# Replace PROJECT and LOCATION with your project ID and location (for example, us).
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" \
  "https://datalineage.googleapis.com/v1/projects/PROJECT/locations/LOCATION:processOpenLineageRunEvent" \
  --data '{"eventTime":"2023-04-04T13:21:16.098Z","eventType":"COMPLETE","inputs":[{"name":"somename","namespace":"somenamespace"}],"job":{"name":"somename","namespace":"somenamespace"},"outputs":[{"name":"somename","namespace":"somenamespace"}],"producer":"someproducer","run":{"runId":"somerunid"},"schemaURL":"https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent"}'
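The same call can also be made from Python using only the standard library. In this sketch, the request body is the OpenLineage event JSON itself, as in the command above. The helpers build_request, send, and gcloud_access_token are hypothetical names. The token helper assumes that the gcloud CLI is installed and authenticated. Building the request is kept separate from sending it, so the request can be inspected without network access.

```python
import json
import subprocess
import urllib.request

ENDPOINT = (
    "https://datalineage.googleapis.com/v1/projects/{project}"
    "/locations/{location}:processOpenLineageRunEvent"
)


def build_request(project: str, location: str, event: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that imports one OpenLineage run event."""
    return urllib.request.Request(
        ENDPOINT.format(project=project, location=location),
        data=json.dumps(event).encode("utf-8"),  # the raw OpenLineage event is the body
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def gcloud_access_token() -> str:
    """Fetch an access token; assumes an installed, authenticated gcloud CLI."""
    return subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
```

A typical call is send(build_request("my-project", "us", event, gcloud_access_token())). Here "my-project" is a placeholder, and event is a dictionary such as the one in the curl example.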
Analyze information from OpenLineage
To analyze the imported OpenLineage events, see View lineage graphs in Dataplex Universal Catalog UI.
Stored data
The Data Lineage API doesn't store all facet data from OpenLineage messages. It stores the following facet fields:
- spark_version
  - openlineage-spark-version
  - spark-version
- all spark.logicalPlan.*
- environment-properties (custom Google Cloud lineage facet)
  - origin.sourcetype and origin.name
  - spark.app.id
  - spark.app.name
  - spark.batch.id
  - spark.batch.uuid
  - spark.cluster.name
  - spark.cluster.region
  - spark.job.id
  - spark.job.uuid
  - spark.project.id
  - spark.query.node.name
  - spark.session.id
  - spark.session.uuid
The Data Lineage API also stores the following information:
- eventTime
- run.runId
- job.namespace
- job.name
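The core fields eventTime, run.runId, job.namespace, and job.name sit at fixed paths in an OpenLineage run event. This makes it easy to preview them before import. The following minimal sketch uses a hypothetical helper, core_fields, that returns whichever of those four fields an event carries. It deliberately doesn't model the stored facet fields, because this page doesn't specify where they sit inside the run or job facets.

```python
# Stored core fields, keyed by dotted name, with their path inside the event.
CORE_FIELDS = {
    "eventTime": ("eventTime",),
    "run.runId": ("run", "runId"),
    "job.namespace": ("job", "namespace"),
    "job.name": ("job", "name"),
}


def core_fields(event: dict) -> dict:
    """Return the stored core fields present in the event (hypothetical helper)."""
    result = {}
    for dotted, path in CORE_FIELDS.items():
        value = event
        for key in path:
            if not isinstance(value, dict) or key not in value:
                break  # field absent: skip it
            value = value[key]
        else:
            result[dotted] = value  # reached only if the whole path exists
    return result
```

Everything outside this set and the listed facet fields is dropped on import. A preview like this can therefore help explain why some details of a pipeline's events don't appear later.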