Use data lineage with Google Cloud systems

Data lineage shows the relationships between your project's resources and the processes that created them.

You can view data lineage information in the Google Cloud console in the following forms:

  • Lineage graph: shows the lineage that is upstream or downstream of a single root entry. For more information, see Lineage graph.
  • Lineage path visualization (Preview): shows the lineage links between two selected resources. For more information, see Lineage path visualization.
  • Lineage list view (Preview): shows detailed lineage information for resources in a single table, including resources that have many connections. For more information, see Lineage list view.
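
To make the graph and path views concrete, the following sketch models lineage as a list of (source, target) links and walks it breadth-first. It is only an illustration of the idea: the link data and the function names (`reachable`, `find_path`) are hypothetical and are not part of any Google Cloud API.

```python
from collections import defaultdict, deque

# Hypothetical lineage links as (source, target) pairs; all names are made up.
LINKS = [
    ("raw_orders", "clean_orders"),
    ("raw_customers", "clean_customers"),
    ("clean_orders", "daily_report"),
    ("clean_customers", "daily_report"),
    ("daily_report", "dashboard"),
]


def build_adjacency(links, reverse=False):
    """Map each entry to its direct neighbors (targets, or sources if reversed)."""
    adjacency = defaultdict(list)
    for source, target in links:
        if reverse:
            adjacency[target].append(source)
        else:
            adjacency[source].append(target)
    return adjacency


def reachable(links, root, direction="downstream"):
    """Return every entry upstream or downstream of root, like a lineage graph."""
    adjacency = build_adjacency(links, reverse=(direction == "upstream"))
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen and neighbor != root:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def find_path(links, source, target):
    """Return one shortest chain of entries from source to target, or None."""
    adjacency = build_adjacency(links)
    parents, queue = {source: None}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None
```

Calling `reachable(LINKS, "daily_report", "upstream")` corresponds to expanding the upstream side of a root entry, and `find_path(LINKS, "raw_orders", "dashboard")` corresponds to asking how two selected resources are connected.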

You can also retrieve lineage information from the Data Lineage API in the form of JSON data.
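
As a rough sketch of what such a request might look like, the following helper builds the URL and JSON body for a link search. It assumes the v1 REST method `projects.locations.searchLinks` and the `fullyQualifiedName` and `pageSize` fields; verify these against the Data Lineage API reference before relying on them. The project ID, location, and table name are placeholders, and the helper name is hypothetical. Sending the request also requires an OAuth access token, which is not shown here.

```python
import json


def build_search_links_request(project_id, location, fully_qualified_name,
                               direction="target", page_size=50):
    """Build the URL and JSON body for a searchLinks call (assumed v1 shape).

    direction="target" asks for links that end at the entity (upstream lineage);
    direction="source" asks for links that start at it (downstream lineage).
    """
    if direction not in ("source", "target"):
        raise ValueError("direction must be 'source' or 'target'")
    url = (
        "https://datalineage.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}:searchLinks"
    )
    body = {
        direction: {"fullyQualifiedName": fully_qualified_name},
        "pageSize": page_size,
    }
    return url, json.dumps(body)


# Placeholder values for illustration only.
url, body = build_search_links_request(
    "my-project", "us", "bigquery:my-project.sales.orders"
)
print(url)
print(body)
```

You would send the body as an HTTP POST to the URL with an `Authorization: Bearer` header. The JSON response is expected to contain a list of links, each with a source and a target entity.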

Lineage is captured across projects. When lineage is generated from multiple projects, you can view the aggregated lineage information in any of those projects.

Roles and permissions

To view lineage information, ask your administrator to grant you viewer roles as described in Predefined data lineage roles. You must have access in both the project where you view lineage and the projects in which lineage is recorded.

Data Catalog tracks lineage information automatically when you enable the Data Lineage API. You don't need any administrator or editor roles to capture lineage for your data assets.

For more information about granting roles, see Manage access. You can also assign a role at the folder or organization level (see Grant or revoke a single role).
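
Because access is needed in both the viewing project and every recording project, an administrator might script the grants. The following sketch only generates `gcloud projects add-iam-policy-binding` command strings; it does not run them. The role ID `roles/datalineage.viewer` is an assumption here, so check it against Predefined data lineage roles. The member, the project IDs, and the helper name are hypothetical.

```python
def grant_viewer_commands(member, viewing_project, recording_projects,
                          role="roles/datalineage.viewer"):
    """Return one gcloud command per project that needs the viewer role.

    The default role ID is an assumption; confirm it in the predefined roles list.
    """
    # The viewing project comes first; skip it if it is also a recording project.
    projects = [viewing_project] + [
        project for project in recording_projects if project != viewing_project
    ]
    return [
        f"gcloud projects add-iam-policy-binding {project} "
        f"--member={member} --role={role}"
        for project in projects
    ]


# Placeholder member and project IDs for illustration only.
for command in grant_viewer_commands(
    "user:alex@example.com", "view-proj", ["rec-a", "rec-b"]
):
    print(command)
```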

Enable data lineage

Enable data lineage to begin automatically tracking lineage information for supported systems. You must enable the Data Lineage API in both the project where you view lineage and the projects in which lineage is recorded. For more information, see Project types.

  1. To capture lineage information, do the following:

    1. In the Google Cloud console, on the Project selector page, select the project in which you want to record lineage.

      Go to Project selector

    2. Enable the Data Lineage API.

      Enable the Data Lineage API

    3. Repeat the previous steps for each project in which you want to record lineage.
  2. In the project where you view lineage, enable the Data Lineage API and the Data Catalog API.

    Enable the APIs
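
If you prefer the command line to the console, the steps above can be approximated with `gcloud services enable`. The following sketch generates the commands for a set of projects without running them. It assumes the service names `datalineage.googleapis.com` and `datacatalog.googleapis.com`; the project IDs and the helper name are hypothetical.

```python
# Assumed service names for the Data Lineage API and the Data Catalog API.
LINEAGE_API = "datalineage.googleapis.com"
CATALOG_API = "datacatalog.googleapis.com"


def enable_commands(viewing_project, recording_projects):
    """Return gcloud commands that mirror the console steps for enabling lineage."""
    commands = []
    # Step 1: enable the Data Lineage API in each project that records lineage.
    for project in recording_projects:
        commands.append(
            f"gcloud services enable {LINEAGE_API} --project={project}"
        )
    # Step 2: enable both APIs in the project where lineage is viewed.
    commands.append(
        f"gcloud services enable {LINEAGE_API} {CATALOG_API} "
        f"--project={viewing_project}"
    )
    return commands


# Placeholder project IDs for illustration only.
for command in enable_commands("view-proj", ["rec-a", "rec-b"]):
    print(command)
```

If the viewing project also records lineage, the sketch emits two commands for it. Enabling an API that is already enabled is harmless, so no deduplication is attempted.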

View lineage in Dataplex

You can view data lineage information in the Dataplex web interface.

To view the lineage, follow these instructions:

  1. Open the Dataplex search page and find the asset for which you want to view lineage information.

    Open the Dataplex search page

    For more information, see How to search for data assets.

  2. On the entry details page, select the Lineage tab.

  3. To view the lineage graph, click Graph.

    • Select the process or data source buttons to display the details panel.

    • To view upstream or downstream lineage information for a resource, click Expand.

  4. To view lineage in list view, click List.

  5. To view the lineage path visualization, click List, and then select the target resource in the table of results. In the details panel, click Target, and then click Visualize lineage.

View lineage in BigQuery

You can view data lineage information in the BigQuery web interface.

To view the lineage, follow these instructions:

  1. In the Google Cloud console, go to the BigQuery page.

    Open the BigQuery page

  2. Open the table for which you want to see the data lineage.

  3. Click the Lineage tab.

  4. To view the lineage graph, click Graph.

    • Select the process or data source buttons to display the details panel.

    • To view upstream or downstream lineage information for a resource, click Expand.

  5. To view lineage in list view, click List.

  6. To view the lineage path visualization, click List, and then select the target resource in the table of results. In the details panel, click Target, and then click Visualize lineage.

View lineage in Vertex AI

Systems like Vertex AI Pipelines generate lineage data for Vertex AI models and datasets. You can view data lineage information in the Vertex AI web interface.

View lineage for a managed dataset in Vertex AI

To view the lineage for a dataset, follow these instructions:

  1. In the Google Cloud console, go to the Datasets page.

    Open the Datasets page

  2. Click the dataset for which you want to see the data lineage.

  3. Click the Lineage tab.

  4. To view the lineage graph, click Graph.

    • Select the process or data source buttons to display the details panel.

    • To view upstream or downstream lineage information for a resource, click Expand.

  5. To view lineage in list view, click List.

  6. To view the lineage path visualization, click List, and then select the target resource in the table of results. In the details panel, click Target, and then click Visualize lineage.

View lineage for a model in Vertex AI

To view the lineage for a model, follow these instructions:

  1. In the Google Cloud console, go to the Model Registry page.

    Open the Model Registry page

  2. Click the model for which you want to see the data lineage.

  3. Click the Lineage tab.

  4. To view the lineage graph, click Graph.

    • Select the process or data source buttons to display the details panel.

    • To view upstream or downstream lineage information for a resource, click Expand.

  5. To view lineage in list view, click List.

  6. To view the lineage path visualization, click List, and then select the target resource in the table of results. In the details panel, click Target, and then click Visualize lineage.
