Export metadata

You can export metadata out of Dataplex for use in external systems by running a metadata export job.

You might want to export metadata so that you can do the following:

  • Query and analyze metadata with BigQuery or other data analytics tools
  • Programmatically process large volumes of metadata, which you can later import back into Dataplex
  • Integrate metadata into custom applications or third-party tools

A metadata export job exports a snapshot of your universal catalog metadata. Universal catalog metadata consists of entries and their aspects. The steps on this page assume that you're familiar with universal catalog concepts, including entry groups, entry types, and aspect types.

Job scope

The job scope defines the metadata to export. You must provide one of the following job scopes for each metadata export job:

  • Organization: exports the metadata that belongs to your organization.
  • Projects: exports the metadata that belongs to the specified projects.
  • Entry groups: exports the metadata that belongs to the specified entry groups.

You can further restrict the scope by specifying the entry types or aspect types to include in the job. The job exports only the entries and aspects that belong to these entry types and aspect types.

VPC Service Controls

Dataplex uses VPC Service Controls to provide additional security for metadata export jobs. The project that the job belongs to determines the VPC Service Controls perimeter, as follows:

  • If you set the job scope to the organization level, the following things happen:
    • The export scope is the organization that the job belongs to.
    • Only the entries that are within the VPC Service Controls perimeter are exported.
    • Any projects that are within the job's organization but outside the VPC Service Controls perimeter are excluded.
  • If you set the job scope to projects or entry groups, the projects or entry groups must be in the same VPC Service Controls perimeter as the job. If any of the projects or entry groups violate VPC Service Controls rules, the job fails.

Before you begin

Before you export metadata, complete the tasks in this section.

Required roles for end users

To get the permissions that you need to manage metadata export jobs, ask your administrator to grant you the appropriate predefined IAM roles on the project that you run the metadata job in.

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions that are required to manage metadata export jobs. The exact permissions are listed in the following section.

Required permissions

The following permissions are required to manage metadata export jobs:

  • Export metadata:
    • dataplex.metadataJobs.create
    • dataplex.entryGroups.export
    • dataplex.entryGroups.get
    • resourcemanager.projects.get
    • resourcemanager.projects.list
  • Access the exported results: storage.objects.get

You might also be able to get these permissions with custom roles or other predefined roles.

Required roles for the Dataplex service account

To ensure that the Dataplex service account has the necessary permissions to access the Cloud Storage bucket, ask your administrator to grant the Dataplex service account the following permissions on the bucket: storage.buckets.get, storage.objects.get, and storage.objects.create.
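For example, one way to grant these permissions is to give the Dataplex service agent a role on the bucket that contains them. The following sketch assumes that the service agent has the form service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com and uses the broad Storage Admin role (roles/storage.admin); a narrower custom role that contains only the three permissions also works:

# Hypothetical service agent address; confirm the Dataplex service account for your project.
gcloud storage buckets add-iam-policy-binding gs://BUCKET \
    --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
    --role="roles/storage.admin"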

Configure Google Cloud resources

  1. After installing the Google Cloud CLI, initialize it by running the following command:

    gcloud init

    If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  2. Create a Cloud Storage bucket to store the exported results.

    The bucket must be in the same location and the same VPC Service Controls perimeter as the metadata job.
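    For example, a gcloud CLI command like the following creates the bucket, using the BUCKET and LOCATION_ID placeholder values that are described later on this page:

    gcloud storage buckets create gs://BUCKET --location=LOCATION_ID --uniform-bucket-level-access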

Run a metadata export job

The following sections show how to export metadata with different job scopes.

Export metadata from your organization

To export the metadata from your organization, use the metadataJobs.create method and set the organizationLevel boolean to true.

Before using any of the request data, make the following replacements:

  • JOB_PROJECT: the Google Cloud project that you run the metadata job in. Provide a project number or project ID.
  • LOCATION_ID: the Google Cloud location, such as us-central1.
  • METADATA_JOB_ID: optional. The metadata job ID.
  • BUCKET: the Cloud Storage bucket to export the metadata to.

    Optionally, you can include a custom prefix after the bucket name, in the format gs://BUCKET/PREFIX/. The maximum length of the custom prefix is 128 characters.

HTTP method and URL:

POST https://dataplex.googleapis.com/v1/projects/JOB_PROJECT/locations/LOCATION_ID/metadataJobs?metadataJobId=METADATA_JOB_ID

Request JSON body:

{
  "type": "EXPORT",
  "export_spec": {
    "output_path": "gs://BUCKET/",
    "scope": {
      "organizationLevel": true
    }
  }
}

To send your request, use a command-line tool such as curl.
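For example, if you save the preceding request JSON body to a file named request.json (a hypothetical file name) and you authenticate with the gcloud CLI, a curl command like the following sends the request:

# Send the export request; request.json contains the request body shown earlier.
curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -d @request.json \
     "https://dataplex.googleapis.com/v1/projects/JOB_PROJECT/locations/LOCATION_ID/metadataJobs?metadataJobId=METADATA_JOB_ID"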

The response identifies a long-running operation. The exported metadata is saved to a Cloud Storage bucket.

Export metadata from specific projects

To export metadata from one or more projects, use the metadataJobs.create method and provide a list of projects.

Before using any of the request data, make the following replacements:

  • JOB_PROJECT: the Google Cloud project that you run the metadata job in. Provide a project number or project ID.
  • LOCATION_ID: the Google Cloud location, such as us-central1.
  • METADATA_JOB_ID: optional. The metadata job ID.
  • BUCKET: the Cloud Storage bucket to export the metadata to.

    Optionally, you can include a custom prefix after the bucket name, in the format gs://BUCKET/PREFIX/. The maximum length of the custom prefix is 128 characters.

  • METADATA_SOURCE_PROJECT: a project whose metadata you want to export. Provide a project number or project ID. The project must be in the same organization and VPC Service Controls perimeter as the metadata job.

HTTP method and URL:

POST https://dataplex.googleapis.com/v1/projects/JOB_PROJECT/locations/LOCATION_ID/metadataJobs?metadataJobId=METADATA_JOB_ID

Request JSON body:

{
  "type": "EXPORT",
  "export_spec": {
    "output_path": "gs://BUCKET/",
    "scope": {
      "projects": [
        "projects/METADATA_SOURCE_PROJECT",
        # Additional projects
      ]
    }
  }
}

To send your request, use a command-line tool such as curl, as shown in the earlier example.

The response identifies a long-running operation. The exported metadata is saved to a Cloud Storage bucket.

Export metadata from specific entry groups

To export metadata from specific entry groups, use the metadataJobs.create method and provide a list of entry groups.

Before using any of the request data, make the following replacements:

  • JOB_PROJECT: the Google Cloud project that you run the metadata job in. Provide a project number or project ID.
  • LOCATION_ID: the Google Cloud location, such as us-central1.
  • METADATA_JOB_ID: optional. The metadata job ID.
  • BUCKET: the Cloud Storage bucket to export the metadata to.

    Optionally, you can include a custom prefix after the bucket name, in the format gs://BUCKET/PREFIX/. The maximum length of the custom prefix is 128 characters.

  • ENTRY_GROUP: the relative resource name of an entry group that is in scope for the job, in the format projects/PROJECT_ID_OR_NUMBER/locations/LOCATION_ID/entryGroups/ENTRY_GROUP_ID. The entry group must be in the same project as the metadata job.

HTTP method and URL:

POST https://dataplex.googleapis.com/v1/projects/JOB_PROJECT/locations/LOCATION_ID/metadataJobs?metadataJobId=METADATA_JOB_ID

Request JSON body:

{
  "type": "EXPORT",
  "export_spec": {
    "output_path": "gs://BUCKET/",
    "scope": {
      "entryGroups": [
        "ENTRY_GROUP",
        # Additional entry groups
      ]
    }
  }
}

To send your request, use a command-line tool such as curl, as shown in the earlier example.

The response identifies a long-running operation. The exported metadata is saved to a Cloud Storage bucket.

Export metadata from specific entry types or aspect types

To export metadata from specific entry types or aspect types, set the primary job scope, such as at the organization level, as shown in the following example. Then, provide a list of entry types, aspect types, or both.

Before using any of the request data, make the replacements described in the previous sections, plus the following replacements:

  • ENTRY_TYPE: optional. The relative resource name of an entry type that is in scope for the job, in the format projects/PROJECT_ID_OR_NUMBER/locations/LOCATION_ID/entryTypes/ENTRY_TYPE_ID.

  • ASPECT_TYPE: optional. The relative resource name of an aspect type that is in scope for the job, in the format projects/PROJECT_ID_OR_NUMBER/locations/LOCATION_ID/aspectTypes/ASPECT_TYPE_ID.

HTTP method and URL:

POST https://dataplex.googleapis.com/v1/projects/JOB_PROJECT/locations/LOCATION_ID/metadataJobs?metadataJobId=METADATA_JOB_ID

Request JSON body:

{
  "type": "EXPORT",
  "export_spec": {
    "output_path": "gs://BUCKET/",
    "scope": {
      "organizationLevel": true,
      "entry_types": [
        "ENTRY_TYPE",
        # Additional entry types
      ],
      "aspect_types": [
        "ASPECT_TYPE",
        # Additional aspect types
      ]
    }
  }
}

To send your request, use a command-line tool such as curl, as shown in the earlier example.

The response identifies a long-running operation. The exported metadata is saved to a Cloud Storage bucket.

Get details about a metadata job

To get information about a metadata job, such as the status of the job and the number of entries that were exported, use the metadataJobs.get method.
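For example, using the same placeholder values as the export requests, a curl command like the following returns the metadata job, including its status:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://dataplex.googleapis.com/v1/projects/JOB_PROJECT/locations/LOCATION_ID/metadataJobs/METADATA_JOB_ID"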

Metadata export results

The metadata export job exports a snapshot of your universal catalog metadata at the time that the metadata job was created.

Export file contents

The contents of the output file follow the same format as the metadata import file that is used for metadata import jobs. You can use the output file directly as the input for a metadata import job.
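For illustration only, a single line in an export file might look like the following. The resource names are hypothetical and the entry is abridged; the BigQuery schema later on this page and the ImportItem reference describe the full set of fields:

{"entry": {"name": "projects/example-project/locations/us-central1/entryGroups/example-group/entries/example-entry", "entryType": "projects/example-project/locations/us-central1/entryTypes/example-type", "aspects": {}, "entrySource": {"displayName": "example-entry"}}}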

Export file location

Dataplex saves the export result files to a Cloud Storage bucket as objects.

The object path for each output file is constructed by using the bucket name and custom prefix that you specified in the export job, followed by a system-generated path. The system-generated path is designed for integration with BigQuery. The object path uses the following format:

gs://BUCKET/PREFIX/year=YYYY/month=MM/day=DD/consumer_project=JOB_PROJECT/job=METADATA_JOB_ID/project=METADATA_SOURCE_PROJECT/entry_group=ENTRY_GROUP/FILE_NUMBER.jsonl

Note the following:

  • The system-generated path starts with the standard Hive partition format for the export job's creation date. This format is supported by BigQuery. For more information, see Loading externally partitioned data.
  • The consumer_project parameter is the project where you run the metadata export job. The project parameter is the project that contains the metadata that you're exporting.
  • You can reuse a metadata job ID if the previous job was deleted. However, when you delete a job, it doesn't delete the files that were exported by that job. This means that if you reuse a deleted job ID, you might see duplicate job IDs in the output file paths.
  • Each output file is named with a file number, which is an integer starting from 1.

    If a metadata export job contains a large number of entries, then the job splits the results into multiple files to limit the size of each output file. The maximum number of entries in each output file is 1,000,000.

Example output files

Here are example output files for a metadata export job that included multiple projects:

gs://export-bucket/example-folder/year=2025/month=04/day=13/consumer_project=admin-project/job=example-job/project=metadata-project-1/entry_group=entry-group-1/1.jsonl
gs://export-bucket/example-folder/year=2025/month=04/day=13/consumer_project=admin-project/job=example-job/project=metadata-project-2/entry_group=entry-group-1/1.jsonl
gs://export-bucket/example-folder/year=2025/month=04/day=13/consumer_project=admin-project/job=example-job/project=metadata-project-3/entry_group=entry-group-2/1.jsonl

Here are example output files for a metadata export job that contained a large entry group. The results for the entry group were split into multiple files.

gs://export-bucket/example-folder/year=2025/month=04/day=13/consumer_project=admin-project/job=another-example-job/project=example-metadata-project/entry_group=big-entry-group/1.jsonl
gs://export-bucket/example-folder/year=2025/month=04/day=13/consumer_project=admin-project/job=another-example-job/project=example-metadata-project/entry_group=big-entry-group/2.jsonl

Analyze the exported metadata in BigQuery

If you want to analyze the exported metadata in BigQuery, you can create an external table for the exported metadata. Creating an external table lets you query the exported data without additional data loading or transformation. For example, you can count the number of entries by entry group, find entries that have specific aspects, or perform additional analysis in BigQuery.

Do the following:

  • Create an external table for Hive partitioned data. Provide the following information:

    • Select file from Cloud Storage bucket: provide the path to the Cloud Storage folder that contains the exported metadata files. To include all of the files in that folder, use the asterisk (*) wildcard. For example, gs://export-bucket/example-folder/*.
    • File format: select JSONL (Newline delimited JSON).
    • Select the Source data partitioning checkbox, and then for Select Source URI Prefix, provide the Cloud Storage URI prefix for the BigQuery table to define partitions. For example, gs://export-bucket/example-folder/.
    • Partition inference mode: select the Automatically infer types option.
    • Table type: select the External table option.
    • Schema: click the Edit as text toggle, and then enter the following schema definition for the export files:

      [
        {
          "name": "entry",
          "type": "RECORD",
          "mode": "NULLABLE",
          "fields": [
            {
              "mode": "NULLABLE",
              "name": "name",
              "type": "STRING"
            },
            {
              "mode": "NULLABLE",
              "name": "entryType",
              "type": "STRING"
            },
            {
              "mode": "NULLABLE",
              "name": "createTime",
              "type": "STRING"
            },
            {
              "mode": "NULLABLE",
              "name": "updateTime",
              "type": "STRING"
            },
            {
              "mode": "NULLABLE",
              "name": "aspects",
              "type": "JSON"
            },
            {
              "mode": "NULLABLE",
              "name": "parentEntry",
              "type": "STRING"
            },
            {
              "mode": "NULLABLE",
              "name": "fullyQualifiedName",
              "type": "STRING"
            },
            {
              "mode": "NULLABLE",
              "name": "entrySource",
              "type": "RECORD",
              "fields": [
                {
                  "mode": "NULLABLE",
                  "name": "resource",
                  "type": "STRING"
                },
                {
                  "mode": "NULLABLE",
                  "name": "system",
                  "type": "STRING"
                },
                {
                  "mode": "NULLABLE",
                  "name": "platform",
                  "type": "STRING"
                },
                {
                  "mode": "NULLABLE",
                  "name": "displayName",
                  "type": "STRING"
                },
                {
                  "mode": "NULLABLE",
                  "name": "description",
                  "type": "STRING"
                },
                {
                  "mode": "NULLABLE",
                  "name": "labels",
                  "type": "JSON"
                },
                {
                  "mode": "REPEATED",
                  "name": "ancestors",
                  "type": "RECORD",
                  "fields": [
                    {
                      "mode": "NULLABLE",
                      "name": "name",
                      "type": "STRING"
                    },
                    {
                      "mode": "NULLABLE",
                      "name": "type",
                      "type": "STRING"
                    }
                  ]
                },
                {
                  "mode": "NULLABLE",
                  "name": "createTime",
                  "type": "STRING"
                },
                {
                  "mode": "NULLABLE",
                  "name": "updateTime",
                  "type": "STRING"
                },
                {
                  "mode": "NULLABLE",
                  "name": "location",
                  "type": "STRING"
                }
              ]
            }
          ]
        }
      ]
      

BigQuery creates an external table that contains the exported metadata. The table's schema includes an entry column, where each row represents one entry. For more information about the fields for an entry, see ImportItem. The table's schema also contains the export file partition columns, as described in the Export file location section of this document.

After you create the external table, you can query the table using GoogleSQL syntax. For example, to query which entry types were exported, use the following statement:

SELECT entry.entryType FROM `example-project.example-dataset.example-table` LIMIT 1000
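Because the Hive partition keys in the file paths (such as project, job, and entry_group) become columns in the external table, you can also aggregate by them. For example, the following hypothetical query counts exported entries per entry group:

-- Count exported entries per entry group (hypothetical table name)
SELECT entry_group, COUNT(*) AS entry_count
FROM `example-project.example-dataset.example-table`
GROUP BY entry_group
ORDER BY entry_count DESC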

What's next