Transition from Data Catalog to Dataplex Catalog

If you're using Data Catalog, you can transition your Data Catalog content and usage to Dataplex Catalog by following the steps described in this document.

If you're new to catalog, start with Dataplex Catalog by configuring it as your default catalog experience.

This document is intended for data administrators and data stewards.

Dataplex Catalog has a new web interface and API. Transitioning to Dataplex Catalog lets your existing and future workloads and users benefit from its enhanced features and capabilities. For more information, see Dataplex Catalog overview.

For a comparison of Dataplex Catalog and Data Catalog, refer to Dataplex Catalog versus Data Catalog.

Before you start the transition, review the features that aren't supported in Dataplex Catalog.

Transition overview

The Manage transition to Dataplex page in the Google Cloud console provides a single point of access for managing the transition.

This section describes the various transition scenarios and outlines the transition process for each of the scenarios.

If you don't have custom metadata in Data Catalog or if you have never used Data Catalog

If you're using Data Catalog and don't have custom metadata, such as tags, tag templates, custom entries, or entry groups, or if you have never used Data Catalog, then complete the transition by setting your default catalog UI experience to Dataplex Catalog.

If you have Data Catalog custom metadata or programmatic workloads

If you're using Data Catalog with custom metadata (such as tags, tag templates, custom entries, or entry groups), Dataplex can automatically import this custom metadata into Dataplex Catalog as read-only. Enabling this requires the configuration changes described in the preparatory phase, which is the first phase of the transition.

The second phase is the transfer phase, where the active state (read-write) of your custom metadata is moved to Dataplex Catalog. During this phase, you must also update any programmatic workloads (such as API, client libraries, Terraform modules, and Google Cloud CLI commands) to interact with Dataplex Catalog.

The following provides further detail on each phase of the transition process:

  1. Preparatory phase: perform necessary configuration tasks to make the Data Catalog content simultaneously available in Dataplex Catalog as read-only. During this phase, Data Catalog remains the authoritative source for your metadata content.

    The preparatory phase includes the following tasks:

    1. Update the Data Catalog private tag templates to public.
    2. Opt in tag templates and tags for simultaneous availability of metadata in Dataplex Catalog.
    3. Configure Dataplex Catalog IAM permissions for your custom metadata.
  2. Transfer phase: move your catalog usage and the active state (read-write) of your custom metadata to Dataplex Catalog.

    The transfer phase includes the following tasks:

    1. Set the default catalog UI experience to Dataplex Catalog.
    2. Transfer the Data Catalog custom metadata to Dataplex Catalog.
    3. Update programmatic workloads.

    When you complete the transfer phase, Dataplex Catalog becomes the authoritative source for all your metadata and you don't need to use Data Catalog.

Required roles

To get the permissions that you need to transition from Data Catalog to Dataplex Catalog, ask your administrator to grant you the following IAM roles on the resource:

  • Update private tag templates to public: Data Catalog TagTemplate Owner (roles/datacatalog.tagTemplateOwner)

    This predefined role contains the datacatalog.tagTemplates.update permission, which is required to update private tag templates to public.

  • Opt in public tag templates and tags for simultaneous availability of metadata in Dataplex Catalog: DataCatalog Migration Config Admin (roles/datacatalog.migrationConfigAdmin)

    This predefined role contains the datacatalog.migrationConfig.set permission, which is required to opt in tag templates and tags.

  • Set the default catalog UI experience to Dataplex Catalog: DataCatalog Migration Config Admin (roles/datacatalog.migrationConfigAdmin)

    This predefined role contains the datacatalog.migrationConfig.set permission, which is required to set the default catalog UI experience.

  • Transfer Data Catalog public tag templates to Dataplex Catalog: Data Catalog TagTemplate Owner (roles/datacatalog.tagTemplateOwner)

    This predefined role contains the datacatalog.tagTemplates.update permission, which is required to transfer public tag templates.

  • Transfer Data Catalog entry groups to Dataplex Catalog: DataCatalog EntryGroup Owner (roles/datacatalog.entryGroupOwner)

    This predefined role contains the datacatalog.entryGroups.update permission, which is required to transfer entry groups.

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get these permissions with custom roles or other predefined roles.

Preparatory phase

This phase involves tasks to make the Data Catalog metadata content simultaneously available in Dataplex Catalog, in read-only mode.

The following are the key points to remember during the preparatory phase:

  • Data Catalog remains the authoritative source for your custom metadata.
  • The Dataplex Search page points to Data Catalog search by default, which includes only Data Catalog metadata and not Dataplex Catalog metadata.
  • Dataplex Catalog search might not include all Data Catalog public tag templates and tags until you complete the preparatory phase.

Data Catalog custom entry groups and their entries are already simultaneously available in Dataplex Catalog as read-only. When you opt in public tag templates and tags in the preparatory phase, their content is also made simultaneously available in Dataplex Catalog as read-only.

The following list shows the mapping between Data Catalog and Dataplex Catalog resources when the Data Catalog resources are made simultaneously available in Dataplex Catalog. Each item names the Data Catalog resource first and the corresponding Dataplex Catalog resource second, followed by a description.

Mapping between Data Catalog and Dataplex Catalog

  • Tag template: Aspect type (global)

    Tag templates in Data Catalog are regional resources. When you opt in tag templates, their content is made simultaneously available as global aspect types in Dataplex Catalog. This is to preserve the cross-regional references between tags and tag templates. Definitions of global aspect types are replicated to all Google Cloud regions.

    Standard concepts such as Schema are represented by system aspect types, which Dataplex provides. For more information, see Categories of aspect types.

  • Tag: Optional aspect

    When you opt in tag templates, their tags are made simultaneously available as optional aspects in Dataplex Catalog. For more information, see Categories of aspects.

  • Entry group: Entry group

    Entry groups for system entries are established per-project per-region in Dataplex Catalog. Entry groups created in Data Catalog are made simultaneously available in Dataplex Catalog.

  • Custom entry: Custom entry

    All custom entries in Data Catalog are made simultaneously available as entries of GenericEntryType in Dataplex Catalog. For more information, see Categories of entry types.

  • System entry (Google Cloud): System entry required aspect

    Metadata describing system entries, such as Schema for BigQuery tables, is captured in required aspects of the system-defined aspect types.

Note: Typically, updates to the Data Catalog content are propagated to Dataplex Catalog within 10 minutes. In some cases, it might take up to 24 hours. To ensure data consistency in time-critical workloads, we recommend verifying Dataplex Catalog resources against Data Catalog resources or the original source system before use.
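The verification that the note recommends can be sketched as a field-by-field comparison. This is a minimal illustration only: it assumes that you have already fetched a Data Catalog tag and its Dataplex Catalog aspect and flattened both into plain dictionaries. The `diff_fields` helper and the sample field names are hypothetical and aren't part of either API.

```python
from typing import Any, Dict, Tuple

# Marker used when a field exists on one side only (hypothetical convention).
MISSING = "<missing>"


def diff_fields(source: Dict[str, Any], replica: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (source_value, replica_value)} for fields that disagree."""
    differences = {}
    for key in sorted(set(source) | set(replica)):
        left = source.get(key, MISSING)
        right = replica.get(key, MISSING)
        if left != right:
            differences[key] = (left, right)
    return differences


# Hypothetical snapshots: the tag was updated, the aspect hasn't caught up yet.
tag_fields = {"owner": "data-team", "retention_days": 90}
aspect_data = {"owner": "data-team", "retention_days": 30}

stale = diff_fields(tag_fields, aspect_data)
print(stale)
```

An empty result means the two snapshots agree. A non-empty result suggests that propagation hasn't finished and that a time-critical workload should keep reading from Data Catalog.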

Update Data Catalog private tag templates to public

Dataplex Catalog doesn't support the concept of private aspects (counterparts to private tags). Therefore, private tag templates and tags aren't made simultaneously available in Dataplex Catalog. To propagate private tag templates and tags to Dataplex Catalog, you must update them to public.

Before you proceed, review your private tag templates to confirm that their structure and metadata are appropriate for public visibility.

To update private tag templates and tags to public, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex Catalog page.

    Go to Catalog

  2. Click Manage transition to Dataplex.

  3. Click the Private tag templates tab.

  4. Click Manage tag templates.

  5. On the Tag templates page, click the private tag template that you want to update.

  6. On the Template details page, click Edit.

  7. In the Visibility section, select Public.

  8. Click Update.

REST

To update the private tag templates to public, use the tagTemplates.patch method.
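The following sketch shows what such a tagTemplates.patch request might look like. It only builds the request and doesn't send it, so authentication is left out. The `isPubliclyReadable` field name and its use in the update mask are assumptions based on the Data Catalog v1 TagTemplate resource, so check them against the API reference. The project, location, and template IDs are placeholders.

```python
import json
from urllib.parse import urlencode

DATA_CATALOG_API = "https://datacatalog.googleapis.com/v1"


def build_make_public_request(project: str, location: str, template_id: str) -> dict:
    """Describe a tagTemplates.patch call that flips a template to public."""
    name = f"projects/{project}/locations/{location}/tagTemplates/{template_id}"
    # Only the visibility field is in the mask, so other fields stay untouched.
    # Assumed field name: isPubliclyReadable.
    query = urlencode({"updateMask": "isPubliclyReadable"})
    return {
        "method": "PATCH",
        "url": f"{DATA_CATALOG_API}/{name}?{query}",
        "body": json.dumps({"isPubliclyReadable": True}),
    }


# Placeholder identifiers for illustration.
request = build_make_public_request("my-project", "us-central1", "pii_classification")
print(request["method"], request["url"])
print(request["body"])
```

To send the request, pass the URL and body to an HTTP client together with an OAuth 2.0 access token for a principal that has the datacatalog.tagTemplates.update permission.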

Opt in public tag templates and tags for simultaneous availability of metadata in Dataplex Catalog

When you provide opt-in, the content of Data Catalog public tag templates and tags is automatically and continuously propagated to Dataplex Catalog across your Google Cloud organization or project (without an organization). The Data Catalog public tag templates and tags are propagated to Dataplex Catalog as public aspect types and aspects respectively. Definitions of global aspect types are replicated to all Google Cloud regions.

To opt in public tag templates and tags, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex Catalog page.

    Go to Catalog

  2. Click Manage transition to Dataplex.

  3. Click the Opt-in tab.

  4. Click Provide opt-in.

  5. In the confirmation dialog, enter the text displayed, and click Confirm.

REST

To opt in tag templates and tags, use the setConfig method.

For both Google Cloud organizations and projects without an organization, you can revoke a previously provided opt-in by clicking Revoke opt-in. When you revoke opt-in, propagation of Data Catalog tag templates and tags to Dataplex Catalog stops. The Dataplex Catalog aspect types and aspects that were created by the opt-in process but haven't been transferred to Dataplex Catalog are deleted. The deletion process might take up to 12 hours.
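As a rough sketch, the setConfig request for providing or revoking opt-in could be built as follows. The `tagTemplateMigration` field and its enum values are assumptions based on the Data Catalog v1 API, as is the idea that the disabled value corresponds to revoking opt-in. The organization ID and the `global` location are placeholders. Verify all of these against the setConfig reference before use.

```python
import json

DATA_CATALOG_API = "https://datacatalog.googleapis.com/v1"


def build_opt_in_request(parent: str, location: str, enabled: bool = True) -> dict:
    """Describe a setConfig call that turns tag template propagation on or off.

    `parent` is "organizations/ORG_ID" or, for a project without an
    organization, "projects/PROJECT_ID".
    """
    if not parent.startswith(("organizations/", "projects/")):
        raise ValueError("parent must start with 'organizations/' or 'projects/'")
    # Assumed enum values; confirm them in the API reference.
    state = "TAG_TEMPLATE_MIGRATION_ENABLED" if enabled else "TAG_TEMPLATE_MIGRATION_DISABLED"
    return {
        "method": "POST",
        "url": f"{DATA_CATALOG_API}/{parent}/locations/{location}:setConfig",
        "body": json.dumps({"tagTemplateMigration": state}),
    }


# Placeholder organization ID and location.
opt_in = build_opt_in_request("organizations/123456789", "global")
revoke = build_opt_in_request("organizations/123456789", "global", enabled=False)
print(opt_in["url"])
print(opt_in["body"])
print(revoke["body"])
```

The function only describes the request. Sending it requires a principal that has the datacatalog.migrationConfig.set permission.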

Configure Dataplex Catalog IAM permissions for your custom metadata

Dataplex Catalog aspect types and custom entry groups that represent tag templates and entry groups from Data Catalog don't inherit the original Data Catalog IAM permissions. You must manually configure Dataplex IAM permissions for those aspect types and custom entry groups.

For example, if you want to create aspects using a propagated aspect type (one that represents a Data Catalog tag template), you must configure the necessary IAM permissions for that aspect type. Similarly, when you ingest custom metadata, to create custom entries within a propagated custom entry group, you must configure the necessary IAM permissions for that entry group.

To configure the necessary Dataplex IAM permissions for aspect types and custom entry groups, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex Catalog page.

    Go to Catalog

  2. Click Manage transition to Dataplex.

  3. Click the Dataplex permissions tab.

  4. To configure Dataplex IAM permissions for aspect types, click Aspect types and tag templates.

    1. For the tag template propagated from Data Catalog, click > Permissions.

    2. Click Add principals.

    3. Add a new principal, and assign the required roles.

    4. Click Save.

  5. To configure Dataplex IAM permissions for entry groups, click Entry groups.

    1. For the entry group propagated from Data Catalog, click > Permissions.

    2. Click Add principals.

    3. Add a new principal, and assign the required roles.

    4. Click Save.

REST

To configure Dataplex IAM permissions for aspect types, use the aspectTypes.setIamPolicy method.

To configure Dataplex IAM permissions for entry groups, use the entryGroups.setIamPolicy method.
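Because setIamPolicy replaces the whole policy, a common pattern is to read the current policy, add a binding, and write the result back. The sketch below illustrates the merge step and the shape of the request, under these assumptions: the current policy was already fetched (for example, with the matching getIamPolicy method), the resource name and principals are placeholders, and `roles/dataplex.aspectTypeUser` is used only as an example role. The helper names are hypothetical.

```python
import copy
import json

DATAPLEX_API = "https://dataplex.googleapis.com/v1"


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy with `member` added to `role`."""
    updated = copy.deepcopy(policy)  # Keeps the etag and leaves the input untouched.
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse only an unconditional binding for the same role.
        if binding.get("role") == role and "condition" not in binding:
            if member not in binding.setdefault("members", []):
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


def build_set_iam_policy_request(resource: str, policy: dict) -> dict:
    """Describe a setIamPolicy call for an aspect type or entry group."""
    return {
        "method": "POST",
        "url": f"{DATAPLEX_API}/{resource}:setIamPolicy",
        "body": json.dumps({"policy": policy}),
    }


# Placeholder policy, principals, and resource name.
current = {
    "etag": "BwYexample=",
    "bindings": [{"role": "roles/dataplex.aspectTypeOwner", "members": ["user:admin@example.com"]}],
}
updated = add_binding(current, "roles/dataplex.aspectTypeUser", "group:stewards@example.com")
request = build_set_iam_policy_request(
    "projects/my-project/locations/global/aspectTypes/pii-classification", updated
)
print(request["url"])
print(request["body"])
```

The same two helpers apply to entry groups if you pass an entry group resource name and an entry group role instead.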

Transfer phase

This phase involves tasks to move the active state (read-write) of your Data Catalog custom metadata to Dataplex Catalog.

The following are the key points to remember during the transfer phase:

  • The Dataplex Search page points to Dataplex Catalog search by default.
  • You can discover and read both the Data Catalog and Dataplex Catalog metadata in Dataplex Catalog.
  • When you update metadata in the Google Cloud console, you're redirected to its source. For example, if you're updating Data Catalog tag templates, you're redirected to the tag templates page in Data Catalog.

Set the default catalog UI experience to Dataplex Catalog

Setting the default catalog UI experience to Dataplex Catalog means the following:

  • When you go to the Dataplex Search page in the Google Cloud console, Dataplex Catalog search is displayed by default. Dataplex Catalog search operates over Dataplex Catalog and finds all your metadata content, including content created in Dataplex Catalog and content brought in from Data Catalog.

    Data Catalog search finds only Data Catalog content, not Dataplex Catalog content. Therefore, we recommend using Dataplex Catalog search.

  • The web interface encourages you to create new resources in Dataplex Catalog. For example, it defaults to Create Aspect Types instead of Create Tag Templates, and Create Entry Group (Dataplex Catalog) instead of Create Entry Group (Data Catalog).

  • Setting the default catalog UI experience to Dataplex Catalog is typically a one-way operation. However, in exceptional cases, Dataplex lets you revert the change and switch the default catalog UI experience back to Data Catalog. If you revert the change, content created in Dataplex Catalog isn't accessible in the Data Catalog web interface.

To set the default catalog UI experience to Dataplex Catalog, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex Catalog page.

    Go to Catalog

  2. Click Manage transition to Dataplex.

  3. Click the Dataplex Search tab.

  4. Click Set the default Catalog UI experience to Dataplex Catalog.

REST

To set the default catalog UI experience to Dataplex Catalog, use the setConfig method.
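A minimal sketch of that setConfig request follows. The `catalogUiExperience` field and its enum values are assumptions based on the Data Catalog v1 API, and the project ID and location are placeholders. Confirm the names in the setConfig reference.

```python
import json

DATA_CATALOG_API = "https://datacatalog.googleapis.com/v1"


def build_ui_experience_request(parent: str, location: str, use_dataplex: bool = True) -> dict:
    """Describe a setConfig call that selects the default catalog UI experience."""
    # Assumed enum values; the disabled value would switch back to Data Catalog.
    state = "CATALOG_UI_EXPERIENCE_ENABLED" if use_dataplex else "CATALOG_UI_EXPERIENCE_DISABLED"
    return {
        "method": "POST",
        "url": f"{DATA_CATALOG_API}/{parent}/locations/{location}:setConfig",
        "body": json.dumps({"catalogUiExperience": state}),
    }


# Placeholder project ID and location.
ui_request = build_ui_experience_request("projects/my-project", "global")
print(ui_request["url"])
print(ui_request["body"])
```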

Transfer Data Catalog custom metadata to Dataplex Catalog

You can transfer the Data Catalog public tag templates (that you opted in) and custom entry groups to Dataplex Catalog. Make sure that you transfer each tag template and custom entry group separately.

The following summarizes the transfer process:

  • Effect

    • Transfers the active state for resources from Data Catalog to Dataplex Catalog. After transfer, to modify the resource or access its current state, you must use Dataplex Catalog.
  • Changes to the data content and flow

    • Marks the resource as read-only in Data Catalog.
    • Marks the resource as read-write in Dataplex Catalog.
    • Stops simultaneous availability of this resource.
  • Dependencies

    • Transferring a tag template also transfers all of the tags that use that template.
    • Transferring entry groups also transfers all of the entries within that group.
    • Transferring entries doesn't affect the transfer status of their tags and respective tag templates.

Before transfer, to modify the business context of an entry (for example, overview, contacts), you can use both the Data Catalog and Dataplex APIs. After transfer, you must use only the Dataplex API.

The following describes what operations are allowed for a custom Data Catalog entry that is tagged, depending on the transfer status of the related custom entry group and tag template:

  • When a tag template is active in Data Catalog, you can use that tag template to create tags. After you transfer the tag template to Dataplex Catalog, you can't use it to create tags in Data Catalog.
  • You can create aspects using aspect types that are propagated from Data Catalog tag templates, only after you transfer the tag templates.

To transfer the Data Catalog tag templates and custom entry groups to Dataplex Catalog, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex Catalog page.

    Go to Catalog

  2. Click Manage transition to Dataplex.

  3. Click the Transfer to Dataplex tab.

  4. To transfer Data Catalog tag templates to Dataplex Catalog, click View aspect types and tag templates.

    1. Select the tag templates that you want to transfer.

    2. Click Transfer.

    3. In the confirmation dialog, click Confirm.

  5. To transfer Data Catalog custom entry groups to Dataplex Catalog, click View entry groups.

    1. Select the entry groups that you want to transfer.

    2. Click Transfer.

    3. In the confirmation dialog, click Confirm.

REST

To transfer Data Catalog tag templates to Dataplex Catalog, use the tagTemplates.patch method. To transfer custom entry groups, use the entryGroups.patch method.
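For tag templates, the transfer request might be built as in the following sketch, which produces one patch request per template because each template is transferred separately. The `dataplexTransferStatus` field, its `TRANSFERRED` value, and its use in the update mask are assumptions based on the Data Catalog v1 TagTemplate resource. The template names are placeholders. Check the tagTemplates.patch reference before relying on any of these.

```python
import json
from typing import Iterable, List
from urllib.parse import urlencode

DATA_CATALOG_API = "https://datacatalog.googleapis.com/v1"


def build_transfer_requests(template_names: Iterable[str]) -> List[dict]:
    """Describe one tagTemplates.patch call per tag template to transfer."""
    # Assumed field name and value for marking a template as transferred.
    query = urlencode({"updateMask": "dataplexTransferStatus"})
    body = json.dumps({"dataplexTransferStatus": "TRANSFERRED"})
    return [
        {"method": "PATCH", "url": f"{DATA_CATALOG_API}/{name}?{query}", "body": body}
        for name in template_names
    ]


# Placeholder tag template resource names.
templates = [
    "projects/my-project/locations/us-central1/tagTemplates/pii_classification",
    "projects/my-project/locations/us-central1/tagTemplates/data_quality",
]
transfer_requests = build_transfer_requests(templates)
for item in transfer_requests:
    print(item["method"], item["url"])
```

Because a transfer also moves every tag that uses the template, it's worth sending these requests one at a time and confirming each result before continuing.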

When you complete the transfer phase, Dataplex Catalog becomes the authoritative source for all your metadata, and you don't need to use Data Catalog.

Update programmatic workloads

Dataplex Catalog comes with a new API, client libraries, Terraform modules, and Google Cloud CLI commands. Therefore, you must update any existing workloads that use Data Catalog programmatically so that they use Dataplex Catalog instead. See code samples for Dataplex.

Make sure that you align updating programmatic workloads with transferring the dependent resources, such as tag templates and custom entry groups. This is necessary because transferring moves the active (read-write) state of a resource from Data Catalog to Dataplex Catalog.

The following is an example sequence of updating programmatic workloads to interact with Dataplex Catalog, coordinated with transferring the dependent resources:

  1. If your workloads can tolerate eventual consistency, move read operations of workloads to Dataplex Catalog.

  2. Move workloads that perform tagging write operations to Dataplex Catalog, in coordination with transferring their corresponding tag templates.

  3. Move workloads that perform custom ingestion write operations to Dataplex Catalog, in coordination with transferring the respective custom entry groups.

If you need to decouple updating workloads from transferring the corresponding resources, consider implementing logic within the workloads to route read-write operations to Data Catalog or Dataplex Catalog based on the resource's transfer status. That way, workloads operate correctly regardless of the related resource's transfer status.
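One way to structure such routing logic is sketched below. Everything in it is hypothetical: the `Backend` enum, the `TransferAwareRouter` class, and the stub writers stand in for your own status lookup and for real Data Catalog and Dataplex Catalog client calls.

```python
from enum import Enum
from typing import Callable, Dict


class Backend(Enum):
    DATA_CATALOG = "data_catalog"
    DATAPLEX_CATALOG = "dataplex_catalog"


def pick_backend(transferred: bool) -> Backend:
    """Writes go to whichever catalog holds the read-write state."""
    return Backend.DATAPLEX_CATALOG if transferred else Backend.DATA_CATALOG


class TransferAwareRouter:
    """Routes each write according to the resource's transfer status."""

    def __init__(
        self,
        is_transferred: Callable[[str], bool],
        writers: Dict[Backend, Callable[[str, dict], str]],
    ) -> None:
        self._is_transferred = is_transferred
        self._writers = writers

    def write(self, resource: str, payload: dict) -> str:
        backend = pick_backend(self._is_transferred(resource))
        return self._writers[backend](resource, payload)


# Stand-ins for a real status lookup and real API calls.
transfer_status = {"tagTemplates/pii": True, "tagTemplates/quality": False}
writers = {
    Backend.DATA_CATALOG: lambda resource, payload: f"data_catalog:{resource}",
    Backend.DATAPLEX_CATALOG: lambda resource, payload: f"dataplex_catalog:{resource}",
}
router = TransferAwareRouter(lambda name: transfer_status.get(name, False), writers)

results = [router.write(name, {"field": "value"}) for name in transfer_status]
print(results)
```

Looking up the status on every write keeps the workload correct at the moment a resource is transferred, at the cost of an extra read. Caching the status is possible, but a stale cache entry could send a write to the catalog where the resource is already read-only.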

What's next