Data Catalog is deprecated and will be discontinued on January 30, 2026. For steps to transition your Data Catalog users, workloads, and content to Dataplex Universal Catalog, see Transition from Data Catalog to Dataplex Universal Catalog.

Integrate your data sources with Data Catalog

Data Catalog can import metadata from several Google Cloud data sources, as well as a number of popular on-premises ones, and keep that metadata up to date.

With metadata ingested, Data Catalog does the following:

Makes the existing metadata discoverable through search. For more information, see How to search.
Allows the members of your organization to enrich your data with additional business metadata through tags. For more information, see Tags and tag templates.

Integration with Google Cloud sources is automatic. To integrate custom on-premises sources that your organization uses, do either of the following:

Set up and run corresponding connectors contributed by the community.
Use the Data Catalog API for custom entries.

Before you begin

If you're already using Data Catalog, you should already have a project with the Data Catalog API enabled. For more information on the recommended way to use multiple projects with Data Catalog, see Using tag templates in multiple projects.

If this is the first time you're using Data Catalog, do the following:

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Data Catalog API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API

Integrate Google Cloud data sources

BigQuery sharing

When you subscribe to a listing in BigQuery sharing (formerly Analytics Hub), a linked dataset is created in your project. Data Catalog automatically generates metadata entries for that linked dataset and all tables contained in it. For more information on linked datasets and other BigQuery sharing features, see Introduction to Sharing.

In Data Catalog search, linked datasets are displayed as standard BigQuery datasets, but you can filter them using the type=dataset.linked predicate. For more details, see Search for data assets.
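The filter described above can also be applied programmatically. The following is a minimal sketch, assuming the google-cloud-datacatalog Python client library is installed and application default credentials are configured. The helper names and the optional name: term are illustrative and not part of the original page; only the type=dataset.linked predicate comes from the text above.

```python
from typing import Iterator, Optional

# Predicate taken from the text above; it restricts results to linked datasets.
LINKED_DATASET_PREDICATE = "type=dataset.linked"


def build_linked_dataset_query(name_contains: Optional[str] = None) -> str:
    """Return a search query string that matches linked datasets only."""
    terms = [LINKED_DATASET_PREDICATE]
    if name_contains:
        # Assumption: space-separated search terms are combined with a logical AND.
        terms.append(f"name:{name_contains}")
    return " ".join(terms)


def search_linked_datasets(
    project_id: str, name_contains: Optional[str] = None
) -> Iterator[str]:
    """Yield the linked resource name of each matching search result."""
    # Imported lazily so the query helper works without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    scope = datacatalog_v1.SearchCatalogRequest.Scope()
    scope.include_project_ids.append(project_id)

    query = build_linked_dataset_query(name_contains)
    for result in client.search_catalog(scope=scope, query=query):
        yield result.linked_resource
```

For example, list(search_linked_datasets("my-project")) would return the linked datasets visible to the caller in a hypothetical project named my-project. Results depend on the caller's IAM permissions.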

BigQuery and Pub/Sub

If your organization already uses BigQuery and Pub/Sub, depending on your permissions, you can search for the metadata from those sources right away. If you can't see the corresponding entries in search results, look for the IAM roles that you and the users of your project might need in Identity and Access Management.

Bigtable

When you store data in Bigtable, metadata is automatically synced to Data Catalog for the following Bigtable resources:

Instances
Tables, including column family details

For guidance on using Data Catalog for data discovery and tagging, see Manage data assets using Data Catalog in the Bigtable documentation.

Cloud SQL

Cloud SQL doesn't integrate with Data Catalog, but does integrate with Dataplex Universal Catalog. For more information, see Integrate your data sources with Dataplex Universal Catalog.

Dataproc Metastore

To integrate with Dataproc Metastore, enable the sync to Data Catalog for new or existing services as described in Enabling Data Catalog sync.

Sensitive Data Protection

Data Catalog also integrates with Sensitive Data Protection, which lets you scan specific Google Cloud resources for sensitive data and send the results back to Data Catalog in the form of tags.

For more information, see Sending Sensitive Data Protection scan results to Data Catalog.
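As a rough illustration of that flow, the sketch below builds an inspection job for a BigQuery table and asks Sensitive Data Protection to publish its findings to Data Catalog. It assumes the google-cloud-dlp Python client library and that the scanned resource is a BigQuery table. The function names, the project, dataset, and table IDs, and the chosen infoType are hypothetical. Treat the linked guide as the authoritative procedure.

```python
from typing import Any, Dict, List


def build_inspect_job(
    project_id: str, dataset_id: str, table_id: str, info_types: List[str]
) -> Dict[str, Any]:
    """Build an inspect job config that publishes findings to Data Catalog."""
    return {
        # Assumption: the resource to scan is a BigQuery table.
        "storage_config": {
            "big_query_options": {
                "table_reference": {
                    "project_id": project_id,
                    "dataset_id": dataset_id,
                    "table_id": table_id,
                }
            }
        },
        # infoTypes to look for, for example "EMAIL_ADDRESS".
        "inspect_config": {"info_types": [{"name": name} for name in info_types]},
        # This action sends the scan results back to Data Catalog as tags.
        "actions": [{"publish_findings_to_cloud_data_catalog": {}}],
    }


def start_scan(project_id: str, inspect_job: Dict[str, Any]) -> str:
    """Create the inspection job and return its resource name."""
    # Imported lazily so the config builder works without the client library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    job = client.create_dlp_job(
        request={"parent": f"projects/{project_id}", "inspect_job": inspect_job}
    )
    return job.name
```

Keeping the job configuration in a plain dictionary makes it easy to review or log before any job is created.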

Spanner

When you store data in Spanner, metadata for the following Spanner resources is synced to Data Catalog:

Instances
Databases
Tables and views with column schema

For guidance on using Data Catalog for data discovery and tagging, see Manage data assets using Data Catalog.

Vertex AI

Vertex AI syncs metadata for the following resources to Data Catalog:

Integrate on-premises data sources

To integrate on-premises data sources, you can use the corresponding Python connectors contributed by the community:

Find your data source in the following table.
Open its GitHub repository.
Follow the setup instructions in the readme file.

Category	Component	Description	Repository
RDBMS	mysql-connector	Sample code for MySQL data source.	google-datacatalog-mysql-connector
	postgresql-connector	Sample code for PostgreSQL data source.	google-datacatalog-postgresql-connector
	sqlserver-connector	Sample code for SQL Server data source.	google-datacatalog-sqlserver-connector
	redshift-connector	Sample code for Redshift data source.	google-datacatalog-redshift-connector
	oracle-connector	Sample code for Oracle data source.	google-datacatalog-oracle-connector
	teradata-connector	Sample code for Teradata data source.	google-datacatalog-teradata-connector
	vertica-connector	Sample code for Vertica data source.	google-datacatalog-vertica-connector
	greenplum-connector	Sample code for Greenplum data source.	google-datacatalog-greenplum-connector
	rdbmscsv-connector	Sample code for generic RDBMS CSV ingestion.	google-datacatalog-rdbmscsv-connector
	saphana-connector	Sample code for SAP HANA data source.	google-datacatalog-saphana-connector
BI	looker-connector	Sample code for Looker data source.	google-datacatalog-looker-connector
	qlik-connector	Sample code for Qlik Sense data source.	google-datacatalog-qlik-connector
	tableau-connector	Sample code for Tableau data source.	google-datacatalog-tableau-connector
Hive	hive-connector	Sample code for Hive data source.	google-datacatalog-hive-connector
Hive	apache-atlas-connector	Sample code for Apache Atlas data source.	google-datacatalog-apache-atlas-connector

Integrate unsupported data sources

If you can't find a connector for your data source, you can still manually integrate it by creating entry groups and custom entries. To do that, you can:

Use one of the Data Catalog Client Libraries in one of the following languages: C#, Go, Java, Node.js, PHP, Python, or Ruby.
Build directly on the Data Catalog API.

To integrate your sources, first, learn about Entries and entry groups, then follow the instructions in Create custom Data Catalog entries for your data sources.
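To give a sense of what the client-library route looks like, here is a minimal Python sketch that creates an entry group and one custom entry in it. It assumes the google-cloud-datacatalog library is installed and credentials are configured. The function names and all IDs, system names, and resource URLs are hypothetical placeholders, and the linked instructions remain the reference for required fields and naming rules.

```python
from typing import Dict


def build_custom_entry_fields(
    display_name: str, system: str, entry_type: str, linked_resource: str
) -> Dict[str, str]:
    """Collect the fields that describe a custom (user-specified) entry."""
    return {
        "display_name": display_name,
        # Name of the source system, for example "onprem_data_system".
        "user_specified_system": system,
        # Kind of asset within that system, for example "onprem_data_asset".
        "user_specified_type": entry_type,
        # Full name or URL of the resource that the entry describes.
        "linked_resource": linked_resource,
    }


def create_custom_entry(
    project_id: str,
    location: str,
    entry_group_id: str,
    entry_id: str,
    fields: Dict[str, str],
) -> str:
    """Create an entry group plus one custom entry; return the entry name."""
    # Imported lazily so the field builder works without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    parent = f"projects/{project_id}/locations/{location}"

    # Note: this call fails if the entry group already exists.
    entry_group = client.create_entry_group(
        parent=parent,
        entry_group_id=entry_group_id,
        entry_group=datacatalog_v1.EntryGroup(display_name=entry_group_id),
    )
    entry = client.create_entry(
        parent=entry_group.name,
        entry_id=entry_id,
        entry=datacatalog_v1.Entry(**fields),
    )
    return entry.name
```

In a real integration you would typically look up an existing entry group before creating one, and loop over the assets discovered in the source system to create one entry per asset.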

What's next

Learn more about Identity and Access Management.
Learn How to search.
Go through the Tagging tables quickstart.